7 datasets found
  1. Data from: A dataset to model Levantine landcover and land-use change...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf; Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Kempf; Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplementary information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries like Jordan through the influx of Syrian refugees and has increased population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the 9 code chunks described below (see Code structure) to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-day, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (by variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022, and a second folder contains the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders: “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk; it extracts the MODIS data from the .hdf file format. The following packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download the MODIS data, after registration, from https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, three different (spatial) time series have to be loaded and merged later. Note that the time series are temporally consistent.
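    A minimal sketch of this extraction step (file paths and the NDVI layer-selection pattern are assumptions; the repository script is authoritative):

    library(terra)

    hdf_files <- list.files("your_directory_MODIS/raw", pattern = "\\.hdf$", full.names = TRUE)

    for (f in hdf_files) {
      s <- sds(f)                                   # all subdatasets of the MOD13Q1 granule
      ndvi <- s[[grep("NDVI", names(s))[1]]]        # the 16-day 250 m NDVI subdataset
      out  <- paste0(tools::file_path_sans_ext(basename(f)), "_NDVI.tif")
      writeRaster(ndvi, out, overwrite = TRUE)      # one .tif per .hdf, tagged "NDVI"
    }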


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We use the gtools package to load the files in numerical order (1, 2, 3, 4, 5, 6, etc.). There are three stacks: we merge the first two (stack 1, stack 2) and store the result, and then merge this stack with stack 3. This produces single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
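    A minimal sketch of this merging step (the tile folder names are assumptions; gtools::mixedsort keeps the consecutive numbering in order):

    library(terra)
    library(gtools)

    f1 <- mixedsort(list.files("tile_h20v05", pattern = "NDVI.*\\.tif$", full.names = TRUE))
    f2 <- mixedsort(list.files("tile_h21v05", pattern = "NDVI.*\\.tif$", full.names = TRUE))
    f3 <- mixedsort(list.files("tile_h21v06", pattern = "NDVI.*\\.tif$", full.names = TRUE))

    setwd("your directory_MODIS/merged")            # create the "merged" folder beforehand

    for (i in seq_along(f1)) {
      m12 <- merge(rast(f1[i]), rast(f2[i]))        # merge stack 1 and stack 2
      m   <- merge(m12, rast(f3[i]))                # then merge with stack 3
      writeRaster(m, paste0("NDVI_final_", i, ".tif"), overwrite = TRUE)
    }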


    3_CROP_MODIS_merged_tiles.R


    Now we crop the merged MODIS tiles to the study area. We use a mask, provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. This produces the single cropped NDVI time series data from MODIS.
    The repository provides the already clipped and merged NDVI datasets.
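    A minimal sketch of the cropping step (paths are assumptions; the shapefile name follows the repository):

    library(terra)
    library(gtools)

    mask_vec <- vect("mask/MERGED_LEVANT.shp")
    files <- mixedsort(list.files("merged", pattern = "^NDVI_final_.*\\.tif$", full.names = TRUE))

    for (i in seq_along(files)) {
      r <- mask(crop(rast(files[i]), mask_vec), mask_vec)   # clip to the study-area outline
      writeRaster(r, paste0("NDVI_merged_clip_", i, ".tif"), overwrite = TRUE)
    }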


    4_TREND_analysis_NDVI.R


    Now we perform the trend analysis on the derived data. The data are tricky to load because they contain 16-day composites across each year for a period of 22 years. Growing-season sums cover MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the DJF period (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing-season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values that are significant at the 0.05 level. Using the ggplot2 package and the melt function from the reshape2 package, we create a plot of the reclassified NDVI trends together with a local smoother (LOESS) with a value of 0.3.
    To increase comparability and convey the amplitude of the trends, z-scores were calculated and plotted; these show the deviation of the values from the mean. This was done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
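    The per-pixel trend and z-score idea can be sketched as follows (object names and the season choice are assumptions; the repository script handles the image selection per season):

    library(terra)

    season_sums <- rast(list.files("growing_season_MAM", full.names = TRUE))  # one layer per year
    years <- 2001:2022

    trend_fun <- function(v) {
      if (any(is.na(v))) return(c(NA, NA))
      fit <- lm(v ~ years)
      c(coef(fit)[2], summary(fit)$coefficients[2, 4])      # slope and p-value of the trend
    }
    trend <- app(season_sums, trend_fun)
    names(trend) <- c("slope", "p_value")
    sig_slope <- ifel(trend$p_value < 0.05, trend$slope, NA)  # keep slopes significant at 0.05

    # z-scores: deviation of each annual sum from the pixel mean, in standard deviations
    z <- (season_sums - mean(season_sums)) / stdev(season_sums)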


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We work with the terra package and get the raster data from https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023; 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and, after cropping to the individual study area, reclassify it using the code. Here, I summed the different rasters to characterize the built-up change as continuous values between 1975 and 2022.
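    A minimal sketch of this summation (folder and output names are assumptions; the repository's derived_data files follow the “Levant_built_up_YYYY.tif” pattern):

    library(terra)

    mask_vec <- vect("mask/MERGED_LEVANT.shp")
    files <- list.files("built_up/raw_data", pattern = "\\.tif$", full.names = TRUE)

    layers <- lapply(files, function(f) mask(crop(rast(f), mask_vec), mask_vec))
    built_up_change <- Reduce(`+`, layers)            # sum of the 5-year epochs, 1975-2022
    writeRaster(built_up_change, "built_up_change_1975_2022.tif", overwrite = TRUE)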


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.
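    A minimal ggplot sketch of such a population plot (the column names Year, Population, and Country are assumptions; adjust them to the columns of the .csv):

    library(ggplot2)

    pop <- read.csv("population/Socio_cultural_political_development_database_FAO2023.csv")

    ggplot(pop, aes(x = Year, y = Population, colour = Country)) +
      geom_line() +
      labs(x = "Year", y = "Population", colour = NULL) +
      theme_minimal()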


    7_YIELD_plot.R


    In this section, we use the country productivity data from the repository supplement “yield_productivity” (e.g., "Jordan_yield.csv"). Each single-country yield dataset is plotted with ggplot, and the plots are combined using the patchwork package in R.
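    A minimal sketch of the combined yield figure (only “Jordan_yield.csv” is named above; the other file names and the Year/Yield columns are assumptions):

    library(ggplot2)
    library(patchwork)

    plot_yield <- function(file, country) {
      d <- read.csv(file)
      ggplot(d, aes(x = Year, y = Yield)) +
        geom_line() +
        ggtitle(country)
    }

    p_jo <- plot_yield("yield_productivity/Jordan_yield.csv", "Jordan")
    p_lb <- plot_yield("yield_productivity/Lebanon_yield.csv", "Lebanon")   # assumed file name
    p_sy <- plot_yield("yield_productivity/Syria_yield.csv", "Syria")       # assumed file name

    p_jo | p_lb | p_sy                                # patchwork combines the panels side by side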


    8_GLDAS_read_extract_trend


    The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data come in .nc file format, and the various variables can be extracted using the [“^variable name”] subset on the SpatRaster collection. Each time you run the code, this variable name must be adjusted to the variable of interest (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9 October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the SpatRaster collection.
    Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time-series analysis to be set to value = 12. For variables such as rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see the other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
    From the data, mean values over the 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and values at the 95 % confidence level are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and a different spatial extent across the globe, thanks to the availability of the GLDAS variables.
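    A minimal sketch of reading the GLDAS .nc files and clipping one variable to the study area (the variable name Rainf_f_tavg is only an example; adjust it as described above):

    library(terra)

    nc_files <- list.files("GLDAS/1975_2022", pattern = "\\.nc4?$", full.names = TRUE)
    print(names(sds(nc_files[1])))                    # list the available variable names

    rain <- rast(nc_files, subds = "Rainf_f_tavg")    # monthly stack of one variable

    mask_vec <- vect("mask/MERGED_LEVANT.shp")
    rain_levant <- mask(crop(rain, mask_vec), mask_vec)

    # area-mean monthly series with frequency 12, as required for the annual trend analysis
    cell_means <- global(rain_levant, "mean", na.rm = TRUE)$mean
    rain_ts <- ts(cell_means, start = c(1975, 1), frequency = 12)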

  2. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential to study the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv-formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as the following: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary for descriptors on the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
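    As a hedged illustration of the survey-weighted analysis covered by "example_1 - account_for_nhanes_design.Rmd" (the design and analysis variable names below are standard NHANES names used as assumptions; the curated files may label them differently), a minimal R sketch with the survey package:

    library(survey)

    # assume `dat` is a merged data.frame produced along the lines of example_0,
    # containing the design variables and the analysis variables
    des <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA,
                     weights = ~WTMEC2YR, nest = TRUE, data = dat)

    fit <- svyglm(BMXBMI ~ RIDAGEYR + RIAGENDR, design = des)  # survey-weighted regression
    summary(fit)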

  3. National Health and Nutrition Examination Survey (NHANES)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). National Health and Nutrition Examination Survey (NHANES) [Dataset]. http://doi.org/10.7910/DVN/IMWQPJ
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the national health and nutrition examination survey (nhanes) with r. nhanes is this fascinating survey where doctors and dentists accompany survey interviewers in a little mobile medical center that drives around the country. while the survey folks are interviewing people, the medical professionals administer laboratory tests and conduct a real doctor's examination. the blood work and medical exam allow researchers like you and me to answer tough questions like, "how many people have diabetes but don't know they have diabetes?" conducting the lab tests and the physical isn't cheap, so a new nhanes data set becomes available once every two years and only includes about twelve thousand respondents. since the number of respondents is so small, analysts often pool multiple years of data together. the replication scripts below give a few different examples of how multiple years of data can be pooled with r. the survey gets conducted by the centers for disease control and prevention (cdc), and generalizes to the united states non-institutional, non-active duty military population. most of the data tables produced by the cdc include only a small number of variables, so importation with the foreign package's read.xport function is pretty straightforward. but that makes merging the appropriate data sets trickier, since it might not be clear what to pull for which variables. for every analysis, start with the table with 'demo' in the name -- this file includes basic demographics, weighting, and complex sample survey design variables. since it's quick to download the files directly from the cdc's ftp site, there's no massive ftp download automation script. this new github repository contains five scripts:

    2009-2010 interview only - download and analyze.R: download, import, and save the demographics and health insurance files onto your local computer; load both files, limit them to the variables needed for the analysis, and merge them together; perform a few example variable recodes; create the complex sample survey object, using the interview weights; run a series of pretty generic analyses on the health insurance questions.

    2009-2010 interview plus laboratory - download and analyze.R: download, import, and save the demographics and cholesterol files onto your local computer; load both files, limit them to the variables needed for the analysis, and merge them together; perform a few example variable recodes; create the complex sample survey object, using the mobile examination component (mec) weights; perform a direct-method age-adjustment and match figure 1 of this cdc cholesterol brief.

    replicate 2005-2008 pooled cdc oral examination figure.R: download, import, save, pool, recode, create a survey object, run some basic analyses; replicate figure 3 from this cdc oral health databrief - the whole barplot.

    replicate cdc publications.R: download, import, save, pool, merge, and recode the demographics file plus the cholesterol laboratory, blood pressure questionnaire, and blood pressure laboratory files; match the cdc's example sas and sudaan syntax file's output for descriptive means; match the cdc's example sas and sudaan syntax file's output for descriptive proportions; match the cdc's example sas and sudaan syntax file's output for descriptive percentiles.

    replicate human exposure to chemicals report.R (user-contributed): download, import, save, pool, merge, and recode the demographics file plus the urinary bisphenol a (bpa) laboratory files; log-transform some of the columns to calculate the geometric means and quantiles; match the 2007-2008 statistics shown on pdf page 21 of the cdc's fourth edition of the report.

    click here to view these five scripts. for more detail about the national health and nutrition examination survey (nhanes), visit: the cdc's nhanes homepage; the national cancer institute's page of nhanes web tutorials. notes: nhanes includes interview-only weights and interview + mobile examination component (mec) weights. if you only use questions from the basic interview in your analysis, use the interview-only weights (the sample size is a bit larger). i haven't really figured out a use for the interview-only weights -- nhanes draws most of its power from the combination of the interview and the mobile examination component variables. if you're only using variables from the interview, see if you can use a data set with a larger sample size like the current population survey (cps), national health interview survey (nhis), or medical expenditure panel survey (meps) instead. confidential to sas, spss, stata, sudaan users: why are you still riding around on a donkey after we've invented the internal combustion engine? time to transition to r. :D
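    a hedged sketch of the first script's core steps (the 2009-2010 file and variable names are standard cdc names, used here as assumptions):

    library(foreign)
    library(survey)

    demo <- read.xport("DEMO_F.XPT")                 # demographics: weights + design variables
    hiq  <- read.xport("HIQ_F.XPT")                  # health insurance questionnaire

    x <- merge(demo, hiq, by = "SEQN", all = FALSE)  # merge on the respondent identifier

    des <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA,
                     weights = ~WTINT2YR, nest = TRUE, data = x)   # interview weights

    svymean(~factor(HIQ011), des, na.rm = TRUE)      # covered by health insurance, yes/no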

  4. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R: download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table.

    2012 asec - analysis examples.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples.

    replicate census estimates - 2011.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file below (2011 asec replicate weight sas output.png), the statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts. for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page; the bureau of labor statistics' current population survey page; the current population survey's wikipedia article. notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
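    a hedged sketch of the layout-parsing idea behind these scripts (file names are placeholders; the linked scripts go further and split household/family/person records before loading everything into an RSQLite database):

    library(SAScii)

    # parse nber's sas importation code to recover the fixed-width column layout
    layout <- parse.SAScii("cpsmar2012.sas")         # varname, width, char, divisor per column
    head(layout)

    # the recovered layout can then drive a read.fwf / RSQLite import of the .dat microdata file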

  5. National Health Interview Survey (NHIS)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). National Health Interview Survey (NHIS) [Dataset]. http://doi.org/10.7910/DVN/BYPZ8N
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the national health interview survey (nhis) with r. the national health interview survey (nhis) is a household survey about health status and utilization. each annual data set can be used to examine the disease burden and access to care that individuals and families are currently experiencing across the country. check out the wikipedia article (ohh hayy i wrote that) for more detail about its current and potential uses. if you're cooking up a health-related analysis that doesn't need medical expenditures or monthly health insurance coverage, look at nhis before the medical expenditure panel survey (its sample is twice as big). the centers for disease control and prevention (cdc) has been keeping nhis real since 1957, and the scripts below automate the download, importation, and analysis of every file back to 1963. what happened in 1997, you ask? scientists cloned dolly the sheep, clinton started his second term, and the national health interview survey underwent its most recent major questionnaire re-design. here's how all the moving parts work:

    • a person-level file (personsx) that merges onto other files using unique household (hhx), family (fmx), and person (fpx) identifiers. [note to data historians: prior to 2004, person number was (px) and unique within each household.] this file includes the complex sample survey variables needed to construct a taylor-series linearization design, and should be used if your analysis doesn't require variables from the sample adult or sample child files. this survey setup generalizes to the noninstitutional, non-active duty military population.
    • a family-level file that merges onto other files using unique household (hhx) and family (fmx) identifiers.
    • a household-level file that merges onto other files using the unique household (hhx) identifier.
    • a sample adult file that includes questions asked of only one adult within each household (selected at random) - a subset of the main person-level file. hhx, fmx, and fpx identifiers will merge with each of the files above, but since not every adult gets asked these questions, this file contains its own set of weights: wtfa_sa instead of wtfa. you can merge on whatever other variables you need from the three files above, but if your analysis requires any variables from the sample adult questionnaire, you can't use records in the person-level file that aren't also in the sample adult file (a big sample size cut). this survey setup generalizes to the noninstitutional, non-active duty military adult population.
    • a sample child file that includes questions asked of only one child within each household (if available, and also selected at random) - another subset of the main person-level file. same deal as the sample adult description, except use wtfa_sc instead of wtfa. oh yeah, and this one generalizes to the child population.
    • five imputed income files. if you want income and/or poverty variables incorporated into any part of your analysis, you'll need these puppies. the replication example below uses these, but if that's impenetrable, post in the comments describing where you get stuck.
    • some injury stuff and other miscellanea that varies by year. if anyone uses this, please share your experience.

    if you use anything more than the personsx file alone, you'll need to merge some tables together. make sure you understand the difference between setting the parameter all = TRUE versus all = FALSE -- not everyone in the personsx file has a record in the samadult and samchild files.
    this new github repository contains four scripts:

    1963-2011 - download all microdata.R: loop through every year and download every file hosted on the cdc's nhis ftp site; import each file into r with SAScii; save each file as an r data file (.rda); download all the documentation into the year-specific directory.

    2011 personsx - analyze.R: load the r data file (.rda) created by the download script (above); set up a taylor-series linearization survey design outlined on page 6 of this survey document; perform a smattering of analysis examples.

    2011 personsx plus samadult with multiple imputation - analyze.R: load the personsx and samadult r data files (.rda) created by the download script (above); merge the personsx and samadult files, highlighting how to conduct analyses that need both; create tandem survey designs for both personsx-only and merged personsx-samadult files; perform just a touch of analysis examples; load and loop through the five imputed income files, tack them onto the personsx-samadult file; conduct a poverty recode or two; analyze the multiply-imputed survey design object, just like mom used to analyze.

    replicate cdc tecdoc - 2000 multiple imputation.R: download and import the nhis 2000 personsx and imputed income files, using SAScii and this imputed income sas importation script (no longer hosted on the cdc's nhis ftp site); loop through each of the five imputed income files, merging each to the personsx file and performing the same set of...
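    a hedged sketch of the personsx + samadult merge described above (object names and design variables are assumptions; the .rda files created by the download script define the real ones):

    library(survey)

    load("2011/personsx.rda")                        # data.frame `personsx` (object name assumed)
    load("2011/samadult.rda")                        # data.frame `samadult` (object name assumed)

    # all = FALSE keeps only persons who are also sample adults (the big sample-size cut);
    # all.x = TRUE would keep every person and leave the samadult columns as NA
    merged <- merge(personsx, samadult, by = c("hhx", "fmx", "fpx"), all = FALSE)

    des <- svydesign(ids = ~psu_p, strata = ~strat_p,
                     weights = ~wtfa_sa, nest = TRUE, data = merged)  # sample adult weights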

  6. MERRA-2 subset for evaluation of renewables with merra2ools R-package:...

    • datadryad.org
    zip
    Updated Mar 29, 2021
    Cite
    Oleg Lugovoy; Shuo Gao (2021). MERRA-2 subset for evaluation of renewables with merra2ools R-package: 1980-2020 hourly, 0.5° lat x 0.625° lon global grid [Dataset]. http://doi.org/10.5061/dryad.v41ns1rtt
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 29, 2021
    Dataset provided by
    Dryad
    Authors
    Oleg Lugovoy; Shuo Gao
    Time period covered
    Mar 19, 2021
    Description

    The merra2ools dataset has been assembled through the following steps:

    The MERRA-2 collections tavg1_2d_flx_Nx (Surface Flux Diagnostics), tavg1_2d_rad_Nx (Radiation Diagnostics), and tavg1_2d_slv_Nx (Single-level atmospheric state variables) were downloaded from the NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) (https://disc.gsfc.nasa.gov/datasets?project=MERRA-2) using the GNU Wget network utility (https://disc.gsfc.nasa.gov/data-access). Each of the three collections consists of daily netCDF-4 files with 3-dimensional variables (lon x lat x hour).
    The following variables were obtained from the netCDF-4 files and merged into long-term time series:

    • Northward (V) and Eastward (U) wind at 10 and 50 meters (V10M, V50M, U10M, U50M, respectively), and 10-meter air temperature (T10M) from the tavg1_2d_slv_Nx collection;
    • Incident shortwave land (SWGDN) and Surface albedo (ALBEDO) fro...
    
  7. Monitoring COVID-19 Impact on Refugees in Ethiopia: High-Frequency Phone...

    • microdata.unhcr.org
    • datacatalog.ihsn.org
    • +2more
    Updated Jul 5, 2022
    Cite
    World Bank-UNHCR Joint Data Center on Forced Displacement (JDC) (2022). Monitoring COVID-19 Impact on Refugees in Ethiopia: High-Frequency Phone Survey of Refugees 2020 - Ethiopia [Dataset]. https://microdata.unhcr.org/index.php/catalog/704
    Explore at:
    Dataset updated
    Jul 5, 2022
    Dataset provided by
    United Nations High Commissioner for Refugees (http://www.unhcr.org/)
    World Bank (https://www.worldbank.org/)
    Authors
    World Bank-UNHCR Joint Data Center on Forced Displacement (JDC)
    Time period covered
    2020
    Area covered
    Ethiopia
    Description

    Abstract

    The high-frequency phone survey of refugees monitors the economic and social impact of, and responses to, the COVID-19 pandemic on refugees and nationals by calling a sample of households every four weeks. The main objective is to inform timely and adequate policy and program responses. Since the outbreak of the COVID-19 pandemic in Ethiopia, two rounds of data collection with refugees were completed between September and November 2020. The first round of the joint national and refugee HFPS was implemented between 24 September and 17 October 2020, and the second round between 20 October and 20 November 2020.

    Analysis unit

    Household

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample was drawn using a simple random sample without replacement. Expecting a high non-response rate based on experience from the HFPS-HH, we drew a stratified sample of 3,300 refugee households for the first round. More details on sampling methodology are provided in the Survey Methodology Document available for download as Related Materials.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The Ethiopia COVID-19 High Frequency Phone Survey of Refugee questionnaire consists of the following sections:

    • Interview Information
    • Household Roster
    • Camp Information
    • Knowledge Regarding the Spread of COVID-19
    • Behaviour and Social Distancing - Access to Basic Services
    • Employment
    • Income Loss
    • Coping/Shocks
    • Social Relations
    • Food Security
    • Aid and Support/ Social Safety Nets.

    A more detailed description of the questionnaire is provided in Table 1 of the Survey Methodology Document that is provided as Related Materials. The Round 1 and Round 2 questionnaires are available for download.

    Cleaning operations

    DATA CLEANING: At the end of data collection, the raw dataset was cleaned by the Research team. This included formatting and correcting results based on monitoring issues, enumerator feedback, and survey changes. The data cleaning carried out is detailed below.

    Variable naming and labeling:
    • Variable names were changed to reflect the lowercase question name in the paper survey copy, and a word or two related to the question.
    • Variables were labeled with longer descriptions of their contents, and the full question text was stored in Notes for each variable.
    • “Other, specify” variables were named similarly to their related question, with “_other” appended to the name.
    • Value labels were assigned where relevant, with options shown in English for all variables, unless preloaded from the roster in Amharic.

    Variable formatting:
    • Variables were formatted as their object type (string, integer, decimal, time, date, or datetime).
    • Multi-select variables were saved both in space-separated single variables and as multiple binary variables showing the yes/no value of each possible response.
    • Time and date variables were stored as POSIX timestamp values and formatted to show Gregorian dates.
    • Location information was left in separate ID and Name variables, following the format of the incoming roster. IDs were formatted to include only the variable-level digits, and not the higher-level prefixes (2-3 digits only).
    • Only consented surveys were kept in the dataset, and all personal information and internal survey variables were dropped from the clean dataset.
    • Roster data are separated from the main dataset and kept in long form, but can be merged on the key variable (key can also be used to merge with the raw data), as shown in the sketch below.
    • The variables were arranged in the same order as the paper instrument, with observations arranged according to their submission time.
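    A minimal sketch of the roster merge mentioned above (file names and the exact key column are assumptions; the released files document the real ones):

    # re-attach the long-form roster to the household-level main dataset on the key variable
    main   <- read.csv("refugee_hfps_round1_main.csv")     # assumed file name
    roster <- read.csv("refugee_hfps_round1_roster.csv")   # assumed file name

    # many roster rows per household: the merge yields one row per roster member,
    # with the household-level variables repeated
    merged <- merge(roster, main, by = "key", all.x = TRUE)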

    Backcheck data review: Results of the backcheck survey are compared against the originally captured survey results using the bcstats command in Stata. This function delivers a comparison of variables and identifies any discrepancies. Any discrepancies identified are then examined individually to determine if they are within reason.

    Data appraisal

    The following data quality checks were completed:

    • Daily SurveyCTO monitoring: This included outlier checks, skipped questions, a review of “Other, specify” and other text responses, and enumerator comments. Enumerator comments were used to suggest new response options or to highlight situations where existing options should be used instead. Monitoring also included a review of variable relationship logic checks and checks of the logic of answers. Finally, outliers in phone variables such as survey duration or the percentage of time audio was at a conversational level were monitored. A survey duration of close to 15 minutes and a conversation-level audio percentage of around 40% was considered normal.
    • Dashboard review: This included monitoring individual enumerator performance, such as the number of calls logged, duration of calls, percentage of calls responded to, and percentage of non-consents. Non-consent reason rates and attempts per household were monitored as well. Duration analysis using R was used to monitor each module's duration and estimate the time required for subsequent rounds. The dashboard was also used to track overall survey completion and preview the results of key questions.
    • Daily Data Team reporting: The Field Supervisors and the Data Manager reported daily feedback on call progress, enumerator feedback on the survey, and any suggestions to improve the instrument, such as adding options to multiple choice questions or adjusting translations.
    • Audio audits: Audio recordings were captured during the consent portion of the interview for all completed interviews, for the enumerators' side of the conversation only. The recordings were reviewed for any surveys flagged by enumerators as having data quality concerns and for an additional random sample of 2% of respondents. A range of lengths was selected to observe edge cases. Most consent readings took around one minute, with some longer recordings due to questions on the survey or holding for the respondent. All reviewed audio recordings were completed satisfactorily.
    • Back-check survey: Field Supervisors made back-check calls to a random sample of 5% of the households that completed a survey in Round 1. Field Supervisors called these households and administered a short survey, including (i) identifying the same respondent; (ii) determining the respondent's position within the household; (iii) confirming that a member of the data collection team had completed the interview; and (iv) a few questions from the original survey.
