This SOils DAta Harmonization (SoDaH) database is designed to bring together soil carbon data from diverse research networks into a harmonized dataset that can be used for synthesis activities and model development. The research network sources for SoDaH span different biomes and climates, encompass multiple ecosystem types, and have collected data across a range of spatial, temporal, and depth gradients. The rich data sets assembled in SoDaH consist of observations from monitoring efforts and long-term ecological experiments. The SoDaH database also incorporates related environmental covariate data pertaining to climate, vegetation, soil chemistry, and soil physical properties. The data are harmonized and aggregated using open-source code that enables a scripted, repeatable approach for soil data synthesis.
Public data used for data harmonization.
This dataset is associated with the following publication: Uhran, B., L. Windham-Myers, N. Bliss, A. Nahlik, E. Sundquist, and C. Stagg. Improved Wetland Soil Organic Carbon Stocks of the Conterminous U.S. Through Data Harmonization. Frontiers in Soil Science. Frontiers, Lausanne, SWITZERLAND, 1: 706701, (2021).
PanTool was developed as a toolbox, like a Swiss Army knife, for data conversion and recalculation, written to harmonize individual data collections to the standard import format used by PANGAEA. PanTool requires input files to be tables saved as plain ASCII text. The user can create these files with a spreadsheet program like MS-Excel or with the system text editor. PanTool is distributed as freeware for the operating systems Microsoft Windows, Apple OS X and Linux.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an updated version of the original study protocol under the title “Negative Affectivity Data Harmonization” that was pre-registered in OSF on September 4th, 2022 (osf.io/kqsn9).
The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts, but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm, commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to commonly used internal reference scaling (iRS). Due to the matrix dissection approach without the need for data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss.
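The matrix dissection idea can be illustrated with a minimal sketch. This is one possible reading of the description above, not the HarmonizR implementation: the input layout, the function name `dissect_by_batch_presence`, and the `min_values` threshold are all assumptions made for illustration. Proteins are grouped by the set of batches in which they are sufficiently observed, so that each group forms a sub-matrix without batch-wide gaps that a batch-correction routine could then process separately.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional

# Hypothetical input layout: protein -> batch -> abundance values (None = missing).
Matrix = Dict[str, Dict[str, List[Optional[float]]]]


def dissect_by_batch_presence(
    matrix: Matrix, min_values: int = 2
) -> Dict[FrozenSet[str], List[str]]:
    """Group proteins by the set of batches in which they are sufficiently observed.

    min_values is an assumed threshold, not a value taken from the source.
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for protein, batches in matrix.items():
        usable = frozenset(
            batch
            for batch, values in batches.items()
            if sum(v is not None for v in values) >= min_values
        )
        # A batch effect can only be estimated if at least two batches are usable.
        if len(usable) >= 2:
            groups[usable].append(protein)
    return dict(groups)


example = {
    "P1": {"A": [1.0, 1.2], "B": [2.0, 2.1], "C": [3.0, 2.9]},
    "P2": {"A": [0.5, 0.7], "B": [None, None], "C": [1.5, 1.4]},
    "P3": {"A": [4.0, None], "B": [None, None], "C": [None, 5.0]},
}
sub_matrices = dissect_by_batch_presence(example)
```

In this toy input, P1 lands in the sub-matrix covering all three batches, P2 in the one covering batches A and C only, and P3 is left out because no batch holds enough values. No values are imputed at any point.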
ST_LUCAS is a harmonized dataset derived from the LUCAS (Land Use and Coverage Area frame Survey) dataset. LUCAS is a Eurostat activity that has performed repeated in situ surveys over Europe every three years since 2006. Original LUCAS data (https://ec.europa.eu/eurostat/web/lucas/data) starting with the 2006 survey were harmonized into a common nomenclature based on the 2018 survey.
The ST_LUCAS dataset is provided in two versions:
lucas_points: each LUCAS survey is represented by a single record
lucas_st_points: each LUCAS point is represented by a single location calculated from multiple surveys and by a set of harmonized attributes for each survey year
Harmonization and space-aggregation of LUCAS data were performed by the ST_LUCAS system available from https://geoforall.fsv.cvut.cz/st_lucas. The methodology is described in Landa, M.; Brodský, L.; Halounová, L.; Bouček, T.; Pešek, O. Open Geospatial System for LUCAS In Situ Data Harmonization and Distribution. ISPRS Int. J. Geo-Inf. 2022, 11, 361. https://doi.org/10.3390/ijgi11070361.
List of harmonized LUCAS attributes: https://geoforall.fsv.cvut.cz/st_lucas/tables/list_of_attributes.html
The ST_LUCAS dataset is provided under the same conditions (“free of charge”) as the original LUCAS data (https://ec.europa.eu/eurostat/web/lucas/data).
This work is co-financed under Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095 by the European Union.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This deposit contains the taxonomy maps and data we used to translate data on COVID-19 government responses from 7 different datasets into the taxonomy developed by the CoronaNet Research Project (CoronaNet; Cheng et al 2020). These taxonomy maps form the basis of our efforts to harmonize this data into the CoronaNet database. The following taxonomy maps are deposited in the 'Taxonomy' folder:
ACAPS COVID-19 Government Measures - CoronaNet Taxonomy Map
Canadian Data Set of COVID-19 Interventions from the Canadian Institute for Health Information (CIHI) - CoronaNet Taxonomy Map
COVID Analysis and Mapping of Policies (COVID AMP) - CoronaNet Taxonomy Map
Johns Hopkins Health Intervention Tracking for COVID-19 (HIT-COVID) - CoronaNet Taxonomy Map
Oxford Covid-19 Government Response Tracker (OxCGRT) - CoronaNet Taxonomy Map
World Health Organisation Public Health and Safety Measures (WHO PHSM) - CoronaNet Taxonomy Map
Meanwhile, the 'Data' folder contains the raw and mapped data for each external dataset (i.e. ACAPS, CIHI, COVID AMP, HIT-COVID, OxCGRT and WHO PHSM) as well as the combined external data for Steps 1 and 3 of the data harmonization process described in Cheng et al (2023) 'Harmonizing Government Responses to the COVID-19 Pandemic.'
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A detailed overview of the results of the literature search, including the data extraction matrix, can be found in Additional file 1.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Harmonized Income Dataset provides harmonized individual-level survey variables on personal and household income from 19 major cross-national survey projects, as well as technical variables necessary to match them to the Survey Data Recycling Master File version 1 (SDR v.1, DOI:10.7910/DVN/VWGF5Q), which contains harmonized survey items on political participation, political attitudes, as well as their selected correlates.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This document outlines the creation of a global inventory of reference samples and Earth Observation (EO) / gridded datasets for the Global Pasture Watch (GPW) initiative. This inventory supports the training and validation of machine-learning models for GPW grassland mapping. This documentation outlines methodology, data sources, workflow, and results.
Keywords: Grassland, Land Use, Land Cover, Gridded Datasets, Harmonization
Create a global inventory of existing reference samples for land use and land cover (LULC);
Compile global EO / gridded datasets that capture LULC classes and harmonize them to match the GPW classes;
Develop automated scripts for data harmonization and integration.
Datasets incorporated:
Datasets | Spatial distribution | Time period | Number of individual samples |
WorldCereal | Global | 2016-2021 | 38,267,911 |
Global Land Cover Mapping and Estimation (GLanCE) | Global | 1985-2021 | 31,061,694 |
EuroCrops | Europe | 2015-2022 | 14,742,648 |
GeoWiki G-GLOPS training dataset | Global | 2021 | 11,394,623 |
MapBiomas Brazil | Brazil | 1985-2018 | 3,234,370 |
Land Use/Land Cover Area Frame Survey (LUCAS) | Europe | 2006-2018 | 1,351,293 |
Dynamic World | Global | 2019-2020 | 1,249,983 |
Land Change Monitoring, Assessment, and Projection (LCMap) | U.S. (CONUS) | 1984-2018 | 874,836 |
GeoWiki 2012 | Global | 2011-2012 | 151,942 |
PREDICTS | Global | 1984-2013 | 16,627 |
CropHarvest | Global | 2018-2021 | 9,714 |
Total: 102,355,642 samples
We harmonized global reference samples and EO/gridded datasets to align with GPW classes, optimizing their integration into the GPW machine-learning workflow.
We considered reference samples derived by visual interpretation with spatial support of at least 30 m (Landsat and Sentinel), that could represent LULC classes for a point or region.
Each dataset was processed using automated Python scripts to download vector files and convert the original LULC classes into the following GPW classes:
0. Other land cover
1. Natural and Semi-natural grassland
2. Cultivated grassland
3. Crops and other related agricultural practices
We empirically assigned a weight to each sample based on the original dataset's class description, reflecting the level of mixture within the class. The weights range from 1 (Low) to 3 (High), with higher weights indicating greater mixture. Samples with low mixture levels are more accurate and effective for differentiating typologies and for validation purposes.
The harmonized dataset includes these columns:
Attribute Name | Definition |
dataset_name | Original dataset name |
reference_year | Reference year of samples from the original dataset |
original_lulc_class | LULC class from the original dataset |
gpw_lulc_class | Global Pasture Watch LULC class |
sample_weight | Sample's weight based on the mixture level within the original LULC class |
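The class conversion and weighting described above can be sketched as a simple lookup. The four GPW classes and the five output columns come from the text; the dataset name, the original class labels, the weights in `CLASS_MAP`, and the function name `harmonize_sample` are hypothetical placeholders, since the actual per-dataset mappings are not given here.

```python
from typing import Dict, Tuple

# GPW classes as listed in the text.
GPW_CLASSES: Dict[int, str] = {
    0: "Other land cover",
    1: "Natural and Semi-natural grassland",
    2: "Cultivated grassland",
    3: "Crops and other related agricultural practices",
}

# Hypothetical lookup: (dataset, original class) -> (GPW class, sample weight).
# Weights follow the stated convention: 1 = low mixture, 3 = high mixture.
CLASS_MAP: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("ExampleDataset", "pasture"): (2, 1),
    ("ExampleDataset", "herbaceous vegetation"): (1, 3),
    ("ExampleDataset", "maize"): (3, 1),
}


def harmonize_sample(dataset_name: str, reference_year: int, original_lulc_class: str) -> dict:
    """Return one harmonized record with the five documented columns."""
    key = (dataset_name, original_lulc_class)
    if key not in CLASS_MAP:
        # Fail loudly instead of guessing a class for unmapped labels.
        raise KeyError(f"No GPW mapping defined for {key}")
    gpw_class, weight = CLASS_MAP[key]
    return {
        "dataset_name": dataset_name,
        "reference_year": reference_year,
        "original_lulc_class": original_lulc_class,
        "gpw_lulc_class": gpw_class,
        "sample_weight": weight,
    }


row = harmonize_sample("ExampleDataset", 2020, "pasture")
```

Raising an error for unmapped labels is a design choice of this sketch; it keeps unreviewed classes from silently entering the training data.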
The development of this global inventory of reference samples and EO/gridded datasets relied on valuable contributions from various sources. We would like to express our sincere gratitude to the creators and maintainers of all datasets used in this project.
Brown, C.F., Brumby, S.P., Guzder-Williams, B. et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci Data 9, 251 (2022). https://doi.org/10.1038/s41597-022-01307-4
Van Tricht, K. et al. Worldcereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth Syst. Sci. Data 15, 5491–5515, 10.5194/essd-15-5491-2023 (2023)
Buchhorn, M.; Smets, B.; Bertels, L.; De Roo, B.; Lesiv, M.; Tsendbazar, N.E., Linlin, L., Tarko, A. (2020): Copernicus Global Land Service: Land Cover 100m: Version 3 Globe 2015-2019: Product User Manual; Zenodo, Geneve, Switzerland, September 2020; doi: 10.5281/zenodo.3938963
d’Andrimont, R. et al. Harmonised lucas in-situ land cover and use database for field surveys from 2006 to 2018 in the european union. Sci. data 7, 352, 10.1038/s41597-019-0340-y (2020)
Fritz, S. et al. Geo-Wiki: An online platform for improving global land cover, Environmental Modelling & Software, 31, https://doi.org/10.1016/j.envsoft.2011.11.015 (2012)
Fritz, S., See, L., Perger, C. et al. A global dataset of crowdsourced land cover and land use reference data. Sci Data 4, 170075 https://doi.org/10.1038/sdata.2017.75 (2017)
Schneider, M., Schelte, T., Schmitz, F. & Körner, M. Eurocrops: The largest harmonized open crop dataset across the european union. Sci. Data 10, 612, 10.1038/s41597-023-02517-0 (2023)
Souza, C. M. et al. Reconstructing Three Decades of Land Use and Land Cover Changes in Brazilian Biomes with Landsat Archive and Earth Engine. Remote. Sens. 12, 2735, 10.3390/rs12172735 (2020)
Stanimirova, R. et al. A global land cover training dataset from 1984 to 2020. Sci. Data 10, 879 (2023)
Tsendbazar, N. et al. Product validation report (d12-pvr) v 1.1 (2021).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Changes since the last version: in the .csv export there was a naming problem.
visit_concert: A standard CAP variable on visiting frequency, in numeric form.
fct_visit_concert: A standard CAP variable on visiting frequency, in categorical form.
is_visit_concert: Binary variable; 0 if the person had not visited concerts in the previous 12 months.
artistic_activity_played_music: The frequency of playing music as an amateur or professional practice. In some surveys we have only a binary variable (played in the last 12 months or not); in others we have frequencies. We will convert this into a binary variable.
fct_artistic_activity_played_music: The artistic_activity_played_music variable in categorical representation.
artistic_activity_sung: The frequency of singing as an amateur or professional practice, like played_music. Because of the liturgical use of singing, and the differences in religious practices among countries and genders, this is a significantly different variable from played_music.
fct_artistic_activity_sung: The artistic_activity_sung variable in categorical representation.
age_exact: The respondent’s age as an integer.
country_code: An ISO country code.
geo: An ISO code that separates Germany into the former East and West Germany, the United Kingdom into Great Britain and Northern Ireland, and Cyprus into Cyprus and the Turkish Cypriot community. [We may leave Turkish Cyprus out for practical reasons.]
age_education: A harmonized education proxy. Because we work with data from more than 30 countries, education levels are difficult to harmonize, so we use the Eurobarometer standard proxy, age of leaving education. It is a specially coded variable, and we will re-code it into two variables, age_education and is_student.
is_student: A dummy variable for the special coding in age_education for “still studying”, i.e., the person does not yet have a school-leaving age. It would be tempting to impute age in this case to age_education, but we will show why this is not a good strategy.
w, w1: Post-stratification weights for the 15+ years old population of each country. Use w1 for averages of geo entities, treating Northern Ireland, Great Britain, the United Kingdom, the former GDR, the former West Germany, and Germany as geographical areas. Use w when treating the United Kingdom and Germany as one territory.
wex: Projected weight variable. For weighted average values, use w or w1; for projections on the population size, i.e., for use with sums, use wex.
id: The identifier of the original survey.
rowid: A new identifier that is unique across all harmonized surveys, i.e., remains unique in the harmonized dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
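The distinction between the averaging weights (w, w1) and the projection weight (wex) in the variable list above can be illustrated with a short sketch. The column names come from the variable list; the helper names and the toy values are assumptions for illustration only.

```python
from typing import Mapping, Sequence


def weighted_mean(rows: Sequence[Mapping[str, float]], value: str, weight: str = "w1") -> float:
    """Weighted average of a variable, using a post-stratification weight (w or w1)."""
    total_weight = sum(r[weight] for r in rows)
    return sum(r[value] * r[weight] for r in rows) / total_weight


def projected_total(rows: Sequence[Mapping[str, float]], value: str, weight: str = "wex") -> float:
    """Projected population total of a variable, using the projection weight wex."""
    return sum(r[value] * r[weight] for r in rows)


# Toy rows with made-up weights, not real survey data.
rows = [
    {"is_visit_concert": 1, "w1": 0.5, "wex": 1000.0},
    {"is_visit_concert": 0, "w1": 1.5, "wex": 3000.0},
]
share = weighted_mean(rows, "is_visit_concert")       # weighted share of concert visitors
visitors = projected_total(rows, "is_visit_concert")  # projected number of concert visitors
```

With these toy rows the weighted share is 0.5 / 2.0 = 0.25, while the projected total counts only the weight of the first respondent, 1000.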
The reference data (SOC) of the Swedish soil samples are part of the Swedish national soil monitoring programme for agricultural soils, Soil and crop inventory, and are owned by the Swedish Environmental Protection Agency. Spectra were collected within the EJP SOIL project.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains supplementary marker gene information. (XLS 117 kb)
Goal: Setting up a pipeline for extending, improving and visualizing time series of municipality characteristics by means of data harmonization and linkage of historical and contemporary data series using Linked Data technologies (RDF).
This project focused on increasing the data availability, data quality and visualization of characteristics of Dutch municipalities for the period 1795-2010. We did so by (1) combining data from historical and contemporary time series, (2) evaluating and improving the quality of these time series, and (3) extending the availability of NLGIS maps for the last two decades in order to visualize municipality characteristics for two centuries.
A dataset within the Harmonized Database of Western U.S. Water Rights (HarDWR). For a detailed description of the database, please see the meta-record v2.0.
Changelog
v2.0
- Recalculated based on data sourced from WestDAAT
- Changed from using a Site ID column to identify unique records to using a combination of Site ID and Allocation ID
- Removed the Water Management Area (WMA) column from the harmonized records. The replacement is a separate file which stores the relationship between allocations and WMAs. This allows allocations to contribute water right amounts to multiple WMAs during the subsequent cumulative process.
- Added a column describing a water right's legal status
- Added "Unspecified" as a water source category
- Added an acre-foot (AF) column
- Added a column for the classification of the right's owner
v1.02
- Added a .RData file to the dataset as a convenience for anyone exploring our code. This is an internal file, and the one referenced in analysis scripts as the data objects are already in R data objects.
v1.01
- Updated the names of each file with an ID number less than 3 digits to include leading 0s
v1.0
- Initial public release
Description
Here we present an updated database of Western U.S. water right records. This database provides consistent unique identifiers for each water right record, and a consistent categorization scheme that puts each water right record into one of seven broad use categories. These data were instrumental in conducting a study of the multi-sector dynamics of inter-sectoral water allocation changes through water markets (Grogan et al., in review). Specifically, the data were formatted for use as input to a process-based hydrologic model, Water Balance Model (WBM), with a water rights module (Grogan et al., in review). While this specific study motivated the development of the database presented here, water management in the U.S. West is a rich area of study (e.g., Anderson and Woosly, 2005; Tidwell, 2014; Null and Prudencio, 2016; Carney et al., 2021), so releasing this database publicly with documentation and usage notes will enable other researchers to do further work on water management in the U.S. West. We produced the water rights database presented here in four main steps: (1) data collection, (2) data quality control, (3) data harmonization, and (4) generation of cumulative water rights curves. Each of steps (1)-(3) had to be completed in order to produce (4), the final product that was used in the modeling exercise in Grogan et al. (in review). All data in each step is associated with a spatial unit called a Water Management Area (WMA), which is the unit of water right administration utilized by the state from which the right came. Steps (2) and (3) required us to make assumptions and interpretations, and to remove records from the raw data collection. We describe each of these assumptions and interpretations below so that other researchers can choose to implement alternative assumptions and interpretations as fits their research aims.
Motivation for Changing Data Sources
The most significant change has been a switch from collecting the raw water rights directly from each state to using the water rights records presented in WestDAAT, a product of the Water Data Exchange (WaDE) Program under the Western States Water Council (WSWC). One of the main reasons for this is that each state of interest is a member of the WSWC, meaning that WaDE is partially funded by these states, as well as many universities. As WestDAAT is also a database with consistent categorization, it has allowed us to spend less time on data collection and quality control and more time on answering research questions. This has included records from water right sources we had previously not known about when creating v1.0 of this database.
The only major downside to utilizing the WestDAAT records as our raw data is that further updates are tied to when WestDAAT is updated, as some states update their public water right records daily. However, as our focus is on cumulative water amounts at the regional scale, it is unlikely that most record updates would have a significant effect on our results. The structure of WestDAAT led to several important changes to how HarDWR is formatted. The most significant change is that WaDE has calculated a field known as SiteUUID, which is a unique identifier for the Point of Diversion (POD), or where the water is drawn from. This is separate from AllocationNativeID, which is the identifier for the allocation of water, or the amount of water associated with the water right. It should be noted that it is possible for a single site to have multiple allocations associated with it and for an allocation to be extracted from multiple sites. The site-allocation structure has allowed us to adopt a more consistent, and hopefully more realistic, approach to organizing the water right records than we had with HarDWR v1.0. This was incredibly helpful, as the raw data from many states had multiple water uses within a single field within a single row, and it was not always clear if the first water use was the most important, or simply first alphabetically. WestDAAT has already addressed this data quality issue. Furthermore, with v1.0, when there were multiple records with the same water right ID, we selected the largest volume or flow amount and disregarded the rest. As WestDAAT was already a common structure for disparate data formats, we were better able to identify sites with multiple allocations and, perhaps more importantly, allocations with multiple sites. This is particularly helpful when an allocation has sites which cross WMA boundaries: instead of just assigning the full water amount to a single WMA, we are now able to divide the amount of water between the relevant WMAs. As it is now possible to identify allocations with water used in multiple WMAs, it is no longer practical to store this information within a single column. Instead, the stAllocationToWMATab.csv file was created, which is an allocation-by-WMA matrix containing the percent Place of Use area overlap with each WMA. We then use this percentage to divide the allocation's flow amount between the given WMAs during the cumulation process to hopefully provide more realistic totals of water use in each area.
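The division of an allocation's flow between WMAs by percent overlap can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `split_allocation`, the WMA labels, and the choice to normalize by the sum of the percentages (in case a row does not add up to exactly 100) are all hypothetical.

```python
from typing import Dict


def split_allocation(flow: float, overlap_pct: Dict[str, float]) -> Dict[str, float]:
    """Divide an allocation's flow between WMAs in proportion to percent overlap.

    overlap_pct maps a WMA label to the percent of the allocation's Place of Use
    area that falls inside that WMA. Normalizing by the row total is an assumption
    of this sketch.
    """
    total = sum(overlap_pct.values())
    if total <= 0:
        raise ValueError("Allocation has no overlap with any WMA")
    return {wma: flow * pct / total for wma, pct in overlap_pct.items()}


# Hypothetical allocation of 10 flow units overlapping two WMAs.
shares = split_allocation(10.0, {"WMA_A": 75.0, "WMA_B": 25.0})
```

Here WMA_A receives 7.5 units and WMA_B receives 2.5 units, so the shares always add back up to the original flow amount.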
However, not every state provides areas of water use, so as with HarDWR v1.0, a hierarchical decision tree was used to assign each allocation to a WMA. First, if a WMA could be identified based on the allocation ID, then that WMA was used; typically, when available, this applied to the entire state and no further steps were needed. Second was the spatial analysis of Place of Use to WMAs. Third was a spatial analysis of the POD locations to WMAs, with the assumption that an allocation's POD is within the WMA it should belong to; if an allocation still had multiple WMAs based on its POD locations, then the allocation's flow amount would be divided equally between all WMAs. The fourth, and final, process was to include water allocations which spatially fell outside of the state WMA boundaries. This could be due to several reasons, such as coordinate errors or imprecision in the POD location, imprecision in the WMA boundaries, or rights attached to features, such as a reservoir, which cross state boundaries. To include these records, we decided that any POD within one kilometer of the state's edge would be assigned to the nearest WMA.
Other Changes WestDAAT has Allowed
In addition to a more nuanced and consistent method of assigning water rights data to WMAs, there are other benefits gained from using the WestDAAT dataset. Among those is a consistent categorization of a water right's legal status. In HarDWR v1.0, legal status was effectively ignored, which led to many valid concerns about the quality of the database related to the amounts of water the rights allowed to be claimed. The main issue was that rights with legal statuses such as "application withdrawn", "non-active", or "cancelled" were included within HarDWR v1.0. These, and other water right statuses which were deemed to not be in use, have been removed from this version of the database. Another major change has been the addition of the "unspecified" water source category. 
This is water that can come from either surface water or groundwater, or the source of which is unknown. The addition of this source category brings the total number of categories to three. Due to reviewer feedback, we decided to add the acre-foot (AF) column so that the data may be more applicable to a wider audience. We added the ownerClassification column so that the data may be more applicable to a wider audience.
File Descriptions
The dataset is a series of various files organized by state sub-directories. In addition, each file begins with the state's name, in case the file is separated from its sub-directory for some reason. After the state name is the text which describes the contents of the file. Here is each file described in detail. Note that st is a placeholder for the state's name.
stFullRecords_HarmonizedRights.csv: A file of the complete water records for each state. The column headers for each file of this type are:
state - The name of the state to which the allocations belong.
FIPS - The two-digit numeric state ID code.
siteID - The site location ID for POD locations. A site may have multiple allocations, which are the actual amounts of water which can be drawn. In a simplified hypothetical, a farmstead may have an allocation for "irrigation" and an allocation for "domestic" water use, but the water is drawn from the same pumping equipment. It should be noted that many of the site IDs appear to have been added by WaDE, and therefore may not be recognized by a given state's water rights database.
allocationID - The allocation ID for the water right. For most states this is the water right ID, and what is
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
CMIP6 Forcing Datasets (input4MIPs). These data include all datasets published for 'input4MIPs.CMIP6.ScenarioMIP.UofMD.UofMD-landState-MAGPIE-ssp534-2-1-f' with the full Data Reference Syntax following the template 'activity_id.mip_era.target_mip.institution_id.source_id.realm.frequency.variable_id.grid_label'.
The model UofMD-landState-MAGPIE-ssp534-2-1-f (UofMD-landState-MAGPIE-ssp534-2-1-f) was run by the UofMD (UofMD) in native nominal resolutions: unknown.
Project: The forcing datasets (and boundary conditions) needed for CMIP6 experiments are being prepared by a number of different experts. Initially many of these datasets may only be available from those experts, but over time as part of the 'input4MIPs' activity most of them will be archived by PCMDI and served by the Earth System Grid Federation (https://esgf-node.llnl.gov/search/input4mips/ ). More information is available in the living document: http://goo.gl/r8up31 .
The datasets in the .pdf and .zip attached to this record are in support of Intelligent Transportation Systems Joint Program Office (ITS JPO) report FHWA-JPO-15-222, "Impacts Assessment of Dynamic Speed Harmonization with Queue Warning: Task 3, Impacts Assessment Report". The files in these zip files are specifically related to the US-101 Testbed, near San Mateo, CA. The uncompressed and compressed files total 2.0265 GB in size. The files have been uploaded as-is; no further documentation was supplied by NTL. All located .docx files were converted to .pdf document files, which are an open, archival format. These .pdfs were then added to the zip file alongside the original .docx files. The attached zip files can be unzipped using any zip compression/decompression software. These zip files contain files in the following formats: .pdf document files which can be read using any pdf reader; .xlsxm macro-enabled spreadsheet files which can be read in Microsoft Excel and some Tech Report spreadsheet programs; .accdb database files which may be opened with Microsoft Access Database software and Tech Report open database software applications; as well as .db generic database files, often associated with thumbnail images in the Windows operating environment. [software requirements] These files were last accessed in 2017. File and .zip file names include: FHWA_JPO_15_222_INFLO_Performance_Measure_METADATA.pdf ; FHWA_JPO_15_222_INFLO_Performance_Measure_METADATA.docx ; FHWA_JPO_15_222_INFLO_VISSIM_Output_and_Analysis_Spreadsheets.zip ; FHWA_JPO_15_222_INFLO_Spreadsheet_PDFs.zip ; FHWA_JPO_15_222_DATA_CV50.zip ; and, FHWA_JPO_15_222_DATA_CV25.zip
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN.
The Department of Statistics (DOS) carried out four rounds of the 2007 Employment and Unemployment Survey (EUS) during February, May, August and November 2007. The survey rounds covered a total sample of about fifty three thousand households Nation-wide. The sampled households were selected using a stratified multi-stage cluster sampling design. It is noteworthy that the sample represents the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.
The importance of this survey lies in that it provides a comprehensive data base on employment and unemployment that serves decision makers, researchers as well as other parties concerned with policies related to the organization of the Jordanian labor market.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a sample representative on the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample of this survey is based on the frame provided by the data of the Population and Housing Census, 2004. The Kingdom was divided into strata, where each city with a population of 100,000 persons or more was considered as a large city. The total number of these cities is 6. Each governorate (except for the 6 large cities) was divided into rural and urban areas. The rest of the urban areas in each governorate was considered as an independent stratum. The same was applied to rural areas where it was considered as an independent stratum. The total number of strata was 30.
In view of the significant variation in socio-economic characteristics in large cities in particular and in urban areas in general, each stratum of the large cities and urban strata was divided into four sub-strata according to the socio-economic characteristics provided by the population and housing census, with the purpose of providing homogeneous strata.
The frame excludes collective dwellings. However, it is worth noting that the collective households identified in the harmonized data, through a variable indicating the household type, are those reported without heads in the raw data, and in which the relationship of all household members to the head was reported as "other".
This sample is also not representative for the non-Jordanian population.
The sample of this survey was designed using the two-stage stratified cluster sampling method, based on the data of the population and housing census 2004 for carrying out household surveys. The sample is representative at the Kingdom, rural-urban region and governorate levels. The total sample size for each round was 1336 Primary Sampling Units (PSUs) (clusters). These units were distributed to urban and rural regions in the governorates, in addition to the large cities in each governorate, according to the weight of persons and households and according to the variance within each stratum. Slight modifications to the number of these units were made so that it would be a multiple of 8. The number of clusters for four rounds was 5344.
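The sample-size figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this description (167 PSUs per replicate, 8 replicates per round, 40 replicates in the main sample, 4 rounds, 10 households per PSU). The constant names are illustrative, and the households-per-round figure is derived here on the assumption that every selected PSU yields exactly 10 households; it is not a figure reported by the survey.

```python
# Figures taken from the survey description; names are illustrative only.
PSUS_PER_REPLICATE = 167
REPLICATES_PER_ROUND = 8
TOTAL_REPLICATES = 40
ROUNDS = 4
HOUSEHOLDS_PER_PSU = 10

# PSUs used in one round: eight replicates of 167 PSUs each.
psus_per_round = PSUS_PER_REPLICATE * REPLICATES_PER_ROUND

# PSUs used across the four rounds.
psus_four_rounds = psus_per_round * ROUNDS

# PSUs in the full main sample of 40 replicates.
main_sample_psus = PSUS_PER_REPLICATE * TOTAL_REPLICATES

# Derived, assuming exactly 10 households are drawn in every PSU.
households_per_round = psus_per_round * HOUSEHOLDS_PER_PSU

print(psus_per_round, psus_four_rounds, main_sample_psus, households_per_round)
```

The first two values reproduce the 1336 and 5344 clusters quoted in the text, which is also why the per-round total had to be a multiple of 8.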
The main sample consists of 40 replicates, each consisting of 167 PSUs. For each round, eight replicates of the main sample were used. The PSUs were ordered within each stratum according to geographic characteristics and then according to socio-economic characteristics in order to ensure a good spread of the sample. The sample was then selected in two stages. In the first stage, the PSUs were selected using the Probability Proportionate to Size method with a systematic selection procedure. The number of households in each PSU served as its weight or size. In the second stage, the blocks of the PSUs (clusters) selected in the first stage were updated. A constant number of households (10 households) was then selected from each PSU (cluster) as the final sampling units, using the random systematic sampling method.
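The two selection stages described above can be illustrated with a minimal sketch. It is not the Department of Statistics' actual procedure: the function names `pps_systematic` and `systematic_sample`, the hypothetical frame of 200 PSUs, and the choice of 20 PSUs are assumptions made for illustration. The first function draws PSUs with probability proportional to size using a random start and a fixed interval on the cumulated household counts; the second draws a fixed number of households at a constant step within a PSU.

```python
import random


def pps_systematic(sizes, n, rng):
    """Stage 1: pick n units with probability proportional to size.

    sizes is the list of size measures (households per PSU) in frame order.
    Returns the indices of the selected units. A unit whose size exceeds
    the sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    selected = []
    cumulative = 0.0
    idx = 0
    for k in range(n):
        target = start + k * interval
        # Advance until the cumulated size range of unit idx contains target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def systematic_sample(n_units, k, rng):
    """Stage 2: pick k of n_units positions at a constant step (n_units >= k)."""
    step = n_units / k
    start = rng.random() * step
    return [min(int(start + j * step), n_units - 1) for j in range(k)]


rng = random.Random(42)
# Hypothetical frame: 200 PSUs with 50 to 300 households each.
household_counts = [rng.randint(50, 300) for _ in range(200)]

psus = pps_systematic(household_counts, 20, rng)
sample = {p: systematic_sample(household_counts[p], 10, rng) for p in psus}
```

Because the selection walks the frame in order, sorting the PSUs geographically and then socio-economically before the draw, as the text describes, spreads the selected PSUs across those characteristics.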
It is noteworthy that the sample of the present survey does not represent the non-Jordanian population, due to the fact that it is based on households living in conventional dwellings. In other words, it does not cover the collective households living in collective dwellings. Therefore, the non-Jordanian households covered in the present survey are either private households or collective households living in conventional dwellings.
Face-to-face [f2f]
The plan of the tabulation of survey results was guided by former Employment and Unemployment Surveys which were previously prepared and tested. The final survey report was then prepared to include all detailed tabulations as well as the methodology of the survey.
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Harmonized Index of Consumer Prices: All-Items HICP for Czech Republic (CP0000CZM086NEST) from Jan 1996 to Jun 2025 about Czech Republic, harmonized, all items, CPI, price index, indexes, and price.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Goal: Setting up a pipeline for extending, improving and visualizing time series of municipality characteristics by means of data harmonization and linkage of historical and contemporary data series using Linked Data technologies (RDF). This project focused on increasing the data availability, data quality and visualization of characteristics of Dutch municipalities for the period 1795-2010. We did so by (1) combining data from historical and contemporary time series, (2) evaluating and improving the quality of these time series, and (3) extending the availability of NLGIS maps for the last two decades in order to visualize municipality characteristics for two centuries.