78 datasets found
  1. Additional file 1 of Conceptual design of a generic data harmonization...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Feb 27, 2024
    Cite
    Zoch, Michele; Peng, Yuan; Reinecke, Ines; Henke, Elisa; Sedlmayr, Martin; Bathelt, Franziska (2024). Additional file 1 of Conceptual design of a generic data harmonization process for OMOP common data model [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001502363
    Explore at:
    Dataset updated
    Feb 27, 2024
    Authors
    Zoch, Michele; Peng, Yuan; Reinecke, Ines; Henke, Elisa; Sedlmayr, Martin; Bathelt, Franziska
    Description

    A detailed overview of the results of the literature search, including the data extraction matrix, can be found in Additional file 1.

  2. CoronaNet COVID-19 Policy Responses: Taxonomy Maps and Data for Data...

    • openicpsr.org
    delimited
    Updated Nov 11, 2023
    Cite
    Cindy Cheng; Luca Messerschmidt; Isaac Bravo; Marco Waldbauer; Rohan Bhavikatti; Caress Schenk; Vanja Grujic; Timothy Model; Robert Kubinec; Joan Barceló (2023). CoronaNet COVID-19 Policy Responses: Taxonomy Maps and Data for Data Harmonization [Dataset]. http://doi.org/10.3886/E195081V2
    Explore at:
    delimited
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Delve
    New York University Abu Dhabi
    Universidade de Brasília
    Technical University of Munich
    Nazarbayev University
    Authors
    Cindy Cheng; Luca Messerschmidt; Isaac Bravo; Marco Waldbauer; Rohan Bhavikatti; Caress Schenk; Vanja Grujic; Timothy Model; Robert Kubinec; Joan Barceló
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 2019 - Sep 21, 2021
    Area covered
    World
    Description

    This deposit contains the taxonomy maps and data we used to translate data on COVID-19 government responses from 7 different datasets into the taxonomy developed by the CoronaNet Research Project (CoronaNet; Cheng et al. 2020). These taxonomy maps form the basis of our efforts to harmonize these data into the CoronaNet database. The following taxonomy maps are deposited in the 'Taxonomy' folder:

    • ACAPS COVID-19 Government Measures - CoronaNet Taxonomy Map
    • Canadian Data Set of COVID-19 Interventions from the Canadian Institute for Health Information (CIHI) - CoronaNet Taxonomy Map
    • COVID Analysis and Mapping of Policies (COVID AMP) - CoronaNet Taxonomy Map
    • Johns Hopkins Health Intervention Tracking for COVID-19 (HIT-COVID) - CoronaNet Taxonomy Map
    • Oxford Covid-19 Government Response Tracker (OxCGRT) - CoronaNet Taxonomy Map
    • World Health Organisation Public Health and Safety Measures (WHO PHSM) - CoronaNet Taxonomy Map

    Meanwhile, the 'Data' folder contains the raw and mapped data for each external dataset (i.e. ACAPS, CIHI, COVID AMP, HIT-COVID, OxCGRT and WHO PHSM), as well as the combined external data for Steps 1 and 3 of the data harmonization process described in Cheng et al. (2023), 'Harmonizing Government Responses to the COVID-19 Pandemic.'
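    In practice, each taxonomy map is a crosswalk from an external dataset's policy categories to the CoronaNet taxonomy. A minimal sketch of applying such a map with pandas is shown below; the column names (source_type, coronanet_type) and example rows are hypothetical, not the deposit's actual schema.

    ```python
    import pandas as pd

    # Hypothetical rows standing in for one of the deposited taxonomy maps
    # (e.g., the OxCGRT -> CoronaNet map); real column names may differ.
    taxonomy_map = pd.DataFrame({
        "source_type": ["C1 School closing", "C2 Workplace closing"],
        "coronanet_type": ["Closure and Regulation of Schools",
                           "Restriction and Regulation of Businesses"],
    })

    # Hypothetical external records to be translated into the CoronaNet taxonomy.
    external = pd.DataFrame({
        "country": ["Germany", "France"],
        "source_type": ["C1 School closing", "C2 Workplace closing"],
        "date_start": ["2020-03-16", "2020-03-17"],
    })

    # Step 1 of the harmonization: recode each external policy type via the map.
    harmonized = external.merge(taxonomy_map, on="source_type", how="left")
    print(harmonized[["country", "coronanet_type", "date_start"]])
    ```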

  3. Harmonization of sediment diatoms from hundreds of lakes in the northeastern...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • datasets.ai
    • +2more
    Updated Sep 13, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Harmonization of sediment diatoms from hundreds of lakes in the northeastern United States [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/harmonization-of-sediment-diatoms-from-hundreds-of-lakes-in-the-northeastern-united-states
    Explore at:
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Northeastern United States, United States
    Description

    Sediment diatoms are widely used to track environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for further large-scale studies is challenging because of the taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern USA using a harmonization process which included updating synonyms, tracking the identity of inconsistently identified taxa, and grouping those that could not be resolved taxonomically. The dataset consists of a Portable Document Format (.pdf) file of the Voucher Flora, six Microsoft Excel (.xlsx) data files, an R script, and five output Comma Separated Values (.csv) files. The Voucher Flora documents the morphological species concepts in the dataset using diatom images compiled into plates (NE_Lakes_Voucher_Flora_102421.pdf) and the translation scheme of the OTU codes to diatom scientific or provisional names with identification sources, references, and notes (VoucherFloraTranslation_102421.xlsx). The file Slide_accession_numbers_102421.xlsx has slide accession numbers in the ANS Diatom Herbarium. The "DiatomHarmonization_032222_files for R.zip" archive contains four Excel input data files, the R code, and a subfolder "OUTPUT" with five .csv files. The file Counts_original_long_102421.xlsx contains original diatom count data in long format. The file Harmonization_102421.xlsx is the taxonomic harmonization scheme with notes and references. The file SiteInfo_031922.xlsx contains sampling site- and sample-level information. WaterQualityData_021822.xlsx is a supplementary file with water quality data. The R code (DiatomHarmonization_032222.R) was used to apply the harmonization scheme to the original diatom counts and produce the output files. The output files are four wide-format files containing diatom count data at different harmonization steps (Counts_1327_wide.csv, Step1_1327_wide.csv, Step2_1327_wide.csv, Step3_1327_wide.csv) and the summary of the Indicator Species Analysis (INDVAL_RESULT.csv). The harmonization scheme (Harmonization_102421.xlsx) can be further modified based on additional taxonomic investigations, while the associated R code (DiatomHarmonization_032222.R) provides a straightforward mechanism for diatom data versioning. This dataset is associated with the following publication: Potapova, M., S. Lee, S. Spaulding, and N. Schulte. A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States. Scientific Data. Springer Nature, New York, NY, 9(540): 1-8, (2022).
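    The archived R script does the recoding-and-pivoting step on the long-format counts. A rough Python analogue of that idea, not the archived R code, and with invented taxon names and column labels, might look like:

    ```python
    import pandas as pd

    # Invented long-format counts standing in for Counts_original_long_102421.xlsx:
    # one row per sample x original taxon name.
    counts = pd.DataFrame({
        "sample_id": ["L001", "L001", "L002"],
        "original_taxon": ["Navicula minima", "Achnanthes minutissima", "Navicula minima"],
        "count": [12, 40, 7],
    })

    # Invented harmonization scheme standing in for Harmonization_102421.xlsx:
    # maps each original name to its current synonym or harmonized group.
    scheme = pd.DataFrame({
        "original_taxon": ["Navicula minima", "Achnanthes minutissima"],
        "harmonized_taxon": ["Eolimna minima", "Achnanthidium minutissimum"],
    })

    # Apply the scheme, sum counts that collapse onto the same harmonized name,
    # then pivot to the wide (sample x taxon) layout used by the output files.
    harmonized = counts.merge(scheme, on="original_taxon", how="left")
    wide = (harmonized.groupby(["sample_id", "harmonized_taxon"])["count"].sum()
            .unstack(fill_value=0))
    print(wide)
    ```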

  4. Data from: Integrated Approach to Global Land Use and Land Cover Reference...

    • zenodo.org
    bin, zip
    Updated Oct 18, 2024
    Cite
    Bernard Silva de Oliveira; Bernard Silva de Oliveira; Nathália Monteiro Teles; Vinícius Vieira Mesquita; Leandro Leal Parente; Laerte Guimarães Ferreira; Nathália Monteiro Teles; Vinícius Vieira Mesquita; Leandro Leal Parente; Laerte Guimarães Ferreira (2024). Integrated Approach to Global Land Use and Land Cover Reference Data Harmonization [Dataset]. http://doi.org/10.5281/zenodo.11285561
    Explore at:
    bin, zip
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bernard Silva de Oliveira; Bernard Silva de Oliveira; Nathália Monteiro Teles; Vinícius Vieira Mesquita; Leandro Leal Parente; Laerte Guimarães Ferreira; Nathália Monteiro Teles; Vinícius Vieira Mesquita; Leandro Leal Parente; Laerte Guimarães Ferreira
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    INTRODUCTION

    This document describes the creation of a global inventory of reference samples and Earth Observation (EO) / gridded datasets for the Global Pasture Watch (GPW) initiative, covering the methodology, data sources, workflow, and results. The inventory supports the training and validation of machine-learning models for GPW grassland mapping.

    Keywords: Grassland, Land Use, Land Cover, Gridded Datasets, Harmonization

    OBJECTIVES

    • Create a global inventory of existing reference samples for land use and land cover (LULC);

    • Compile global EO / gridded datasets that capture LULC classes and harmonize them to match the GPW classes;

    • Develop automated scripts for data harmonization and integration.

    DATA COLLECTION

    Datasets incorporated:

    Dataset | Spatial distribution | Time period | Number of individual samples
    WorldCereal | Global | 2016-2021 | 38,267,911
    Global Land Cover Mapping and Estimation (GLanCE) | Global | 1985-2021 | 31,061,694
    EuroCrops | Europe | 2015-2022 | 14,742,648
    GeoWiki G-GLOPS training dataset | Global | 2021 | 11,394,623
    MapBiomas Brazil | Brazil | 1985-2018 | 3,234,370
    Land Use/Land Cover Area Frame Survey (LUCAS) | Europe | 2006-2018 | 1,351,293
    Dynamic World | Global | 2019-2020 | 1,249,983
    Land Change Monitoring, Assessment, and Projection (LCMap) | U.S. (CONUS) | 1984-2018 | 874,836
    GeoWiki 2012 | Global | 2011-2012 | 151,942
    PREDICTS | Global | 1984-2013 | 16,627
    CropHarvest | Global | 2018-2021 | 9,714

    Total: 102,355,642 samples

    WORKFLOW

    Harmonization Process

    We harmonized global reference samples and EO/gridded datasets to align with GPW classes, optimizing their integration into the GPW machine-learning workflow.

    We considered reference samples derived by visual interpretation with a spatial support of at least 30 m (Landsat and Sentinel) that could represent LULC classes for a point or region.

    Each dataset was processed using automated Python scripts to download vector files and convert the original LULC classes into the following GPW classes:

    0. Other land cover

    1. Natural and Semi-natural grassland

    2. Cultivated grassland

    3. Crops and other related agricultural practices

    We empirically assigned a weight to each sample based on the original dataset's class description, reflecting the level of mixture within the class. The weights range from 1 (Low) to 3 (High), with higher weights indicating greater mixture. Samples with low mixture levels are more accurate and effective for differentiating typologies and for validation purposes.

    The harmonized dataset includes these columns:

    Attribute name | Definition
    dataset_name | Original dataset name
    reference_year | Reference year of samples from the original dataset
    original_lulc_class | LULC class from the original dataset
    gpw_lulc_class | Global Pasture Watch LULC class
    sample_weight | Sample's weight based on the mixture level within the original LULC class
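    The column names above come from the deposit; the per-dataset recoding itself could be sketched roughly as below. The class labels, their mapping to GPW classes 0-3, and the mixture weights shown are illustrative placeholders, not the project's actual assignments.

    ```python
    import pandas as pd

    # Illustrative mapping from one source dataset's labels to a GPW class (0-3)
    # plus an empirically assigned mixture weight (1 = low, 3 = high mixture).
    CLASS_MAP = {
        "natural grassland": (1, 1),
        "pasture":           (2, 2),
        "cropland":          (3, 1),
        "forest":            (0, 1),
    }

    def harmonize_row(dataset_name, year, original_class):
        gpw_class, weight = CLASS_MAP.get(original_class.lower(), (0, 3))
        return {
            "dataset_name": dataset_name,
            "reference_year": year,
            "original_lulc_class": original_class,
            "gpw_lulc_class": gpw_class,
            "sample_weight": weight,
        }

    samples = [harmonize_row("MapBiomas Brazil", 2018, "pasture"),
               harmonize_row("EuroCrops", 2021, "cropland")]
    print(pd.DataFrame(samples))
    ```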

    ACKNOWLEDGMENTS

    The development of this global inventory of reference samples and EO/gridded datasets relied on valuable contributions from various sources. We would like to express our sincere gratitude to the creators and maintainers of all datasets used in this project.

    REFERENCES

    • Brown, C.F., Brumby, S.P., Guzder-Williams, B. et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci Data 9, 251 (2022). https://doi.org/10.1038/s41597-022-01307-4

    • Van Tricht, K. et al. WorldCereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth Syst. Sci. Data 15, 5491–5515, 10.5194/essd-15-5491-2023 (2023)

    • Buchhorn, M.; Smets, B.; Bertels, L.; De Roo, B.; Lesiv, M.; Tsendbazar, N.E., Linlin, L., Tarko, A. (2020): Copernicus Global Land Service: Land Cover 100m: Version 3 Globe 2015-2019: Product User Manual; Zenodo, Geneve, Switzerland, September 2020; doi: 10.5281/zenodo.3938963

    • d’Andrimont, R. et al. Harmonised lucas in-situ land cover and use database for field surveys from 2006 to 2018 in the european union. Sci. data 7, 352, 10.1038/s41597-019-0340-y (2020)

    • Fritz, S. et al. Geo-Wiki: An online platform for improving global land cover, Environmental Modelling & Software, 31, https://doi.org/10.1016/j.envsoft.2011.11.015 (2012)

    • Fritz, S., See, L., Perger, C. et al. A global dataset of crowdsourced land cover and land use reference data. Sci Data 4, 170075 https://doi.org/10.1038/sdata.2017.75 (2017)

    • Schneider, M., Schelte, T., Schmitz, F. & Körner, M. Eurocrops: The largest harmonized open crop dataset across the european union. Sci. Data 10, 612, 10.1038/s41597-023-02517-0 (2023)

    • Souza, C. M. et al. Reconstructing Three Decades of Land Use and Land Cover Changes in Brazilian Biomes with Landsat Archive and Earth Engine. Remote. Sens. 12, 2735, 10.3390/rs12172735 (2020)

    • Stanimirova, R. et al. A global land cover training dataset from 1984 to 2020. Sci. Data 10, 879 (2023)

    • Stehman, S. V., Pengra, B. W., Horton, J. A. & Wellington, D. F. Validation of the U.S. Geological Survey's Land Change Monitoring, Assessment and Projection (LCMAP) Collection 1.0 annual land cover products 1985–2017. Remote Sensing of Environment 265, 112646, 10.1016/j.rse.2021.112646 (2021).
    • Tsendbazar, N. et al. Product validation report (d12-pvr) v 1.1 (2021).

    • Tseng, G., Zvonkov, I., Nakalembe, C. L., & Kerner, H. (2021). CropHarvest: A global dataset for crop-type classification. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
  5. COORDINATE Data Harmonisation Workshop 2

    • search.gesis.org
    Updated May 29, 2024
    Cite
    Bechert, Insa (2024). COORDINATE Data Harmonisation Workshop 2 [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2717
    Explore at:
    Dataset updated
    May 29, 2024
    Dataset provided by
    GESIS, Köln
    GESIS search
    Authors
    Bechert, Insa
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Description

    These data consist of five simulated datasets and a syntax file written in R. All files were created for use in the recorded COORDINATE Workshop 2 (https://www.youtube.com/watch?v=DeyBKxa894E). In this workshop, Scott Milligan, from the GESIS Leibniz Institute for the Social Sciences, leads participants through a complete data harmonisation exercise. The exercise examines the correlation between experiences with bullying and children's happiness. Participants may run through the process in parallel with the recorded workshop. More information on the project and the Harmonisation Toolbox developed in the project is available on the project's webpage https://www.coordinate-network.eu/harmonisation or in COORDINATE Harmonisation Workshop 1 (https://www.youtube.com/watch?v=DeyBKxa894E).

  6. Data from: HarDWR - Harmonized Water Rights Records

    • osti.gov
    Updated Oct 31, 2024
    Cite
    Caccese, Robert; Fisher-Vanden, Karen; Fowler, Lara; Grogan, Danielle; Lammers, Richard; Lisk, Matthew; Olmstead, Sheila; Peklak, Darrah; Zheng, Jiameng; Zuidema, Shan (2024). HarDWR - Harmonized Water Rights Records [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2475306
    Explore at:
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    USDOE Office of Science (SC), Biological and Environmental Research (BER)
    MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
    Authors
    Caccese, Robert; Fisher-Vanden, Karen; Fowler, Lara; Grogan, Danielle; Lammers, Richard; Lisk, Matthew; Olmstead, Sheila; Peklak, Darrah; Zheng, Jiameng; Zuidema, Shan
    Description

    A dataset within the Harmonized Database of Western U.S. Water Rights (HarDWR). For a detailed description of the database, please see the meta-record v2.0.

    Changelog

    v2.0
    • Recalculated based on data sourced from WestDAAT
    • Changed from using a Site ID column to identify unique records to using a combination of Site ID and Allocation ID
    • Removed the Water Management Area (WMA) column from the harmonized records. The replacement is a separate file which stores the relationship between allocations and WMAs. This allows allocations to contribute water right amounts to multiple WMAs during the subsequent cumulative process.
    • Added a column describing a water right's legal status
    • Added "Unspecified" as a water source category
    • Added an acre-foot (AF) column
    • Added a column for the classification of the right's owner

    v1.02
    • Added a .RData file to the dataset as a convenience for anyone exploring our code. This is an internal file, and the one referenced in the analysis scripts, as the data objects are already in R data objects.

    v1.01
    • Updated the names of each file with an ID number of fewer than 3 digits to include leading 0s

    v1.0
    • Initial public release

    Description

    Here we present an updated database of Western U.S. water right records. This database provides consistent unique identifiers for each water right record, and a consistent categorization scheme that puts each water right record into one of seven broad use categories. These data were instrumental in conducting a study of the multi-sector dynamics of inter-sectoral water allocation changes through water markets (Grogan et al., in review). Specifically, the data were formatted for use as input to a process-based hydrologic model, the Water Balance Model (WBM), with a water rights module (Grogan et al., in review). While this specific study motivated the development of the database presented here, water management in the U.S. West is a rich area of study (e.g., Anderson and Woosly, 2005; Tidwell, 2014; Null and Prudencio, 2016; Carney et al., 2021), so releasing this database publicly with documentation and usage notes will enable other researchers to do further work on water management in the U.S. West.

    We produced the water rights database presented here in four main steps: (1) data collection, (2) data quality control, (3) data harmonization, and (4) generation of cumulative water rights curves. Each of steps (1)-(3) had to be completed in order to produce (4), the final product that was used in the modeling exercise in Grogan et al. (in review). All data in each step are associated with a spatial unit called a Water Management Area (WMA), which is the unit of water right administration used by the state from which the right came. Steps (2) and (3) required us to make assumptions and interpretations, and to remove records from the raw data collection. We describe each of these assumptions and interpretations below so that other researchers can choose to implement alternative assumptions and interpretations as fits their research aims.

    Motivation for Changing Data Sources

    The most significant change has been a switch from collecting the raw water rights directly from each state to using the water rights records presented in WestDAAT, a product of the Water Data Exchange (WaDE) Program under the Western States Water Council (WSWC). One of the main reasons for this is that each state of interest is a member of the WSWC, meaning that WaDE is partially funded by these states, as well as many universities.

    As WestDAAT is also a database with consistent categorization, it has allowed us to spend less time on data collection and quality control and more time on answering research questions. This has included records from water right sources we had previously not known about when creating v1.0 of this database. The only major downside to utilizing the WestDAAT records as our raw data is that further updates are tied to when WestDAAT is updated, as some states update their public water right records daily. However, as our focus is on cumulative water amounts at the regional scale, it is unlikely that most record updates would have a significant effect on our results.

    The structure of WestDAAT led to several important changes to how HarDWR is formatted. The most significant change is that WaDE has calculated a field known as SiteUUID, which is a unique identifier for the Point of Diversion (POD), or where the water is drawn from. This is separate from AllocationNativeID, which is the identifier for the allocation of water, or the amount of water associated with the water right. It should be noted that it is possible for a single site to have multiple allocations associated with it and for an allocation to be extracted from multiple sites. The site-allocation structure has allowed us to adopt a more consistent, and hopefully more realistic, approach to organizing the water right records than we had with HarDWR v1.0. This was incredibly helpful, as the raw data from many states had multiple water uses within a single field within a single row, and it was not always clear whether the first water use was the most important or simply first alphabetically. WestDAAT has already addressed this data quality issue. Furthermore, with v1.0, when there were multiple records with the same water right ID, we selected the largest volume or flow amount and disregarded the rest. As WestDAAT was already a common structure for disparate data formats, we were better able to identify sites with multiple allocations and, perhaps more importantly, allocations with multiple sites. This is particularly helpful when an allocation has sites which cross WMA boundaries: instead of assigning the full water amount to a single WMA, we are now able to divide the amount of water between the relevant WMAs.

    As it is now possible to identify allocations with water used in multiple WMAs, it is no longer practical to store this information within a single column. Instead, the stAllocationToWMATab.csv file was created, which is an allocation-by-WMA matrix containing the percent Place of Use area overlap with each WMA. We then use this percentage to divide the allocation's flow amount between the given WMAs during the cumulation process, to provide more realistic totals of water use in each area. However, not every state provides areas of water use, so, as in HarDWR v1.0, a hierarchical decision tree was used to assign each allocation to a WMA. First, if a WMA could be identified based on the allocation ID, then that WMA was used; typically, when available, this applied to the entire state and no further steps were needed. Second was the spatial analysis of Place of Use to WMAs. Third was a spatial analysis of the POD locations to WMAs, with the assumption that an allocation's POD is within the WMA it should belong to; if an allocation still had multiple WMAs based on its POD locations, then the allocation's flow amount was divided equally between all WMAs. The fourth, and final, process was to include water allocations which spatially fell outside of the state WMA boundaries. This could be due to several reasons, such as coordinate errors or imprecision in the POD location, imprecision in the WMA boundaries, or rights attached to features, such as a reservoir, which cross state boundaries. To include these records, we decided that any POD within one kilometer of the state's edge would be assigned to the nearest WMA.

    Other Changes WestDAAT has Allowed

    In addition to a more nuanced and consistent method of assigning water rights data to WMAs, there are other benefits gained from using the WestDAAT dataset. Among these is a consistent categorization of a water right's legal status. In HarDWR v1.0, legal status was effectively ignored, which led to many valid concerns about the quality of the database related to the amounts of water the rights allowed to be claimed. The main issue was that rights with legal statuses such as "application withdrawn", "non-active", or "cancelled" were included within HarDWR v1.0. These, and other water rights statuses which were deemed not to be in use, have been removed from this version of the database. Another major change has been the addition of the "unspecified" water source category. This is water that can come from either surface water or groundwater, or the source of which is unknown. The addition of this source category brings the total number of categories to three. Due to reviewer feedback, we added the acre-foot (AF) column and the ownerClassification column so that the data may be more applicable to a wider audience.

    File Descriptions

    The dataset is a series of files organized by state sub-directories. In addition, each file begins with the state's name, in case the file is separated from its sub-directory for some reason. After the state name is text which describes the contents of the file. Each file is described in detail below. Note that st is a placeholder for the state's name.

    stFullRecords_HarmonizedRights.csv: A file of the complete water records for each state. The column headers for each file of this type are:
    • state - The name of the state to which the allocations belong.
    • FIPS - The two-digit numeric state ID code.
    • siteID - The site location ID for POD locations. A site may have multiple allocations, which are the actual amounts of water which can be drawn. In a simplified hypothetical, a farmstead may have an allocation for "irrigation" and an allocation for "domestic" water use, but the water is drawn from the same pumping equipment. It should be noted that many of the site IDs appear to have been added by WaDE, and therefore may not be recognized by a given state's water rights database.
    • allocationID - The allocation ID for the water right. For most states this is the water right ID, and what is
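    The splitting rule described above (divide an allocation's flow by percent Place-of-Use overlap, with an equal split across POD WMAs as the fallback) can be sketched in a few lines of Python. This is an illustration only; the field names and numbers are invented, not the headers of stAllocationToWMATab.csv.

    ```python
    # Illustrative splitting of an allocation's flow amount across WMAs using
    # fractional Place-of-Use overlap, with an equal split as the fallback.
    def split_by_overlap(flow_amount, wma_overlap):
        """wma_overlap maps WMA id -> fractional area overlap (0-1)."""
        total = sum(wma_overlap.values())
        return {wma: flow_amount * frac / total for wma, frac in wma_overlap.items()}

    def split_equally(flow_amount, wma_ids):
        """Fallback when only POD locations (possibly in several WMAs) are known."""
        share = flow_amount / len(wma_ids)
        return {wma: share for wma in wma_ids}

    # Hypothetical allocation of 120 units whose Place of Use overlaps two WMAs.
    print(split_by_overlap(120.0, {"WMA-07": 0.75, "WMA-08": 0.25}))
    # Hypothetical allocation with PODs in two WMAs and no Place-of-Use polygon.
    print(split_equally(120.0, ["WMA-07", "WMA-08"]))
    ```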

  7. Description and harmonization strategy for the predictor variables.

    • figshare.com
    xlsx
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Description and harmonization strategy for the predictor variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.s001
    Explore at:
    xlsx
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description and harmonization strategy for the predictor variables.

  8. Predictor variables used in analysis and the methods used to harmonize to...

    • plos.figshare.com
    xls
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Predictor variables used in analysis and the methods used to harmonize to the categorical variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.t003
    Explore at:
    xls
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predictor variables used in analysis and the methods used to harmonize to the categorical variables.

  9. Household Expenditure and Income Survey 2008, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    Updated Jan 12, 2022
    + more versions
    Cite
    Department of Statistics (2022). Household Expenditure and Income Survey 2008, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/7661
    Explore at:
    Dataset updated
    Jan 12, 2022
    Dataset authored and provided by
    Department of Statistics
    Time period covered
    2008 - 2009
    Area covered
    Jordan
    Description

    Abstract

    The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

    Data collected through the survey helped in achieving the following objectives:
    1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
    2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
    3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
    4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
    5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
    6. Provide the necessary income data to serve in calculating poverty indices, identifying the characteristics of the poor, and drawing poverty maps
    7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty

    Geographic coverage

    National

    Analysis unit

    • Household/families
    • Individuals

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 2008 Household Expenditure and Income Survey sample was designed using a two-stage cluster stratified sampling method. In the first stage, the primary sampling units (PSUs), the blocks, were drawn using probability proportionate to size, with the number of households in each block taken as the block size. The second stage consisted of drawing the household sample (8 households from each PSU) using the systematic sampling method. Four substitute households from each PSU were also drawn, using the systematic sampling method, to be used on the first visit to the block in case any of the main sample households could not be visited for any reason.

    To estimate the sample size, the coefficient of variation and the design effect in each sub-district were calculated for the expenditure variable from data of the 2006 Household Expenditure and Income Survey. These results were used to estimate the sample size at the sub-district level, provided that the coefficient of variation of the expenditure variable at the sub-district level did not exceed 10% and that the number of clusters was not less than 6 at the district level, to ensure good cluster representation in the administrative areas and enable the drawing of poverty pockets.

    It is worth mentioning that the expected non-response in addition to areas where poor families are concentrated in the major cities were taken into consideration in designing the sample. Therefore, a larger sample size was taken from these areas compared to other ones, in order to help in reaching the poverty pockets and covering them.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    List of survey questionnaires: (1) General Form (2) Expenditure on food commodities Form (3) Expenditure on non-food commodities Form

    Cleaning operations

    Raw Data

    The design and implementation of this survey's procedures were:
    1. Sample design and selection
    2. Design of forms/questionnaires, guidelines to assist in filling out the questionnaires, and preparation of instruction manuals
    3. Design of the table templates to be used for the dissemination of the survey results
    4. Preparation of the fieldwork phase, including printing forms/questionnaires, instruction manuals, data collection instructions, data checking instructions and codebooks
    5. Selection and training of survey staff to collect data and run the required data checks
    6. Preparation and implementation of the pretest phase for the survey, designed to test and develop forms/questionnaires, instructions and software programs required for data processing and production of survey results
    7. Data collection
    8. Data checking and coding
    9. Data entry
    10. Data cleaning using data validation programs
    11. Data accuracy and consistency checks
    12. Data tabulation and preliminary results
    13. Preparation of the final report and dissemination of final results

    Harmonized Data

    • The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
    • The harmonization process started with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files were then all merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process was run on the data.
    • Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.
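    The ERF harmonization itself was carried out in SPSS. Purely as an illustrative analogue of the merge-and-recode step described above (not the actual country-specific program, and with invented variable names), the same idea in pandas looks roughly like:

    ```python
    import pandas as pd

    # Invented cleaned raw files: one household-level and one individual-level table.
    households = pd.DataFrame({"hh_id": [1, 2], "region": ["Amman", "Irbid"]})
    individuals = pd.DataFrame({"hh_id": [1, 1, 2], "person_id": [1, 2, 1],
                                "sex_raw": [1, 2, 1]})

    # Merge to one individual-level file containing all variables to be harmonized.
    merged = individuals.merge(households, on="hh_id", how="left")

    # Recode / rename / label harmonized variables (the country-specific step).
    merged["sex"] = merged["sex_raw"].map({1: "male", 2: "female"})
    harmonized = merged.drop(columns=["sex_raw"])

    # Save an individual-level output in Stata format, as in the ERF workflow.
    harmonized.to_stata("heis_individual_harmonized.dta", write_index=False)
    print(harmonized)
    ```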

  10. Dataset of "A Metabolites Merging Strategy (MMS): Harmonization to enable...

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Dataset of "A Metabolites Merging Strategy (MMS): Harmonization to enable studies intercomparison" [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8226097?locale=bg
    Explore at:
    unknown (157557)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    Metabolomics encounters challenges in cross-study comparisons due to diverse metabolite nomenclature and reporting practices. To bridge this gap, we introduce the Metabolites Merging Strategy (MMS), offering a systematic framework to harmonize multiple metabolite datasets for enhanced interstudy comparability. MMS has three steps. Step 1: translation and merging of the different datasets, employing InChIKeys for data integration and including the translation of metabolite names where needed. Step 2: retrieval of attributes from the InChIKey, including name descriptors (the title name from PubChem and the RefMet name from Metabolomics Workbench), chemical properties (molecular weight and molecular formula), both systematic (InChI, InChIKey, SMILES) and non-systematic identifiers (PubChem, ChEBI, HMDB, KEGG, LipidMaps, DrugBank, Bin ID and CAS number), and their ontology. Step 3: a meticulous curation process with three sub-steps to rectify disparities for conjugated base/acid compounds (optional), missing attributes, and synonym checking (duplicated information). The MMS procedure is exemplified through a case study of urinary asthma metabolites, where MMS facilitated the identification of significant pathways that remained hidden when no dataset merging strategy was followed. This study highlights the need for standardized and unified metabolite datasets to enhance the reproducibility and comparability of metabolomics studies.
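    Step 1 keys every record to an InChIKey so that datasets can be merged regardless of the reported metabolite name. A minimal sketch of that merge is shown below; the study tables, column names and values are illustrative, and no PubChem attribute lookup (Step 2) is shown.

    ```python
    import pandas as pd

    # Two invented study tables reporting the same metabolite under different names.
    study_a = pd.DataFrame({
        "name": ["citrate"],
        "inchikey": ["KRKNYBCHXYNGOX-UHFFFAOYSA-N"],
        "fold_change_a": [1.8],
    })
    study_b = pd.DataFrame({
        "name": ["citric acid"],
        "inchikey": ["KRKNYBCHXYNGOX-UHFFFAOYSA-N"],
        "fold_change_b": [2.1],
    })

    # MMS Step 1: merge on the InChIKey rather than on the free-text name,
    # so "citrate" and "citric acid" collapse to a single harmonized entry.
    merged = study_a.merge(study_b, on="inchikey", how="outer", suffixes=("_a", "_b"))
    print(merged[["inchikey", "name_a", "name_b", "fold_change_a", "fold_change_b"]])
    ```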

  11. Meta-analysis sample size of harmonized variables for each study.

    • plos.figshare.com
    xlsx
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Meta-analysis sample size of harmonized variables for each study. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.s002
    Explore at:
    xlsx
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Meta-analysis sample size of harmonized variables for each study.

  12. Data from: Harmonizing OER metadata in ETL processes with SkoHub in the project "WirLernenOnline"

    • service.tib.eu
    Updated May 16, 2025
    Cite
    (2025). Harmonizing oer metadata in etl processes with skohub in the project “wirlernenonline” [Dataset]. https://service.tib.eu/ldmservice/dataset/goe-doi-10-25625-8mzswb
    Explore at:
    Dataset updated
    May 16, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The metadata for Open Educational Resources (OER) are often made available in repositories without recourse to uniform value lists and corresponding standards for their attributes. This circumstance complicates data harmonization when OERs from different sources are to be merged in one search environment. With the help of the RDF standard SKOS and the tool SkoHub-Vocabs, the project "WirLernenOnline" has found an innovative, reusable and standards-based solution to this challenge. This involves the creation of SKOS vocabularies that are used during the ETL process to standardize different terms (for example, "math" and "mathematics"). This then forms the basis for providing users with consistent filtering options and a good search experience. The created and openly licensed vocabularies can then easily be reused and linked to overcome this challenge in the future.
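    In the ETL step, a SKOS vocabulary lets alternative labels (e.g., "math") be normalized to one preferred label ("mathematics"). A small sketch of that lookup is shown below, using an in-memory dictionary and example.org URIs as a stand-in for a published SkoHub vocabulary.

    ```python
    # Tiny stand-in for a SKOS concept scheme: each concept has a prefLabel and altLabels.
    VOCAB = [
        {"uri": "https://example.org/subjects/mathematics",
         "prefLabel": "mathematics", "altLabels": ["math", "maths"]},
        {"uri": "https://example.org/subjects/physics",
         "prefLabel": "physics", "altLabels": []},
    ]

    # Build a label -> concept index covering preferred and alternative labels.
    INDEX = {}
    for concept in VOCAB:
        INDEX[concept["prefLabel"].lower()] = concept
        for alt in concept["altLabels"]:
            INDEX[alt.lower()] = concept

    def normalize_subject(raw_label):
        """Map a raw OER subject label onto the controlled vocabulary during ETL."""
        concept = INDEX.get(raw_label.strip().lower())
        return (concept["uri"], concept["prefLabel"]) if concept else (None, raw_label)

    print(normalize_subject("Math"))        # matches the mathematics concept
    print(normalize_subject("astronomy"))   # no match, label kept as-is
    ```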

  13. ckanext-harmonisation

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-harmonisation [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-harmonisation
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The Harmonisation extension for CKAN is designed to standardize metadata labels and values, especially those adhering to the ODM (Open Data Monitor) metadata scheme. It facilitates the harmonization of specific metadata fields through a web interface, allowing users to manage and refine the consistency of their datasets. The extension is used in conjunction with MongoDB to store raw and harmonized metadata, and is part of the broader ODM project aiming to improve data quality within the CKAN ecosystem.

    Key features:
    • Metadata harmonization via web form: provides a user interface for harmonizing specific metadata fields such as Dates, Resources, Licenses, and Categories, streamlining the data cleaning process for end users.
    • Mapping management: allows administrators and users to add new mappings or update existing ones, enabling customization and continuous improvement of the harmonization rules.
    • MongoDB integration: leverages MongoDB to store both raw and harmonized metadata by connecting to specific collections ('odm' and 'odm_harmonised'), ensuring data persistence and ready access.
    • Scheduled harmonization jobs: supports automated harmonization tasks through the harmonisation_slave.py script. Users can set up cron jobs to run the script periodically, minimizing manual intervention and ensuring data consistency over time.
    • ODM metadata scheme compliance: specifically designed to work with metadata that complies with the ODM metadata scheme, thereby improving interoperability and adherence to standards.

    Technical integration: the Harmonisation extension requires updates to the CKAN configuration file (development.ini) to activate the plugin and set up the necessary ODM extension settings. The extension also requires MongoDB to be installed and configured as the metadata repository. Once correctly configured, users can schedule automatic harmonisation jobs by executing the harmonisation_slave.py script as a cron job.

    Benefits and impact: by implementing the Harmonisation extension, organizations can significantly improve the quality and consistency of their metadata. Harmonising key data fields makes the data more reliable and more readily integratable with other systems. This automation streamlines metadata management, reducing manual effort and ensuring that data consistently adhere to the configured standards.
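    As an illustration only (not the extension's actual code), a periodic job that reads raw records from the 'odm' collection and writes cleaned ones to 'odm_harmonised' could be sketched with pymongo roughly as follows; the database name, field names and licence mapping are assumptions.

    ```python
    from pymongo import MongoClient

    # Assumed licence-label mapping; the real rules live in the extension's mappings.
    LICENSE_MAP = {"cc-by": "CC-BY-4.0", "creative commons attribution": "CC-BY-4.0"}

    def harmonise_once(mongo_uri="mongodb://localhost:27017", db_name="odm_db"):
        client = MongoClient(mongo_uri)
        db = client[db_name]
        raw, harmonised = db["odm"], db["odm_harmonised"]

        for doc in raw.find():
            record = dict(doc)
            record.pop("_id", None)
            # Example harmonisation: normalise the licence label when a mapping exists.
            licence = str(record.get("license", "")).strip().lower()
            record["license"] = LICENSE_MAP.get(licence, record.get("license"))
            harmonised.insert_one(record)

    if __name__ == "__main__":
        harmonise_once()
    ```

    In the real extension, the equivalent work is done by harmonisation_slave.py, which the documentation suggests scheduling as a cron job.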

  14. Eligible studies from the CureSCi Metadata Catalog and their available...

    • plos.figshare.com
    xls
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Eligible studies from the CureSCi Metadata Catalog and their available predictor variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.t002
    Explore at:
    xls
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Eligible studies from the CureSCi Metadata Catalog and their available predictor variables.

  15. Employment and Unemployment Survey, EUS 2005 - Jordan

    • erfdataportal.com
    Updated Jun 9, 2019
    + more versions
    Cite
    Department of Statistics (2019). Employment and Unemployment Survey, EUS 2005 - Jordan [Dataset]. https://www.erfdataportal.com/index.php/catalog/150
    Explore at:
    Dataset updated
    Jun 9, 2019
    Dataset provided by
    Economic Research Forum
    Department of Statistics
    Time period covered
    2005 - 2006
    Area covered
    Jordan
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    The Department of Statistics (DOS) carried out four rounds of the 2005 Employment and Unemployment Survey (EUS) during February, May, August and November 2005. The survey rounds covered a total sample of about thirty-nine households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design. It is noteworthy that the sample represents the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.

    The importance of this survey lies in that it provides a comprehensive data base on employment and unemployment that serves decision makers, researchers as well as other parties concerned with policies related to the organization of the Jordanian labor market.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009. During which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    Covering a sample representative on the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire is divided into main topics, each containing a clear and consistent group of questions, and designed in a way that facilitates the electronic data entry and verification. The questionnaire includes the characteristics of household members in addition to the identification information, which reflects the administrative as well as the statistical divisions of the Kingdom.

    Cleaning operations

    Raw Data

    The plan of the tabulation of survey results was guided by former Employment and Unemployment Surveys which were previously prepared and tested. The final survey report was then prepared to include all detailed tabulations as well as the methodology of the survey.

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
  16. Evaluation of item matching strategies to harmonize assessment tools for...

    • osf.io
    Updated Mar 29, 2022
    Cite
    Mauricio Scopel Hoffmann; Tyler Moore; Michael Milham; Theodore Sattherwaite; Giovanni Salum (2022). Evaluation of item matching strategies to harmonize assessment tools for psychopathology in children and adolescents [Dataset]. http://doi.org/10.17605/OSF.IO/WNRP4
    Explore at:
    Dataset updated
    Mar 29, 2022
    Dataset provided by
    Center For Open Science
    Authors
    Mauricio Scopel Hoffmann; Tyler Moore; Michael Milham; Theodore Sattherwaite; Giovanni Salum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Reproducible Brain Charts initiative aims to aggregate and harmonize phenotypic and neuroimaging data to delineate novel mechanisms regarding the developmental basis of psychopathology in youth and yield reproducible growth charts of brain development. To reach this objective, the second step of our project is to test item-wise matching strategies for phenotypic harmonization between studies using bifactor models of psychopathology. We focus on this model because general and specific aspects of mental health problems can be dissociated, so more specific relationships with the brain can be established. In the current study, we benchmarked six item-matching strategies for harmonizing the Child Behavior Checklist (CBCL) and the Strengths and Difficulties Questionnaire (SDQ) within a bifactor model framework in two samples that were assessed with both instruments. The study proceeded in the following steps: 1) harmonization of items according to the six strategies, 2) estimation of bifactor models with harmonized items for each sample separately, 3) estimation of the factor score correlation between assessment tools in each sample, 4) estimation of factor reliability, 5) testing of the assessments' invariance according to each strategy, and 6) calculation of the root expected mean square difference (REMSD) to estimate the factor score difference of using a proxy measure instead of a target measure while integrating the two samples. We expect that the results of this study will encourage the use of the best strategy to date to increase reproducibility in the field while aggregating data from different contexts and instruments in the context of the bifactor model of psychopathology.

  17. A univariate analysis where hydroxyurea use was modeled as a function of...

    • plos.figshare.com
    xlsx
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). A univariate analysis where hydroxyurea use was modeled as a function of each individual predictor. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.s003
    Explore at:
    xlsx
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A univariate analysis where hydroxyurea use was modeled as a function of each individual predictor.

  18. The Longitudinal IntermediaPlus Data Source (2014-2016) - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Apr 27, 2023
    + more versions
    Cite
    (2023). The Longitudinal IntermediaPlus Data Source (2014-2016) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/fddd6e01-6d8c-5e7c-b106-70c3b907af43
    Explore at:
    Dataset updated
    Apr 27, 2023
    Description

    The media analysis data were collected for commercial purposes. They are used in media planning as well as in the advertising planning of the different media genres (radio, press media, TV, poster and, since 2010, also online). They are cross-sections that are merged together for one year. ag.ma kindly provides the data for scientific use on an annual basis, with a two-year notice period, to GESIS. In addition, agof has provided documentation regarding data collection (questionnaires, code plans, etc.) for the preparation of the MA IntermediaPlus online bundle. In order to make the data accessible for scientific use, the datasets of the individual years were harmonized and pooled into a longitudinal dataset starting in 2014 as part of the dissertation project 'Audience and Market Fragmentation online' of the Digital Society research program NRW at the Heinrich-Heine-University (HHU) and the University of Applied Sciences Düsseldorf (HSD), funded by the Ministry of Culture and Science of the German State of North Rhine-Westphalia. The prepared Longitudinal IntermediaPlus dataset 2014 to 2016 is 'big data', which is why the entire dataset is only available in the form of a database (MySQL). In this database, the information for the different variables of a respondent is organized in one column, one row per variable. The present data documentation covers the total database for online media use for the years 2014 to 2016. The data contain all variables on socio-demography, free-time activities, and additional information on a respondent and his or her household, as well as the interview-specific variables and weights. Only the variables concerning the respondent's media use are a selection: the online media use of all full online offerings as well as their single entities for all genres whose business model is the provision of content is included; e-commerce, games, etc. were excluded. The media use of radio, print and TV is not included. Preparation for further years is possible, as is the preparation of cross-media media use for radio, press media and TV. Harmonization plans are available for radio and press media up to 2015, waiting to be applied. The digital process chain developed for data preparation and harmonization is published at GESIS and available for further projects updating the time series for additional years. Recourse to these documents (Excel files, scripts, harmonization plans, etc.) is strongly recommended. The preparation and harmonization of the Longitudinal IntermediaPlus 2014 to 2016 database was carried out in accordance with the FAIR principles (Wilkinson et al. 2016). By harmonizing and pooling the cross-sectional datasets into one longitudinal dataset, carried out by Inga Brentel and Céline Fabienne Kampes as part of the dissertation project 'Audience and Market Fragmentation online', the aim is to make the data source of the media analysis accessible for research on social and media change in Germany.
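    Because the database stores one row per respondent-variable pair, analysis typically starts by pivoting that long layout back to one column per variable. A sketch under assumed table and column names (the real schema is documented in the GESIS data documentation):

    ```python
    import pandas as pd

    # Stand-in for rows fetched from the MySQL database: one row per variable
    # of a respondent (long layout). Table and column names are assumptions.
    rows = pd.DataFrame({
        "respondent_id": [101, 101, 101, 102, 102],
        "variable":      ["age", "sex", "online_news_use", "age", "online_news_use"],
        "value":         ["34", "f", "daily", "51", "weekly"],
    })

    # Pivot to the familiar wide layout: one row per respondent, one column per variable.
    wide = rows.pivot(index="respondent_id", columns="variable", values="value")
    print(wide)
    ```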

  19. Z

    WorldCereal open global harmonized reference data repository (CC-BY-SA...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    + more versions
    Cite
    Kristof Van Tricht (2024). WorldCereal open global harmonized reference data repository (CC-BY-SA licensed data sets) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7609545
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Juan Carlos Laso Bayas
    Arun Pratihast
    Santosh Karanam
    Steffen Fritz
    Hendrik Boogaard
    Jeroen Degerickx
    Sven Gilliams
    Kristof Van Tricht
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Within the ESA-funded WorldCereal project, we have built an open, harmonized reference data repository with global coverage for model training and product validation in support of land cover and crop type mapping. Data from 2017 onwards were collected from many different sources and then harmonized, annotated and evaluated. These steps are explained in the harmonization protocol (10.5281/zenodo.7584463), which also clarifies the naming convention of the shapefiles and the WorldCereal attributes (LC, CT, IRR, valtime and sampleID) that were added to the original data sets.

    This publication includes the harmonized data sets whose original data were published under the CC-BY-SA license or a comparable license. See the document "_In-situ-data-World-Cereal - license - CC-BY-SA.pdf" for an overview of the original data sets.
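    As a small illustration of how the harmonized attribute schema (sampleID, LC, CT, IRR, valtime) can be used, the sketch below filters a toy table of reference samples. The rows and code values are invented for illustration; the real coding scheme is defined in the harmonization protocol cited above, and the published shapefiles would in practice be loaded with a GIS library such as geopandas rather than constructed by hand.

    ```python
    import pandas as pd

    # Toy table carrying the WorldCereal harmonization attributes described above.
    # sampleID, LC (land cover), CT (crop type), IRR (irrigation) and valtime are
    # the documented attribute names; the values below are made up for illustration.
    samples = pd.DataFrame(
        {
            "sampleID": ["s001", "s002", "s003", "s004"],
            "LC": [11, 11, 20, 11],            # hypothetical land-cover codes
            "CT": [1100, 1200, 0, 1100],       # hypothetical crop-type codes
            "IRR": [0, 1, 0, 1],               # hypothetical irrigation flag
            "valtime": ["2019-07-01", "2019-08-15", "2018-06-30", "2019-05-20"],
        }
    )

    # Example selection: irrigated samples with a validity time in 2019.
    samples["valtime"] = pd.to_datetime(samples["valtime"])
    subset = samples[(samples["IRR"] == 1) & (samples["valtime"].dt.year == 2019)]
    print(subset[["sampleID", "LC", "CT", "valtime"]])
    ```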

  20. Supplementary material for Lee et al. 2019 Taxonomic harmonization may...

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Supplementary material for Lee et al. 2019 Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets [Dataset]. https://catalog.data.gov/dataset/supplementary-material-for-lee-et-al-2019-taxonomic-harmonization-may-reveal-a-stronger-as
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008–2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined “slash groups” did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variation explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., Sellaphora atomoides) species complex, and elevated Nitzschia, Diploneis, and Tryblionella taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP. Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera Sellaphora, Mayamaea, and Psammodictyon, indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove analyst signal, this work provides a method to minimize analyst signal and improve detection of diatom association with TP in large datasets involving multiple analysts. Examination of variation in assemblage data explained by analyst and taxonomic harmonization may be necessary steps for improving data quality and the utility of diatoms as indicators of environmental variables. This dataset is associated with the following publication: Lee, S., I. Bishop, S. Spaulding, R. Mitchell, and L. Yuan. Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets. ECOLOGICAL INDICATORS. Elsevier Science Ltd, New York, NY, USA, 102: 166-174, (2019).
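    The "analyst signal" screening described above can be illustrated with a simple per-taxon variance comparison: how much of the variation in a taxon's relative abundance is attributable to the analyst factor versus total phosphorus. The sketch below is only a minimal illustration of that idea, not the authors' actual procedure; the synthetic data, the column names (taxon, analyst, tp, rel_abund) and the use of an ordinary linear model with Type-II ANOVA are assumptions made for the example.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Synthetic stand-in for a harmonized diatom table: relative abundance of
    # each taxon per sample, the analyst who identified it, and total phosphorus.
    rng = np.random.default_rng(0)
    n = 60
    demo = pd.DataFrame(
        {
            "taxon": np.repeat(["Sellaphora_sp", "Nitzschia_sp"], n // 2),
            "analyst": rng.choice(["A", "B", "C"], size=n),
            "tp": rng.uniform(5, 200, size=n),  # total phosphorus, e.g. ug/L
        }
    )
    demo["rel_abund"] = 0.01 * demo["tp"] + rng.normal(0, 1, size=n)

    def analyst_signal(df: pd.DataFrame) -> pd.DataFrame:
        """Per taxon, compare the share of variance in relative abundance
        explained by the analyst factor with the share explained by TP
        (Type-II sums of squares from a linear model). High analyst shares
        flag taxa that are candidates for slash groups, genus-level rollup,
        or omission."""
        rows = []
        for taxon, sub in df.groupby("taxon"):
            fit = smf.ols("rel_abund ~ C(analyst) + tp", data=sub).fit()
            aov = anova_lm(fit, typ=2)
            total = aov["sum_sq"].sum()
            rows.append(
                {
                    "taxon": taxon,
                    "share_analyst": aov.loc["C(analyst)", "sum_sq"] / total,
                    "share_tp": aov.loc["tp", "sum_sq"] / total,
                }
            )
        return pd.DataFrame(rows)

    print(analyst_signal(demo))
    ```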
