100+ datasets found
  1. ACF NIEM Human Services Domain Data Harmonization Process

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    Cite
    Administration for Children and Families (2025). ACF NIEM Human Services Domain Data Harmonization Process [Dataset]. https://data.virginia.gov/dataset/acf-niem-human-services-domain-data-harmonization-process
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    ACF Agency Wide resource

    Metadata-only record linking to the original dataset.

  2. ComBat HarmonizR enables the integrated analysis of independently generated...

    • ebi.ac.uk
    Updated May 23, 2022
    Cite
    Hannah Voß (2022). ComBat HarmonizR enables the integrated analysis of independently generated proteomic datasets through data harmonization with appropriate handling of missing values [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD027467
    Explore at:
    Dataset updated
    May 23, 2022
    Authors
    Hannah Voß
    Variables measured
    Proteomics
    Description

    The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations of statistically underpowered sample cohorts, but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can handle missing at random (MAR) and missing not at random (MNAR) values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT) plexes, compared to the commonly used internal reference scaling (iRS). Thanks to the matrix dissection approach, which avoids data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss.
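    The matrix-dissection idea described above can be sketched as a per-protein, per-batch standardization that simply skips missing values instead of imputing them or dropping the protein. This is a minimal illustration under assumed inputs, not the published HarmonizR implementation (which fits ComBat's empirical-Bayes batch parameters on dissected submatrices); the function name and data layout are hypothetical.

```python
import numpy as np

def harmonize_rows(data, batches):
    """Per-feature batch mean/variance adjustment that leaves NaNs in place.

    data    : 2-D array, features x samples, with np.nan marking missing values
    batches : 1-D array of batch labels, one per sample (column)

    Assumes each feature is observed at least twice in every batch.
    """
    out = data.copy()
    # Target location: each feature's grand mean over all observed values.
    grand_mean = np.nanmean(data, axis=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b
        block = data[:, cols]
        mu = np.nanmean(block, axis=1, keepdims=True)   # per-batch mean
        sd = np.nanstd(block, axis=1, keepdims=True)    # per-batch spread
        sd[sd == 0] = 1.0  # guard against constant rows
        # Standardize within the batch, then shift to the grand mean;
        # NaNs propagate untouched, so no value is imputed or dropped.
        out[:, cols] = (block - mu) / sd + grand_mean
    return out
```

    After harmonization, each batch block of a feature shares the same mean, while the original missing-value pattern (MAR or MNAR) is preserved.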

  3. Data on the harmonization of image velocimetry techniques, from seven...

    • datasetcatalog.nlm.nih.gov
    Updated Mar 23, 2020
    Cite
    Jamieson, E.; Mayr, P.; Tauro, Flavia; Hauet, A.; Perks, Matt; Sinclair, L.; Pearce, S.; Dal Sasso, S. F.; Bomhof, J.; Maddock, I.; Hortobágyi, B.; Grimaldi, S.; Jodeau, M.; Pénard, L.; Peña-Haro, S.; Ljubičić, R.; Manfreda, S.; Käfer, S.; Detert, M.; Paulus, G.; Pizarro, A.; Vogel, U.; Strelnikova, Dariia; Goulet, A.; Le Coz, J. (2020). Data on the harmonization of image velocimetry techniques, from seven different countries [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000505124
    Explore at:
    Dataset updated
    Mar 23, 2020
    Authors
    Jamieson, E.; Mayr, P.; Tauro, Flavia; Hauet, A.; Perks, Matt; Sinclair, L.; Pearce, S.; Dal Sasso, S. F.; Bomhof, J.; Maddock, I.; Hortobágyi, B.; Grimaldi, S.; Jodeau, M.; Pénard, L.; Peña-Haro, S.; Ljubičić, R.; Manfreda, S.; Käfer, S.; Detert, M.; Paulus, G.; Pizarro, A.; Vogel, U.; Strelnikova, Dariia; Goulet, A.; Le Coz, J.
    Description

    Here, we present a range of datasets that have been compiled from across seven countries in order to facilitate image velocimetry inter-comparison studies. These data have been independently produced for the primary purposes of: (i) enhancing our understanding of open-channel flows in diverse flow regimes; and (ii) testing specific image velocimetry techniques. These datasets have been acquired across a range of hydro-geomorphic settings, using a diverse range of cameras, encoding software, and controller units, and with river velocity measurements generated using differing image pre-processing and image processing software.

  4. Additional file 1 of Conceptual design of a generic data harmonization...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Feb 27, 2024
    Cite
    Zoch, Michele; Peng, Yuan; Reinecke, Ines; Henke, Elisa; Sedlmayr, Martin; Bathelt, Franziska (2024). Additional file 1 of Conceptual design of a generic data harmonization process for OMOP common data model [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001502363
    Explore at:
    Dataset updated
    Feb 27, 2024
    Authors
    Zoch, Michele; Peng, Yuan; Reinecke, Ines; Henke, Elisa; Sedlmayr, Martin; Bathelt, Franziska
    Description

    A detailed overview of the results of the literature search, including the data extraction matrix can be found in the Additional file 1.

  5. Predictor variables used in analysis and the methods used to harmonize to...

    • plos.figshare.com
    xls
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Predictor variables used in analysis and the methods used to harmonize to the categorical variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predictor variables used in analysis and the methods used to harmonize to the categorical variables.

  6. Harmonization of resting-state functional MRI data across multiple imaging...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    Cite
    Ayumu Yamashita; Noriaki Yahata; Takashi Itahashi; Giuseppe Lisi; Takashi Yamada; Naho Ichikawa; Masahiro Takamura; Yujiro Yoshihara; Akira Kunimatsu; Naohiro Okada; Hirotaka Yamagata; Koji Matsuo; Ryuichiro Hashimoto; Go Okada; Yuki Sakai; Jun Morimoto; Jin Narumoto; Yasuhiro Shimada; Kiyoto Kasai; Nobumasa Kato; Hidehiko Takahashi; Yasumasa Okamoto; Saori C. Tanaka; Mitsuo Kawato; Okito Yamashita; Hiroshi Imamizu (2023). Harmonization of resting-state functional MRI data across multiple imaging sites via the separation of site differences into sampling bias and measurement bias [Dataset]. http://doi.org/10.1371/journal.pbio.3000042
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ayumu Yamashita; Noriaki Yahata; Takashi Itahashi; Giuseppe Lisi; Takashi Yamada; Naho Ichikawa; Masahiro Takamura; Yujiro Yoshihara; Akira Kunimatsu; Naohiro Okada; Hirotaka Yamagata; Koji Matsuo; Ryuichiro Hashimoto; Go Okada; Yuki Sakai; Jun Morimoto; Jin Narumoto; Yasuhiro Shimada; Kiyoto Kasai; Nobumasa Kato; Hidehiko Takahashi; Yasumasa Okamoto; Saori C. Tanaka; Mitsuo Kawato; Okito Yamashita; Hiroshi Imamizu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When collecting large amounts of neuroimaging data associated with psychiatric disorders, images must be acquired from multiple sites because of the limited capacity of any single site. However, site differences represent a barrier when acquiring multisite neuroimaging data. We utilized a traveling-subject dataset in conjunction with a multisite, multidisorder dataset to demonstrate that site differences are composed of biological sampling bias and engineering measurement bias. The effects of both bias types on resting-state functional MRI connectivity, computed from pairwise correlations, were greater than or equal to the differences between psychiatric disorders. Furthermore, our findings indicated that each site can sample only from a subpopulation of participants. This result suggests that it is essential to collect large amounts of neuroimaging data from as many sites as possible to appropriately estimate the distribution of the grand population. Finally, we developed a novel harmonization method that removed only the measurement bias by using a traveling-subject dataset; it reduced the measurement bias by 29% and improved the signal-to-noise ratios by 40%. Our results provide fundamental knowledge regarding site effects, which is important for future research using multisite, multidisorder resting-state functional MRI data.
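    The core of a traveling-subject harmonization can be sketched as follows: because the same participants are scanned at every site, the difference between a site's traveling-subject mean and the traveling-subject grand mean estimates that site's measurement bias, which can then be subtracted from all of the site's data. This is a simplified mean-offset illustration with hypothetical names, not the paper's actual regression-based estimator.

```python
import numpy as np

def remove_measurement_bias(data, sites, traveling_mask):
    """Subtract per-site measurement bias estimated from traveling subjects.

    data           : samples x features matrix of connectivity values
    sites          : site label per sample
    traveling_mask : True where the sample comes from a traveling subject
                     (the same participants scanned at every site)
    """
    out = data.astype(float).copy()
    # Grand mean of the traveling subjects across all sites.
    grand = data[traveling_mask].mean(axis=0)
    for s in np.unique(sites):
        at_site = (sites == s) & traveling_mask
        # The site's offset on identical participants is its measurement bias.
        bias = data[at_site].mean(axis=0) - grand
        out[sites == s] -= bias  # remove it from every sample at that site
    return out
```

    Sampling bias, by contrast, reflects genuinely different participant pools and is deliberately left in the data by this step.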

  7. Description and harmonization strategy for the predictor variables.

    • figshare.com
    xlsx
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Description and harmonization strategy for the predictor variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description and harmonization strategy for the predictor variables.

  8. Household Expenditure and Income Survey 2010, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    The Hashemite Kingdom of Jordan Department of Statistics (DOS) (2019). Household Expenditure and Income Survey 2010, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/7662
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    The Hashemite Kingdom of Jordan Department of Statistics (DOS)
    Time period covered
    2010 - 2011
    Area covered
    Jordan
    Description

    Abstract

    The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices, determine the characteristics of the poor, and prepare poverty maps. To achieve these goals, the sample had to be representative at the sub-district level. The raw survey data provided by the Statistical Office were cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international standards on measuring the distribution of household living standards. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

    Data collected through the survey helped in achieving the following objectives:
    1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
    2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
    3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
    4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
    5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
    6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps
    7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district, enabling the drawing of a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed frame of housing and households for the different administrative levels in the country. Jordan is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district there are a number of communities (cities and villages), and each community was divided into a number of blocks, where each block contained between 60 and 100 houses. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey frame.

    A two-stage stratified cluster sampling technique was used. In the first stage, clusters were selected systematically with probability proportional to size, where the number of households in each cluster was taken as the cluster's weight. In the second stage, a sample of 8 households was selected from each cluster using a systematic sampling technique, plus another 4 households selected as a backup for the basic sample. Those 4 households were to be used during the first visit to the block in case a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure that results could be produced at the sub-district level. In this respect, the survey adopted the frame provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable from the Household Expenditure and Income Survey for 2008 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation of the expenditure variable in each sub-district is less than 10%, with a minimum of six clusters per sub-district. This ensures adequate representation of clusters across administrative areas and enables drawing an indicative poverty map.

    It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas of major cities where poor households are concentrated. These were taken into consideration during the sampling design phase, and a higher number of households was selected from those areas to ensure good coverage of all regions where poverty is widespread.
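    The two-stage design described above (systematic probability-proportional-to-size selection of clusters, then a systematic sample of 8 households plus 4 backups per cluster) can be sketched as follows. This is an illustrative reconstruction with hypothetical function names and inputs, not the Department of Statistics' actual sampling code.

```python
import random

def pps_systematic(clusters, n_clusters):
    """Systematic PPS selection: clusters drawn with probability
    proportional to size. clusters: list of (cluster_id, n_households)."""
    total = sum(size for _, size in clusters)
    step = total / n_clusters
    start = random.uniform(0, step)          # random start in first interval
    points = iter(start + i * step for i in range(n_clusters))
    chosen, cum, p = [], 0, None
    p = next(points)
    for cid, size in clusters:
        cum += size                           # cumulative household count
        while p is not None and p <= cum:     # every point hitting this
            chosen.append(cid)                # cluster selects it
            p = next(points, None)
    return chosen

def systematic_households(household_ids, n=8, backup=4):
    """Systematic sample of n basic households plus `backup` replacements."""
    k = len(household_ids) / (n + backup)     # sampling interval
    start = random.uniform(0, k)
    picks = [household_ids[int(start + i * k) % len(household_ids)]
             for i in range(n + backup)]
    return picks[:n], picks[n:]               # (basic sample, backups)
```

    A large cluster can be hit by more than one selection point, which is the standard PPS behavior; stratification by sub-district would simply run this selection once per stratum.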

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    • General form
    • Expenditure on food commodities form
    • Expenditure on non-food commodities form

    Cleaning operations

    Raw Data:
    - Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to the different rounds throughout the year. A registry was prepared to track the stages of data checking, coding and entry until forms were returned to the archive system.
    - Data office checking: This phase ran concurrently with data collection; questionnaires completed in the field were immediately sent for office checking.
    - Data coding: A team was trained for the coding phase, which in this survey is limited to education specialization, profession and economic activity. International classifications were used for these; for the remaining questions, coding was predefined during the design phase.
    - Data entry/validation: A team of system analysts, programmers and data entry personnel worked on the data at this stage. System analysts and programmers began by identifying the survey framework and questionnaire fields in order to build computerized data entry forms. A set of validation rules was added to the entry forms to ensure the accuracy of data entered, and a team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms were correctly extracted and returned to the archive system. A validation process was run on the data to ensure the entered data were free of errors.
    - Results tabulation and dissemination: After all data processing operations were completed, ORACLE was used to tabulate the survey's final results. The results were cross-checked against similar outputs from SPSS to ensure that tabulations were correct. Each table was also checked for consistency of the figures presented, together with the required editing of table titles and report formatting.

    Harmonized Data:
    - The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
    - The harmonization process started with cleaning all raw data files received from the Statistical Office.
    - Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
    - A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    - A post-harmonization cleaning process was run on the data.
    - Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.
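    The merge-recode-save pipeline listed above can be sketched in pandas. All file and variable names below are hypothetical stand-ins (the real ERF programs are country-specific SPSS syntax over the actual survey files):

```python
import pandas as pd

# Hypothetical raw files; real ERF inputs differ by country and survey.
roster = pd.DataFrame({"hh_id": [1, 1, 2], "ind_id": [1, 2, 1],
                       "sex": [1, 2, 1], "age": [34, 31, 58]})
income = pd.DataFrame({"hh_id": [1, 2], "hh_income": [5200, 3100]})

# Merge to one individual-level file holding all variables to harmonize.
ind = roster.merge(income, on="hh_id", how="left")

# Country-specific recode/rename step (illustrative harmonized names).
ind = ind.rename(columns={"sex": "h_sex", "age": "h_age"})
ind["h_sex"] = ind["h_sex"].map({1: "male", 2: "female"})

# Save at both levels; ERF disseminates SPSS and Stata versions.
hh = ind.groupby("hh_id", as_index=False).agg(h_income=("hh_income", "first"))
ind.to_stata("harmonized_individual.dta", write_index=False)
hh.to_stata("harmonized_household.dta", write_index=False)
```

    A post-harmonization cleaning pass (range checks, cross-variable consistency) would run on `ind` and `hh` before dissemination.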

  9. Eligible studies from the CureSCi Metadata Catalog and their available...

    • plos.figshare.com
    xls
    Updated Apr 23, 2025
    Cite
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Eligible studies from the CureSCi Metadata Catalog and their available predictor variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Eligible studies from the CureSCi Metadata Catalog and their available predictor variables.

  10. Harmonization of sediment diatoms from hundreds of lakes in the northeastern...

    • catalog.data.gov
    • datasets.ai
    Updated Sep 13, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Harmonization of sediment diatoms from hundreds of lakes in the northeastern United States [Dataset]. https://catalog.data.gov/dataset/harmonization-of-sediment-diatoms-from-hundreds-of-lakes-in-the-northeastern-united-states
    Explore at:
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Northeastern United States, United States
    Description

    Sediment diatoms are widely used to track environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for further large-scale studies is challenging because of the taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern USA using a harmonization process which included updating synonyms, tracking the identity of inconsistently identified taxa and grouping those that could not be resolved taxonomically. The dataset consists of a Portable Document Format (.pdf) file of the Voucher Flora, six Microsoft Excel (.xlsx) data files, an R script, and five output Comma Separated Values (.csv) files:
    - NE_Lakes_Voucher_Flora_102421.pdf documents the morphological species concepts in the dataset using diatom images compiled into plates.
    - VoucherFloraTranslation_102421.xlsx gives the translation scheme from OTU codes to diatom scientific or provisional names, with identification sources, references, and notes.
    - Slide_accession_numbers_102421.xlsx lists slide accession numbers in the ANS Diatom Herbarium.
    - "DiatomHarmonization_032222_files for R.zip" contains four Excel input data files, the R code, and a subfolder "OUTPUT" with the five .csv files. The input files are Counts_original_long_102421.xlsx (original diatom count data in long format), Harmonization_102421.xlsx (the taxonomic harmonization scheme with notes and references), SiteInfo_031922.xlsx (sampling site- and sample-level information), and WaterQualityData_021822.xlsx (a supplementary file with water quality data).
    The R code (DiatomHarmonization_032222.R) applies the harmonization scheme to the original diatom counts to produce the five output files: four wide-format files containing diatom count data at different harmonization steps (Counts_1327_wide.csv, Step1_1327_wide.csv, Step2_1327_wide.csv, Step3_1327_wide.csv) and a summary of the Indicator Species Analysis (INDVAL_RESULT.csv). The harmonization scheme (Harmonization_102421.xlsx) can be further modified based on additional taxonomic investigations, while the associated R code provides a straightforward mechanism for diatom data versioning. This dataset is associated with the following publication: Potapova, M., S. Lee, S. Spaulding, and N. Schulte. A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States. Scientific Data. Springer Nature, New York, NY, 9(540): 1-8, (2022).
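    The harmonization step described above (map each original OTU code through a synonym/grouping scheme, then aggregate counts that now share one name) can be illustrated with a toy scheme. The OTU codes and taxon names below are hypothetical; the actual scheme lives in Harmonization_102421.xlsx and is applied by DiatomHarmonization_032222.R.

```python
import pandas as pd

# Hypothetical harmonization scheme: old OTU code -> accepted name or group.
scheme = {"NAV001": "Navicula cryptocephala",
          "NAV002": "Navicula cryptocephala",              # synonym collapsed
          "ACH010": "Achnanthidium minutissimum group"}    # unresolved -> group

# Toy long-format counts in the style of Counts_original_long_102421.xlsx.
counts = pd.DataFrame({"sample": ["L1", "L1", "L1", "L2"],
                       "taxon":  ["NAV001", "NAV002", "ACH010", "NAV002"],
                       "count":  [12, 3, 40, 7]})

# Apply the scheme, then sum counts that now share one harmonized name.
counts["harmonized"] = counts["taxon"].map(scheme)
harmonized = counts.groupby(["sample", "harmonized"], as_index=False)["count"].sum()

# Pivot to the wide site-by-taxon layout used in the output .csv files.
wide = harmonized.pivot(index="sample", columns="harmonized",
                        values="count").fillna(0)
```

    Keeping the scheme as a separate table, as this dataset does, means a taxonomic revision only edits the mapping and reruns the script, which is what makes the versioning straightforward.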

  11. Labor Force Survey 2014, Economic Research Forum (ERF) Harmonization Data -...

    • catalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Palestinian Central Bureau of Statistics (2017). Labor Force Survey 2014, Economic Research Forum (ERF) Harmonization Data - West Bank and Gaza [Dataset]. https://catalog.ihsn.org/index.php/catalog/6961
    Explore at:
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Palestinian Central Bureau of Statistics (https://pcbs.gov/)
    Economic Research Forum
    Time period covered
    2014
    Area covered
    West Bank, Gaza Strip, Gaza
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

    The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2014 (LFS). The survey rounds covered a total sample of about 25,736 households, and the number of completed questionnaires is 16,891.

    The main objective of collecting data on the labour force and its components, including employment, unemployment and underemployment, is to provide basic information on the size and structure of the Palestinian labour force. Data collected at different points in time provide a basis for monitoring current trends and changes in the labour market and in the employment situation. These data, supported with information on other aspects of the economy, provide a basis for the evaluation and analysis of macro-economic policies.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data from existing labor force surveys in several Arab countries.

    Geographic coverage

    The survey covers a representative sample at the region level (West Bank, Gaza Strip), by locality type (urban, rural, camp) and by governorate.

    Analysis unit

    • Household/family
    • Individual/person

    Universe

    The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

    The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.

    ---> Target Population: It consists of all individuals aged 10 years and above who normally reside with their households in the State of Palestine during 2014.

    ---> Sampling Frame: The sampling frame consists of the master sample, which was updated in 2011: each enumeration area consists of buildings and housing units with an average of about 124 households. The master sample consists of 596 enumeration areas; we used 498 enumeration areas as a framework for the labor force survey sample in 2014 and these units were used as primary sampling units (PSUs).

    ---> Sample Size: The estimated sample size is 7,616 households in each quarter of 2014. In the second quarter of 2014, however, only 7,541 households were covered, as 75 households in Gaza Strip could not be reached because of the Israeli aggression.

    ---> Sample Design: The sample is a two-stage stratified cluster sample. First stage: a systematic random sample of 494 enumeration areas was selected for the whole round, excluding enumeration areas with fewer than 40 households. Second stage: a systematic random sample of households was selected from each enumeration area chosen in the first stage: 16 households from enumeration areas with 80 households or more, and 8 households from enumeration areas with fewer than 80 households.

    ---> Sample strata: The population was divided by: 1. Governorate (16 governorates) 2. Type of locality (urban, rural, refugee camps).

    ---> Sample Rotation: Each round of the Labor Force Survey covers all of the 494 master sample enumeration areas. The areas remain fixed over time, but households in 50% of the EAs are replaced in each round. The same households remain in the sample for two consecutive rounds, are left out for the next two rounds, then return to the sample for another two consecutive rounds before being dropped. An overlap of 50% is thus achieved between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes).

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The survey questionnaire was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:

    ---> 1. Identification Data: The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.

    ---> 2. Quality Control: This part involves groups of controlling standards to monitor the field and office operations and to keep in order the sequence of questionnaire stages (data collection, field and office coding, data entry, editing after entry, and data storage).

    ---> 3. Household Roster: This part involves demographic characteristics about the household, like number of persons in the household, date of birth, sex, educational level…etc.

    ---> 4. Employment Part: This part involves the major research indicators, where the questionnaire was answered by every household member aged 15 years and over, to explore their labour force status and identify their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.

    Cleaning operations

    ---> Raw Data: PCBS has collected data using hand-held devices (HHD) since the first quarter of 2013 in Palestine, excluding Jerusalem inside the borders (J1) and Gaza Strip. The HHD program, developed by the General Directorate of Information Systems, is based on SQL Server and Microsoft .NET. Using HHD reduced the number of data processing stages: fieldworkers collect data and send it directly to a server, from which the project manager can retrieve it at any time. To work in parallel with Gaza Strip and Jerusalem inside the borders (J1), an office program was developed with the same techniques, using the same database as the HHD.

    ---> Harmonized Data:
    - The SPSS package is used to clean and harmonize the datasets.
    - The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    - All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    - A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    - A post-harmonization cleaning process is then conducted on the data.
    - Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The survey sample consists of about 30,464 households, of which 25,736 completed the interview: 16,891 households in the West Bank and 8,845 in the Gaza Strip. Weights were adjusted to account for non-response. The response rate in the West Bank reached 88.8%, while in the Gaza Strip it reached 93.3%.

    Sampling error estimates

    ---> Sampling Errors: The data of this survey may be affected by sampling errors due to the use of a sample rather than a complete enumeration. Therefore, certain differences are to be expected in comparison with the real values that would be obtained through a census. Variances were calculated for the most important indicators; the variance table is attached to the final report. There is no problem in disseminating results at the national or governorate level for the West Bank and Gaza Strip.

    ---> Non-Sampling Errors: Non-sampling errors are possible at all stages of the project, during data collection or data processing. They include non-response errors, response errors, interviewing errors, and data entry errors. To avoid errors and reduce their effects, great efforts were made to train the fieldworkers intensively: they were trained on how to carry out the interview, what to discuss and what to avoid, and received practical and theoretical training during the training course, including a pilot survey. Data entry staff were likewise trained on the data entry program, which was tested before the data entry process began. To keep track of fieldwork progress and limit obstacles, continuous contact was maintained with the fieldwork team through regular visits and meetings, during which problems faced by fieldworkers were discussed and clarified. Non-sampling errors can occur at the various stages of survey implementation, whether in data collection or in data processing, and are generally difficult to evaluate statistically.

    They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent- and interviewer-related). Effective training and supervision and the careful design of questions have a direct bearing on limiting the magnitude of non-sampling errors, and hence on enhancing the quality of the resulting data. During implementation, the survey encountered non-response, with the cases "household was not present at home" during the fieldwork visit and "housing unit is vacant" accounting for the largest shares of non-response. The total

  12. AI Training Dataset In Healthcare Market Analysis, Size, and Forecast...

    • technavio.com
    pdf
    Updated Oct 9, 2025
    Cite
    Technavio (2025). AI Training Dataset In Healthcare Market Analysis, Size, and Forecast 2025-2029 : North America (US, Canada, and Mexico), Europe (Germany, UK, France, Italy, The Netherlands, and Spain), APAC (China, Japan, India, South Korea, Australia, and Indonesia), South America (Brazil, Argentina, and Colombia), Middle East and Africa (UAE, South Africa, and Turkey), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/ai-training-dataset-in-healthcare-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Oct 9, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    AI Training Dataset In Healthcare Market Size 2025-2029

    The ai training dataset in healthcare market size is forecast to increase by USD 829.0 million, at a CAGR of 23.5% between 2024 and 2029.

    The global AI training dataset in healthcare market is driven by the expanding integration of artificial intelligence and machine learning across the healthcare and pharmaceutical sectors. This technological shift necessitates high-quality, domain-specific data for applications ranging from ai in medical imaging to clinical operations. A key trend involves the adoption of synthetic data generation, which uses techniques like generative adversarial networks to create realistic, anonymized information. This approach addresses the persistent challenges of data scarcity and stringent patient privacy regulations. The development of applied ai in healthcare is dependent on such innovations to accelerate research timelines and foster more equitable model training.This advancement in ai training dataset creation helps circumvent complex legal frameworks and provides a method for data augmentation, especially for rare diseases. However, the market's progress is constrained by an intricate web of data privacy regulations and security mandates. Navigating compliance with laws like HIPAA and GDPR is a primary operational burden, as the process of de-identification is technically challenging and risks catastrophic compliance failures if re-identification occurs. This regulatory complexity, alongside the need for secure infrastructure for protected health information, acts as a bottleneck, impeding market growth and the broader adoption of ai in patient management and ai in precision medicine.

    What will be the Size of the AI Training Dataset In Healthcare Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019 - 2023 and forecasts 2025-2029 - in the full report.
    The market for AI training datasets in healthcare is defined by the continuous need for high-quality, structured information to power sophisticated machine learning algorithms. The development of AI in precision medicine and AI in cancer diagnostics depends on access to diverse and accurately labeled datasets, including digital pathology images and multi-omics data integration. The focus is shifting toward creating regulatory-grade datasets that can support clinical validation and commercialization of AI-driven diagnostic tools. This involves advanced data harmonization techniques and robust AI governance protocols to ensure reliability and safety in all applications. Progress in this sector is marked by the evolution from single-modality data to complex multimodal datasets. This shift supports a more holistic analysis required for applications like generative AI in clinical trials and treatment efficacy prediction. Innovations in synthetic data generation and federated learning platforms are addressing key challenges related to patient data privacy and data accessibility. These technologies enable the creation of large-scale, analysis-ready assets while adhering to strict compliance frameworks, supporting the ongoing advancement of applied AI in healthcare and fostering collaborative research environments.

    How is this AI Training Dataset In Healthcare Industry segmented?

    The ai training dataset in healthcare industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in "USD million" for the period 2025-2029, as well as historical data from 2019-2023, for the following segments.
    Type: Image, Text, Others
    Component: Software, Services
    Application: Medical imaging, Electronic health records, Wearable devices, Telemedicine, Others
    Geography: North America (US, Canada, Mexico), Europe (Germany, UK, France, Italy, The Netherlands, Spain), APAC (China, Japan, India, South Korea, Australia, Indonesia), South America (Brazil, Argentina, Colombia), Middle East and Africa (UAE, South Africa, Turkey), Rest of World (ROW)

    By Type Insights

    The image segment is estimated to witness significant growth during the forecast period. The image data segment is the most mature and largest component of the market, driven by the central role of imaging in modern diagnostics. This category includes modalities such as radiology images, digital pathology whole-slide images, and ophthalmology scans. The development of computer vision models and other AI models is a key factor, with these algorithms designed to improve the diagnostic capabilities of clinicians. Applications include identifying cancerous lesions, segmenting organs for pre-operative planning, and quantifying disease progression in neurological scans. The market for these datasets is sustained by significant technical and logistical hurdles, including the need for regulatory approval for AI-based medical devices, which elevates the demand for high-quality training datasets. The market'

  13. f

    Data_Sheet_1_Riemannian Geometry of Functional Connectivity Matrices for...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Guillem Simeon; Gemma Piella; Oscar Camara; Deborah Pareto (2023). Data_Sheet_1_Riemannian Geometry of Functional Connectivity Matrices for Multi-Site Attention-Deficit/Hyperactivity Disorder Data Harmonization.zip [Dataset]. http://doi.org/10.3389/fninf.2022.769274.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Guillem Simeon; Gemma Piella; Oscar Camara; Deborah Pareto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The use of multi-site datasets in neuroimaging provides neuroscientists with more statistical power to perform their analyses. However, it has been shown that the imaging site introduces variability in the data that cannot be attributed to biological sources. In this work, we show that functional connectivity matrices derived from resting-state multi-site data contain a significant imaging-site bias. To this aim, we exploited the fact that functional connectivity matrices belong to the manifold of symmetric positive-definite (SPD) matrices, making it possible to operate on them with Riemannian geometry. We hereby propose a geometry-aware harmonization approach, Rigid Log-Euclidean Translation, that accounts for this site bias. Moreover, we adapted other Riemannian-geometric methods designed for other domain adaptation tasks and compared them to our proposal. Based on our results, Rigid Log-Euclidean Translation of multi-site functional connectivity matrices seems to be the most suitable of the studied methods in a clinical setting. This represents an advance with respect to previous functional connectivity data harmonization approaches, which do not respect the geometric constraints imposed by the underlying structure of the manifold. In particular, when applying our proposed method to data from the ADHD-200 dataset, a multi-site dataset built for the study of attention-deficit/hyperactivity disorder, we obtained results that display a remarkable correlation with established pathophysiological findings and therefore represent a substantial improvement over the non-harmonized analysis. Thus, we present evidence supporting that harmonization should be extended to other functional neuroimaging datasets and provide a simple geometric method to address it.
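
    The geometric idea can be sketched minimally as follows. This is a simplified log-Euclidean recentering, not the authors' exact Rigid Log-Euclidean Translation: each SPD matrix is mapped to the tangent space with the matrix logarithm, each site's mean is translated onto the grand mean, and the result is mapped back with the matrix exponential, so every output stays symmetric positive-definite.

```python
import numpy as np
from scipy.linalg import expm, logm

def log_euclidean_recenter(matrices, sites):
    """Recenter SPD connectivity matrices per site in log-Euclidean space.

    matrices: (n, d, d) array of SPD functional-connectivity matrices
    sites:    (n,) array of imaging-site labels
    """
    # Map each SPD matrix to the tangent space; logm of an SPD matrix is real.
    logs = np.array([logm(m).real for m in matrices])
    grand_mean = logs.mean(axis=0)
    out = np.empty_like(logs)
    for s in np.unique(sites):
        idx = sites == s
        # Translate this site's log-space mean onto the grand mean.
        out[idx] = logs[idx] - logs[idx].mean(axis=0) + grand_mean
    # Map back to the SPD manifold.
    return np.array([expm(l) for l in out])
```

    After recentering, all sites share the same log-Euclidean mean, removing the first-moment site bias while respecting the manifold constraint that earlier harmonization approaches ignore.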

  14. f

    Data_Sheet_1_Comparison of different approaches to manage multi-site...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 20, 2023
    Cite
    Bell, Tiffany K.; La, Parker L.; Craig, William; Yeates, Keith Owen; Zemek, Roger; Beauchamp, Miriam H.; Doan, Quynh; Harris, Ashley D. (2023). Data_Sheet_1_Comparison of different approaches to manage multi-site magnetic resonance spectroscopy clinical data analysis.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000946843
    Explore at:
    Dataset updated
    Apr 20, 2023
    Authors
    Bell, Tiffany K.; La, Parker L.; Craig, William; Yeates, Keith Owen; Zemek, Roger; Beauchamp, Miriam H.; Doan, Quynh; Harris, Ashley D.
    Description

    Introduction: The effects caused by differences in data acquisition can be substantial and may impact data interpretation in multi-site/scanner studies using magnetic resonance spectroscopy (MRS). Given the increasing use of multi-site studies, a better understanding of how to account for different scanners is needed. Using data from a concussion population, we compare ComBat harmonization with different statistical methods in controlling for site, vendor, and scanner as covariates to determine how best to control for multi-site data. Methods: The data for the current study included 545 MRS datasets measuring tNAA, tCr, tCho, Glx, and mI to study pediatric concussion, acquired across five sites, six scanners, and two different MRI vendors. For each metabolite, site and vendor were accounted for in seven different general linear models (GLM) or mixed-effects models while testing for group differences between concussion and orthopedic injury. Models 1 and 2 controlled for vendor and site. Models 3 and 4 controlled for scanner. Models 5 and 6 controlled for site, applied to data harmonized by vendor using ComBat. Model 7 controlled for scanner, applied to data harmonized by scanner using ComBat. All models controlled for age and sex as covariates. Results: Models 1 and 2, controlling for site and vendor, showed no significant group effect in any metabolites, but vendor and site were significant factors in the GLM. Model 3, which included scanner, showed a significant group effect for tNAA and tCho, and scanner was a significant factor. Model 4, controlling for scanner, did not show a group effect in the mixed model. The data harmonized by vendor using ComBat (Models 5 and 6) had no significant group effect in either the GLM or mixed models. Lastly, the data harmonized by scanner using ComBat (Model 7) showed no significant group effect. The individual site data suggest there were no group differences. Conclusion: Using data from a large clinical concussion population, different analysis techniques to control for site, vendor, and scanner in MRS data yielded different results. The findings support the use of ComBat harmonization for clinical MRS data, as it removes site and vendor effects.
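
    The location/scale idea behind ComBat can be illustrated with a stripped-down sketch, applied to one metabolite at a time. This omits the covariate model and the empirical-Bayes shrinkage that the real ComBat algorithm includes, so it is an illustration of the principle, not a substitute:

```python
import numpy as np

def simple_batch_adjust(x, batch):
    """Location/scale batch adjustment in the spirit of ComBat,
    without the covariate model or empirical-Bayes shrinkage of the
    full algorithm.

    x:     (n,) array of values for one metabolite
    batch: (n,) array of scanner/site labels
    """
    x = np.asarray(x, dtype=float)
    grand_mean, grand_std = x.mean(), x.std(ddof=1)
    out = np.empty_like(x)
    for b in np.unique(batch):
        idx = batch == b
        # Standardize within the batch, then restore the pooled scale.
        out[idx] = (x[idx] - x[idx].mean()) / x[idx].std(ddof=1)
        out[idx] = out[idx] * grand_std + grand_mean
    return out
```

    After adjustment, every batch shares the pooled mean and standard deviation, which is the effect the study exploits when scanner and vendor differences would otherwise masquerade as group effects.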

  15. c

    QuestionLink - Political Interest

    • datacatalogue.cessda.eu
    • da-ra.de
    Updated Nov 9, 2022
    Cite
    Singh, Ranjit K. (2022). QuestionLink - Political Interest [Dataset]. http://doi.org/10.7802/2373
    Explore at:
    Dataset updated
    Nov 9, 2022
    Dataset provided by
    GESIS - Leibniz-Institut für Sozialwissenschaften
    Authors
    Singh, Ranjit K.
    Area covered
    Germany
    Description

    This repository of QuestionLink harmonization scripts for many measures of political interest is best accessed via the QuestionLink homepage:
    https://www.gesis.org/en/services/processing-and-analyzing-data/data-harmonization/question-link

    There you find general information on how to use QuestionLink to harmonize research data, on the method and technology behind QuestionLink, and an overview of other harmonized constructs.

    Information on the specific construct, political interest, can be accessed here: https://www.gesis.org/en/services/processing-and-analyzing-data/data-harmonization/question-link/political-interest

  16. VOTP Dataset

    • kaggle.com
    zip
    Updated Apr 10, 2017
    Cite
    sdorius (2017). VOTP Dataset [Dataset]. https://www.kaggle.com/sdorius/votpharm
    Explore at:
    zip(24823052 bytes)Available download formats
    Dataset updated
    Apr 10, 2017
    Authors
    sdorius
    Description

    This is an integration of 10 independent multi-country, multi-region, multi-cultural social surveys fielded by Gallup International between 2000 and 2013. The integrated data file contains responses from 535,159 adults living in 103 countries. In total, the harmonization project combined 571 social surveys.

    These data have value in a number of longitudinal multi-country, multi-regional, and multi-cultural (L3M) research designs. They can be understood as independent, though non-random, L3M samples containing a number of multiple-indicator ASQ (ask same questions) and ADQ (ask different questions) measures of human development, the environment, international relations, gender equality, security, international organizations, and democracy, to name a few [see full list below].

    The data can be used for exploratory and descriptive analysis, with greatest utility at low levels of resolution (e.g. nation-states, supranational groupings). Level of resolution in analysis of these data should be sufficiently low to approximate confidence intervals.

    These data can be used for teaching 3M methods, including data harmonization in L3M, 3M research design, survey design, 3M measurement invariance, analysis, visualization, and reporting, as well as for teaching about paradata, metadata, and data management in L3M designs.

    The country units form an unbalanced panel derived from non-probability samples of countries and respondents. Panels (countries) have left and right censoring and are thus unbalanced. This design limitation can be overcome to the extent that VOTP panels are harmonized with public measurements from other 3M surveys to establish balance in terms of panels and occasions of measurement. Should L3M harmonization occur, these data can be assigned confidence weights to reflect the amount of error in these surveys.

    Pooled public opinion surveys (country means), when combined with higher-quality country measurements of the same concepts (ASQ, ADQ), can be leveraged to increase the statistical power of pooled public-opinion research designs (multiple L3M datasets), that is, in studies of public, rather than personal, beliefs.

    The Gallup Voice of the People survey data are based on uncertain, underspecified sampling methods. Country sampling is non-random. The sampling method appears to be primarily probability and quota sampling, with occasional oversampling of urban populations in difficult-to-survey countries. The sampling units (countries and individuals) are poorly defined, suggesting these data have more value in research designs calling for independent-samples replication and repeated-measures frameworks.

    The Voice of the People Survey Series is WIN/Gallup International Association's End of Year survey and is a global study that collects the public's view on the challenges that the world faces today. Ongoing since 1977, the purpose of WIN/Gallup International's End of Year survey is to provide a platform for respondents to speak out concerning government and corporate policies. The Voice of the People, End of Year Surveys for 2012, fielded June 2012 to February 2013, were conducted in 56 countries to solicit public opinion on social and political issues. Respondents were asked whether their country was governed by the will of the people, as well as their attitudes about their society. Additional questions addressed respondents' living conditions and feelings of safety around their living area, as well as personal happiness. Respondents' opinions were also gathered in relation to business development and their views on the effectiveness of the World Health Organization. Respondents were also surveyed on ownership and use of mobile devices. Demographic information includes sex, age, income, education level, employment status, and type of living area.

  17. i

    Employment and Unemployment Survey 2007, Economic Research Forum (ERF)...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Economic Research Forum (2017). Employment and Unemployment Survey 2007, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://datacatalog.ihsn.org/catalog/6942
    Explore at:
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Department of Statistics
    Economic Research Forum
    Time period covered
    2007
    Area covered
    Jordan
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN.

    The Department of Statistics (DOS) carried out four rounds of the 2007 Employment and Unemployment Survey (EUS) during February, May, August and November 2007. The survey rounds covered a total sample of about fifty-three thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design. It is noteworthy that the sample represents the national level (Kingdom), governorates, the three regions (Central, North and South), and the urban/rural areas.

    The importance of this survey lies in that it provides a comprehensive data base on employment and unemployment that serves decision makers, researchers as well as other parties concerned with policies related to the organization of the Jordanian labor market.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009. During which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    Covering a sample representative on the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure


    Survey Frame

    The sample of this survey is based on the frame provided by the data of the Population and Housing Census, 2004. The Kingdom was divided into strata, where each city with a population of 100,000 persons or more was considered a large city; there are 6 such cities. Each governorate (excluding the 6 large cities) was divided into urban and rural areas. The remaining urban areas in each governorate were treated as an independent stratum, and the same was applied to the rural areas. The total number of strata was 30.

    In view of the significant variation in socio-economic characteristics in the large cities in particular and in urban areas in general, each stratum of the large cities and urban strata was divided into four sub-strata according to the socio-economic characteristics provided by the population and housing census, with the purpose of producing homogeneous strata.

    The frame excludes collective dwellings. However, it is worth noting that the collective households identified in the harmonized data, through a variable indicating the household type, are those reported without heads in the raw data, and in which the relationship of all household members to the head was reported as "other".

    This sample is also not representative for the non-Jordanian population.

    Sample Design

    The sample of this survey was designed using the two-stage stratified cluster sampling method, based on the data of the Population and Housing Census 2004 for carrying out household surveys. The sample is representative at the Kingdom, rural-urban and governorate levels. The total sample size for each round was 1,336 Primary Sampling Units (PSUs) (clusters). These units were distributed across the urban and rural regions of the governorates, in addition to the large cities in each governorate, according to the weight of persons and households and according to the variance within each stratum. Slight modifications to the number of these units were made so that it would be a multiple of 8; the total number of clusters for the four rounds was 5,344.

    The main sample consists of 40 replicates, each consisting of 167 PSUs. For each round, eight replicates of the main sample were used. The PSUs were ordered within each stratum according to geographic characteristics and then according to socio-economic characteristics in order to ensure a good spread of the sample. The sample was then selected in two stages. In the first stage, the PSUs were selected using the Probability Proportionate to Size (PPS) systematic selection procedure, with the number of households in each PSU serving as its measure of size. In the second stage, the blocks of the PSUs (clusters) selected in the first stage were updated, and then a constant number of households (10) was selected from each PSU (cluster) as final sampling units, using the random systematic sampling method.
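
    The first-stage selection, PPS with systematic selection, can be sketched as follows. The PSU sizes here are invented for illustration; the actual frame was built from the 2004 census household counts.

```python
import numpy as np

def pps_systematic(sizes, n, rng):
    """Select n units with probability proportionate to size,
    using the systematic (cumulative-total) method.

    sizes: measure of size per unit, e.g. household counts per PSU
    n:     number of units to select
    rng:   numpy random Generator
    """
    sizes = np.asarray(sizes, dtype=float)
    cum = np.cumsum(sizes)                  # cumulative measure of size
    interval = cum[-1] / n                  # sampling interval
    start = rng.uniform(0, interval)        # random start in the first interval
    points = start + interval * np.arange(n)  # equally spaced selection points
    # Each point falls in exactly one unit's cumulative range.
    return np.searchsorted(cum, points, side="left")
```

    Large PSUs span a wider slice of the cumulative total, so they are hit by a selection point with proportionally higher probability, which is exactly what makes a fixed second-stage take (10 households per PSU) approximately self-weighting.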

    Sampling notes

    It is noteworthy that the sample of the present survey does not represent the non-Jordanian population, due to the fact that it is based on households living in conventional dwellings. In other words, it does not cover the collective households living in collective dwellings. Therefore, the non-Jordanian households covered in the present survey are either private households or collective households living in conventional dwellings.

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    Raw Data

    The plan of the tabulation of survey results was guided by former Employment and Unemployment Surveys which were previously prepared and tested. The final survey report was then prepared to include all detailed tabulations as well as the methodology of the survey.

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
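
    A toy pandas sketch of the merge-and-recode steps above. The column names, codes, and labels are hypothetical; the actual ERF harmonization programs are country-specific SPSS syntax later converted to STATA.

```python
import pandas as pd

# Hypothetical cleaned raw files: a household file and an individual file.
hh = pd.DataFrame({"hh_id": [1, 2], "region": [1, 2]})
ind = pd.DataFrame({"hh_id": [1, 1, 2], "ind_id": [1, 2, 1],
                    "sx": [1, 2, 1], "empst": [1, 3, 2]})

# Merge to one file on the individual level.
merged = ind.merge(hh, on="hh_id", how="left")

# Country-specific recode/rename/label step (illustrative codes only).
merged = merged.rename(columns={"sx": "sex", "empst": "emp_status"})
merged["sex"] = merged["sex"].map({1: "male", 2: "female"})
merged["emp_status"] = merged["emp_status"].map(
    {1: "employed", 2: "unemployed", 3: "out of labour force"})

# Post-harmonization check: no individual lost in the merge.
assert len(merged) == len(ind)
```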
  18. Dataset of "A Metabolites Merging Strategy (MMS): Harmonization to enable...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Dataset of "A Metabolites Merging Strategy (MMS): Harmonization to enable studies intercomparison" [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8226097?locale=cs
    Explore at:
    unknown(157557)Available download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    Metabolomics encounters challenges in cross-study comparisons due to diverse metabolite nomenclature and reporting practices. To bridge this gap, we introduce the Metabolites Merging Strategy (MMS), offering a systematic framework to harmonize multiple metabolite datasets for enhanced interstudy comparability. MMS has three steps. Step 1: translation and merging of the different datasets, employing InChIKeys for data integration and translating metabolite names where needed. Step 2: retrieval of attributes from the InChIKey, including name descriptors (title name from PubChem and RefMet name from Metabolomics Workbench), chemical properties (molecular weight and molecular formula), both systematic (InChI, InChIKey, SMILES) and non-systematic identifiers (PubChem, ChEBI, HMDB, KEGG, LipidMaps, DrugBank, Bin ID and CAS number), and their ontology. Step 3: a meticulous curation process in three parts, rectifying disparities for conjugated base/acid compounds (optional), filling missing attributes, and checking synonyms (duplicated information). The MMS procedure is exemplified through a case study of urinary asthma metabolites, where MMS facilitated the identification of significant pathways that remained hidden when no dataset merging strategy was followed. This study highlights the need for standardized and unified metabolite datasets to enhance the reproducibility and comparability of metabolomics studies.
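
    The core of Step 1, merging on the InChIKey rather than on free-text names, can be sketched with pandas. The two toy study tables below use real InChIKeys for citric acid and L-carnitine, but the fold-change values are invented:

```python
import pandas as pd

# Two hypothetical studies report the same metabolites under different names.
study_a = pd.DataFrame({
    "name": ["citrate", "L-carnitine"],
    "inchikey": ["KRKNYBCHXYNGOX-UHFFFAOYSA-N", "PHIQHXFUZVPYII-ZCFIWIBFSA-N"],
    "fold_change_a": [1.8, 0.6]})
study_b = pd.DataFrame({
    "name": ["Citric acid", "Carnitine"],
    "inchikey": ["KRKNYBCHXYNGOX-UHFFFAOYSA-N", "PHIQHXFUZVPYII-ZCFIWIBFSA-N"],
    "fold_change_b": [2.1, 0.7]})

# Merge on the structure-derived InChIKey, not on the free-text name.
merged = study_a.merge(study_b, on="inchikey", suffixes=("_a", "_b"))
```

    A name-based merge would find zero overlap here; the InChIKey merge recovers both metabolites, which is precisely the gap MMS addresses.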

  19. r

    Data from: Harmonizing OER metadata in ETL processes with SkoHub in the...

    • resodate.org
    • service.tib.eu
    Updated Jun 24, 2022
    Cite
    Steffen Roertgen (2022). Harmonizing OER metadata in ETL processes with SkoHub in the project “WirLernenOnline” [Dataset]. http://doi.org/10.25625/8MZSWB
    Explore at:
    Dataset updated
    Jun 24, 2022
    Dataset provided by
    Georg-August-Universität Göttingen
    GRO.data
    WirLernenOnline
    Authors
    Steffen Roertgen
    Description

    The metadata for Open Educational Resources (OER) are often made available in repositories without recourse to uniform value lists and corresponding standards for their attributes. This circumstance complicates data harmonization when OERs from different sources are to be merged in one search environment. With the help of the RDF standard SKOS and the tool SkoHub-Vocabs, the project "WirLernenOnline" has found an innovative, reusable and standards-based solution to this challenge. This involves the creation of SKOS vocabularies that are used during the ETL process to standardize different terms (for example, "math" and "mathematics"). This then forms the basis for providing users with consistent filtering options and a good search experience. The created and open licensed vocabularies can then easily be reused and linked to overcome this challenge in the future.
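
    The normalization step can be sketched with a plain-dict stand-in for the SKOS lookup. In the project itself, SkoHub-Vocabs serves actual SKOS files and the ETL resolves prefLabel/altLabel from them; the vocabulary below is invented:

```python
# Minimal stand-in for a SKOS vocabulary: one prefLabel plus its altLabels.
VOCAB = {
    "mathematics": {"math", "maths", "mathematik"},
    "physics": {"physik"},
}

# Invert once: any known label (preferred or alternative) -> preferred label.
LABEL_TO_PREF = {pref: pref for pref in VOCAB}
for pref, alts in VOCAB.items():
    for alt in alts:
        LABEL_TO_PREF[alt] = pref

def normalize(term):
    """ETL normalization step: map a source term to its preferred label,
    leaving unknown terms unchanged."""
    return LABEL_TO_PREF.get(term.lower(), term)
```

    Because every source term collapses onto one preferred label, the search environment can offer a single consistent filter value regardless of which repository supplied the record.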

  20. d

    The Longitudinal IntermediaPlus Data Source (2014-2016)

    • da-ra.de
    Updated May 25, 2022
    Cite
    Inga Brentel; Céline Fabienne Kampes; Olaf Jandura (2022). The Longitudinal IntermediaPlus Data Source (2014-2016) [Dataset]. http://doi.org/10.4232/1.13530
    Explore at:
    Dataset updated
    May 25, 2022
    Dataset provided by
    GESIS
    da|ra
    Authors
    Inga Brentel; Céline Fabienne Kampes; Olaf Jandura
    Time period covered
    Oct 2013 - Sep 2014
    Description

    The prepared Longitudinal IntermediaPlus dataset 2014 to 2016 is 'big data', which is why the entire dataset is only available in the form of a database (MySQL). In this database, the information on a respondent's variables is stored in a single value column, with one row per variable. The present data documentation covers the total database for online media use in the years 2014 to 2016. The data contains all variables of socio-demography, free-time activities, additional information on a respondent and his household, as well as the interview-specific variables and weights. Only the variables concerning the respondent's media use are a selection: the online media use of all full-online offerings as well as their single entities is included for all genres whose business model is the provision of content; e-commerce, games, etc. were excluded. The media use of radio, print and TV is not included. Preparation of further years is possible, as is the preparation of cross-media media use for radio, press media and TV; harmonization for radio and press media up to 2015 is available and waiting to be applied. The digital process chain developed for data preparation and harmonization is published at GESIS and available for further projects updating the time series. Recourse to these documents (Excel files, scripts, harmonization plans, etc.) is strongly recommended. The processing and harmonization of the Longitudinal IntermediaPlus 2014 to 2016 database was carried out in accordance with the FAIR principles (Wilkinson et al. 2016). By harmonizing and pooling the cross-sectional datasets into one longitudinal dataset, carried out by Inga Brentel and Céline Fabienne Kampes as part of the dissertation project 'Audience and Market Fragmentation online', the aim is to make this data source of the media analysis accessible for research on social and media change in Germany.
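
    The long storage format described above (one row per respondent-variable pair) can be pivoted to the usual one-row-per-respondent shape for analysis. A toy pandas sketch with invented variable names:

```python
import pandas as pd

# Hypothetical long-format rows, as a one-row-per-variable database would store them.
long = pd.DataFrame({
    "respondent_id": [1, 1, 2, 2],
    "variable": ["age", "online_news", "age", "online_news"],
    "value": [34, 1, 51, 0]})

# Reshape to one row per respondent for analysis.
wide = long.pivot(index="respondent_id", columns="variable", values="value")
```

    The long layout keeps the MySQL schema stable as waves add or drop variables; the pivot is done only at analysis time.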
