Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predictor variables used in analysis and the methods used to harmonize to the categorical variables.
bongo2112/harmonize-SDxl-Many-Comic-Style-Output dataset hosted on Hugging Face and contributed by the HF Datasets community
Harmonized Landsat Sentinel is a NASA initiative to produce a Virtual Constellation of surface reflectance (SR) data from the Operational Land Imager (OLI) and Multi-Spectral Instrument (MSI) aboard the Landsat 8-9 and Sentinel-2 remote sensing satellites, respectively. The combined measurement enables global observations of the land every 2–3 days. Input products are Landsat 8-9 Collection 2 Level 1 top-of-atmosphere reflectance and Sentinel-2 L1C top-of-atmosphere reflectance, which NASA radiometrically harmonizes to the maximum extent, resamples to common 30-meter resolution, and grids using the Sentinel-2 Military Grid Reference System (MGRS) UTM grid. Because of this, the products are different from Landsat 8-9 Collection 2 Level 2 surface reflectance and Sentinel-2 L2A surface reflectance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- `visit_concert`: A standard CAP variable about visiting frequency.
- `is_visit_concert`: binary variable, 0 if the person had not visited concerts in the previous 12 months.
- `artistic_activity_played_music`: The frequency of playing music as an amateur or professional practice. Some surveys record only a binary variable (played in the last 12 months or not), while others record frequencies. We will convert this into a binary variable.
- `artistic_activity_sung`: The frequency of singing as an amateur or professional practice, analogous to `artistic_activity_played_music`. Because of the liturgical use of singing, and the differences in religious practices among countries and genders, this variable differs significantly from `artistic_activity_played_music`.
- `age_exact`: The respondent’s age as an integer number.
- `country_code`: an ISO country code
- `geo`: an ISO code that separates Germany into the former East and West Germany, the United Kingdom into Great Britain and Northern Ireland, and Cyprus into Cyprus and the Turkish Cypriot community. [We may leave Turkish Cyprus out for practical reasons.]
- `age_education`: This is a harmonized education proxy. Because we work with the data of more than 30 countries, education levels are difficult to harmonize, and we use the Eurobarometer standard proxy, age of leaving education. It is a specially coded variable, and we will re-code it into two variables, `age_education` and `is_student`.
- `is_student`: a dummy variable for the special coding in `age_education` for “still studying”, i.e. the person does not yet have a school-leaving age. It would be tempting to impute `age` to `age_education` in this case, but we will show why this is not a good strategy.
- `w`, `w1`: Post-stratification weights for the 15+ years old population of each country. Use `w1` for averages of `geo` entities treating Northern Ireland, Great Britain, the United Kingdom, the former GDR, the former West Germany, and Germany as geographical areas. Use `w` when treating the United Kingdom and Germany as one territory.
- `wex`: Projected weight variable. For weighted averages, use `w` or `w1`; for projections onto the population size (i.e., with sums), use `wex`.
- `id`: The identifier of the original survey.
- `rowid`: A new identifier that is unique across all harmonized surveys, i.e., it remains unique in the harmonized dataset.
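The re-coding of `age_education` into two variables, and the conversion of a frequency into a binary variable, can be sketched as follows. This is a minimal illustration, not the authors' code: the numeric code for “still studying” (`STILL_STUDYING_CODE = 98`), the assumption that a frequency of 0 means "not in the previous 12 months", and the function names are all hypothetical; the actual codes should be taken from the codebook of each survey.

```python
from typing import Optional, Tuple

# Hypothetical special code for "still studying"; the real value depends on the survey.
STILL_STUDYING_CODE = 98


def recode_age_education(raw: Optional[int]) -> Tuple[Optional[int], Optional[int]]:
    """Split a specially coded value into (age_education, is_student)."""
    if raw is None:
        return None, None  # missing stays missing in both variables
    if raw == STILL_STUDYING_CODE:
        # No school-leaving age exists yet, so age_education is left missing
        # rather than imputed from the respondent's age.
        return None, 1
    return raw, 0


def to_binary(frequency: Optional[int]) -> Optional[int]:
    """Collapse a numeric frequency to 0/1, assuming 0 means 'never in the period'."""
    if frequency is None:
        return None
    return 1 if frequency > 0 else 0


print(recode_age_education(18))                   # (18, 0)
print(recode_age_education(STILL_STUDYING_CODE))  # (None, 1)
print(to_binary(0), to_binary(3))                 # 0 1
```

Keeping `age_education` missing for students, instead of substituting their current age, keeps the two variables separable in later models.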
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Sodium zirconium cyclosilicate (SZC; formerly ZS-9) is a selective potassium (K+) binder for treatment of hyperkalemia. An open-label extension (OLE) of the HARMONIZE study evaluated efficacy and safety of SZC for ≤11 months. Methods: Patients from HARMONIZE with point-of-care device i-STAT K+ 3.5–6.2 mmol/L received once-daily SZC 5–10 g for ≤337 days. End points included achievement of mean serum K+ ≤5.1 mmol/L (primary) or ≤5.5 mmol/L (secondary). Results: Of 123 patients who entered the extension (mean serum K+ 4.8 mmol/L), 79 (64.2%) completed the study. The median daily dose of SZC was 10 g (range 2.5–15 g). The primary end point was achieved by 88.3% of patients, and 100% achieved the secondary end point. SZC was well tolerated with no new safety concerns. Conclusion: In the HARMONIZE OLE, most patients maintained mean serum K+ within the normokalemic range for ≤11 months during ongoing SZC treatment.
https://pacific-data.sprep.org/resource/shared-data-license-agreement
The micro-assessment provides an overall assessment of the implementing Partner's programme, financial operations management policies, procedures, systems and internal controls.
United Nations Development Programme
https://www.gesis.org/en/institute/data-usage-terms
+++++++++++++++ Version 3.0.0 +++++++++++++++
We carried out a harmonization of the Eurobarometer surveys from 2004 to spring 2021. This dataset includes 35 single standard Eurobarometers and more than 140 variables about EU policies, attitudes towards Europe and the EU, identity, cognitive mobilization, political institutions, socio-political characteristics and partisanship, etc.
The harmonization was carried out using existing Eurobarometer datasets published by GESIS. To allow users to replicate the harmonization and modify some of the code if needed, we publish one example of the do-files used for the harmonization, together with the corresponding harmonized dataset. The do-file contains the code used to modify and clean EB 953 (ZA7783, conducted in spring 2021) according to the harmonization procedure that we followed, and the cleaned dataset for EB 953 is the result of running that do-file. The files are named “EB 953.do” and “953_new.dta”.
We include:
- a harmonized dataset ("harmonised_EB_2004-2021.dta"),
- a technical report ("User Guide Harmonized Eurobarometer 2004-2021"),
- a summary of the original survey questions corresponding to the variables included in the dataset ("Trends_EBs_1970-2021.xlsx"),
- one of the do-files used to carry out the harmonization (“EB 953.do”),
- one of the datasets used before merging all datasets (“953_new.dta”).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the top 10 differentially expressed genes inferred from concatenation of published counts (“published vs published”) versus those inferred from harmonized uniform GDC re-processing (“reprocessed vs reprocessed”).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This deposit contains the taxonomy maps and data we used to translate data on COVID-19 government responses from 7 different datasets into the taxonomy developed by the CoronaNet Research Project (CoronaNet; Cheng et al 2020). These taxonomy maps form the basis of our efforts to harmonize this data into the CoronaNet database. The following taxonomy maps are deposited in the 'Taxonomy' folder:
- ACAPS COVID-19 Government Measures - CoronaNet Taxonomy Map
- Canadian Data Set of COVID-19 Interventions from the Canadian Institute for Health Information (CIHI) - CoronaNet Taxonomy Map
- COVID Analysis and Mapping of Policies (COVID AMP) - CoronaNet Taxonomy Map
- Johns Hopkins Health Intervention Tracking for COVID-19 (HIT-COVID) - CoronaNet Taxonomy Map
- Oxford Covid-19 Government Response Tracker (OxCGRT) - CoronaNet Taxonomy Map
- World Health Organisation Public Health and Safety Measures (WHO PHSM) - CoronaNet Taxonomy Map

Meanwhile, the 'Data' folder contains the raw and mapped data for each external dataset (i.e. ACAPS, CIHI, COVID AMP, HIT-COVID, OxCGRT and WHO PHSM) as well as the combined external data for Steps 1 and 3 of the data harmonization process described in Cheng et al (2023) 'Harmonizing Government Responses to the COVID-19 Pandemic.'
Changes since the last version: in the .csv export there was a naming problem.
- `visit_concert`: A standard CAP variable about visiting frequency, in numeric form.
- `fct_visit_concert`: A standard CAP variable about visiting frequency, in categorical form.
- `is_visit_concert`: binary variable, 0 if the person had not visited concerts in the previous 12 months.
- `artistic_activity_played_music`: The frequency of playing music as an amateur or professional practice. Some surveys record only a binary variable (played in the last 12 months or not), while others record frequencies. We will convert this into a binary variable.
- `fct_artistic_activity_played_music`: The `artistic_activity_played_music` variable in categorical representation.
- `artistic_activity_sung`: The frequency of singing as an amateur or professional practice, analogous to `artistic_activity_played_music`. Because of the liturgical use of singing, and the differences in religious practices among countries and genders, this variable differs significantly from `artistic_activity_played_music`.
- `fct_artistic_activity_sung`: The `artistic_activity_sung` variable in categorical representation.
- `age_exact`: The respondent’s age as an integer number.
- `country_code`: an ISO country code
- `geo`: an ISO code that separates Germany into the former East and West Germany, the United Kingdom into Great Britain and Northern Ireland, and Cyprus into Cyprus and the Turkish Cypriot community. [We may leave Turkish Cyprus out for practical reasons.]
- `age_education`: This is a harmonized education proxy. Because we work with the data of more than 30 countries, education levels are difficult to harmonize, and we use the Eurobarometer standard proxy, age of leaving education. It is a specially coded variable, and we will re-code it into two variables, `age_education` and `is_student`.
- `is_student`: a dummy variable for the special coding in `age_education` for “still studying”, i.e. the person does not yet have a school-leaving age. It would be tempting to impute `age` to `age_education` in this case, but we will show why this is not a good strategy.
- `w`, `w1`: Post-stratification weights for the 15+ years old population of each country. Use `w1` for averages of `geo` entities treating Northern Ireland, Great Britain, the United Kingdom, the former GDR, the former West Germany, and Germany as geographical areas. Use `w` when treating the United Kingdom and Germany as one territory.
- `wex`: Projected weight variable. For weighted averages, use `w` or `w1`; for projections onto the population size (i.e., with sums), use `wex`.
- `id`: The identifier of the original survey.
- `rowid`: A new identifier that is unique across all harmonized surveys, i.e., it remains unique in the harmonized dataset.
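The difference between the averaging weights (`w`, `w1`) and the projection weight (`wex`) can be illustrated with a short sketch. The three respondents and their weight values below are invented purely for illustration, and the helper names `weighted_mean` and `projected_total` are hypothetical rather than part of the dataset's tooling.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average, as one would compute with `w` or `w1`."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def projected_total(values: Sequence[float], projection_weights: Sequence[float]) -> float:
    """Weighted sum, as one would compute with `wex` to project onto the population."""
    return sum(v * w for v, w in zip(values, projection_weights))


# Invented toy data: three respondents in one country.
is_visit_concert = [1, 0, 1]
w = [1.0, 2.0, 1.0]              # post-stratification weights (illustrative)
wex = [1000.0, 2000.0, 1500.0]   # projected weights (illustrative)

share = weighted_mean(is_visit_concert, w)         # estimated share of concert visitors
visitors = projected_total(is_visit_concert, wex)  # estimated number of concert visitors
print(share, visitors)  # 0.5 2500.0
```

The same binary variable yields a proportion with `w` and a head count with `wex`, which is why the two weights should not be interchanged.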
bongo2112/harmonize-SDxl-Styled-Output-Selected dataset hosted on Hugging Face and contributed by the HF Datasets community
These datasets represent a revised national scale estimate of wetland soil carbon stock assessments by improving representation of soil organic carbon densities. This assessment is based on a three-step approach to harmonize survey and point-based data for predicting soil organic carbon density from percent organic carbon alone (or percent organic matter, with conversion), when reliable dry bulk density information is not available. Given issues with survey-level extrapolation of soil pedons into discontinuous hydric soils, quantile, segmented data analysis provides a more accurate spatially explicit soil organic carbon density product. These modeled data leverage spatial and statistical distributions of soil organic carbon percent data of the conterminous United States (CONUS) for two national-scale soil datasets: a wetland-specific field campaign, the EPA National Wetland Condition Assessment, and the USDA NRCS SSURGO survey. See https://doi.org/10.3389/fsoil.2021.706701 for details.
Metabolomics encounters challenges in cross-study comparisons due to diverse metabolite nomenclature and reporting practices. To bridge this gap, we introduce the Metabolites Merging Strategy (MMS), offering a systematic framework to harmonize multiple metabolite datasets for enhanced interstudy comparability. MMS has three steps. Step 1: Translation and merging of the different datasets by employing InChIKeys for data integration, encompassing the translation of metabolite names (if needed). Step 2: Retrieval of attributes from the InChIKey, including descriptors of name (title name from PubChem and RefMet name from Metabolomics Workbench), chemical properties (molecular weight and molecular formula), both systematic (InChI, InChIKey, SMILES) and non-systematic identifiers (PubChem, CheBI, HMDB, KEGG, LipidMaps, DrugBank, Bin ID and CAS number), and their ontology. Finally, a meticulous three-step curation process is used to rectify disparities for conjugated base/acid compounds (optional step), missing attributes, and synonym checking (duplicated information). The MMS procedure is exemplified through a case study of urinary asthma metabolites, where MMS facilitated the identification of significant pathways hidden when no dataset merging strategy was followed. This study highlights the need for standardized and unified metabolite datasets to enhance the reproducibility and comparability of metabolomics studies.
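The idea behind Step 1, merging datasets on the InChIKey rather than on the reported metabolite name, can be sketched as follows. This is not the MMS implementation: the record layout, the function name `merge_by_inchikey`, and the key strings (placeholders, not real InChIKeys) are assumptions made for illustration.

```python
from typing import Dict, List


def merge_by_inchikey(*datasets: List[Dict[str, str]]) -> Dict[str, Dict[str, list]]:
    """Merge metabolite records from several studies, keyed by InChIKey."""
    merged: Dict[str, Dict[str, list]] = {}
    for dataset_index, records in enumerate(datasets):
        for record in records:
            key = record["inchikey"]
            entry = merged.setdefault(key, {"names": [], "sources": []})
            # Keep every distinct reported name so synonyms can be curated later.
            if record["name"] not in entry["names"]:
                entry["names"].append(record["name"])
            entry["sources"].append(dataset_index)
    return merged


# Placeholder keys stand in for real InChIKeys in this toy example.
study_a = [
    {"name": "caffeine", "inchikey": "PLACEHOLDER-KEY-1"},
    {"name": "urea", "inchikey": "PLACEHOLDER-KEY-2"},
]
study_b = [
    {"name": "1,3,7-trimethylxanthine", "inchikey": "PLACEHOLDER-KEY-1"},
]

merged = merge_by_inchikey(study_a, study_b)
print(len(merged))                           # 2
print(merged["PLACEHOLDER-KEY-1"]["names"])  # ['caffeine', '1,3,7-trimethylxanthine']
```

Because the two studies report the same compound under different names, a name-based merge would keep them apart, while the key-based merge combines them into a single entry.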
https://www.gnu.org/licenses/gpl-3.0
The program PanTool was developed as a toolbox, like a Swiss Army knife, for data conversion and recalculation, written to harmonize individual data collections to the standard import format used by PANGAEA. PanTool requires its input files to be tables saved in plain ASCII. The user can create these files with a spreadsheet program such as MS-Excel or with the system text editor. PanTool is distributed as freeware for the operating systems Microsoft Windows, Apple OS X and Linux.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Background
In recent years, radiomics features (RFs) have been developed to provide quantitative, standardized information about shape, density/intensity and texture patterns on radiological images. Several studies showed limitations in the reproducibility of RFs in different acquisition settings. To date, reproducibility studies using CT images mainly rely on phantoms, due to the harm of patient exposure to X-rays. In this study we analyze the effects of CT acquisition parameters on RFs of lumbar vertebrae in a cadaveric donor.
Methods
112 unique CT acquisitions of a cadaveric trunk were performed on 3 different CT scanners, varying kV, mA, field of view and reconstruction kernel settings. Lumbar vertebrae were segmented through a deep learning convolutional neural network and RFs were computed. The effects of each protocol on each RF were assessed by univariate and multivariate Generalized Linear Models (GLM). Further, we compared the GLM to the ComBat algorithm in terms of efficiency in harmonizing CT images.
Findings
In the GLM, mA variation was not associated with alteration of RFs, whereas kV modification was associated with exponential variation of several RFs, including First Order (94.4%), GLCM (87.5%) and NGTDM (100%).
Upon cross-validation, the ComBat algorithm obtained a mean R² higher than 0.90 in 1 RF (0.90%), whereas the GLM obtained a high R² in 21 RFs (19.6%), showing that the proposed GLM could harmonize acquisitions more effectively than ComBat.
Interpretation
This study represents the first attempt to describe the effects of CT acquisition parameters on bone RFs in a cadaveric donor. Our analyses showed that RFs could differ substantially with the variation of each acquisition parameter and across datasets obtained from different CT scanners. These differences can be minimized using the proposed GLM. The publicly available dataset and GLM could foster radiomics-based research by increasing harmonization across CT protocols and vendors.
The Sudan Household Health Survey 2nd round (SHHS2) 2010 provides up-to-date information on the situation of children and women and measures of key indicators that allow countries to monitor progress towards the Millennium Development Goals (MDGs) and other internationally agreed upon commitments.
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006 Household Health Survey in Sudan. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Sudan 2006 & 2010- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.
The sample harmonized and disseminated by the Economic Research Forum represents Northern Sudan only.
The Sudan Household Health Survey (SHHS) 2010 dataset covers the states of Northern Sudan only (Northern, River Nile, Red Sea, Kassala, Gedarif, Khartoum, Gezira, White Nile, Sinnar, Blue Nile, North Kordofan, South Kordofan, North Darfur, West Darfur and South Darfur).
1- Household/family. 2- Individual/person. 3- Woman. 4- Child.
The target universe for the SHHS includes the households and members of individual households, including nomadic households camping at a location/place at the time of the survey. The population living in institutions and group quarters such as hospitals, military bases and prisons, were excluded from the sampling frame.
Sample survey data [ssd]
Face-to-face [f2f]
Five sets of questionnaires were used in the Sudan Household Health Survey. The first three questionnaires are based on the MICS3 and PAPFAM model questionnaires. Those three were subject to harmonization.
1) Household questionnaire which was used to collect information on all de jure household members and the household. It included the following modules:
- Household information panel
- Household listing
- Education
- Female Genital Mutilation
- Chronic diseases & injuries (Northern States only)
- Tobacco use (Northern States only)
- Child disability
- Water and sanitation
- Household characteristics
- Insecticide treated nets
- Salt iodization
2) Women's questionnaire administered to all women aged 15-49 years in each household. It included the following modules:
- Women's information panel
- Women's background
- Child mortality
- Desire for last birth
- Maternal and newborn health
- Illness symptoms
- Contraception
- Unmet need
- Marriage and union
- HIV/AIDS
- Birth history
- Female Genital Mutilation
- Attitudes towards domestic violence
- Sexual behavior STIs (Southern States only)
3) Under-five questionnaire administered to mothers. In case the mother was not listed in the household list/roster, a primary caretaker for the child was identified and interviewed. The Questionnaire for Children under Five included the following modules:
- Under-five children information panel
- Birth registration
- Vitamin A supplementation
- Breastfeeding
- Care of illness
- Immunization
- Malaria
- Anthropometry
4) Men's questionnaire administered to all men aged 15-49 years in each household. It included the following modules:
- Men information panel
- Men's background
- Marriage
- Circumcision
- Condom
- Sexual behavior STIs
- HIV/AIDS
5) Food Security Questionnaire which included the following modules:
- Food security information panel
- Income sources
- Expenditures
- Food consumption and dietary diversity
In addition to the administration of questionnaires, fieldwork teams tested the salt used for cooking in the households for iodine content, and measured the weights and heights of children under five years of age.
---> Harmonized Data:
Of the 15,000 households selected for the sample, 14,778 were successfully interviewed, yielding a response rate of 99 percent. Of the 18,614 women (age 15-49 years) identified in the selected households, 17,174 were successfully interviewed, yielding a response rate of 91.4 percent. Of the 13,587 children under age five listed in the households, questionnaires were completed for 13,282 children, which corresponds to a response rate of 96.8 percent.
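As a worked check of the arithmetic, a response rate is the number of completed interviews divided by the number of eligible units. The sketch below applies that definition to the counts quoted above; the function name `response_rate` is hypothetical. Computed this way, the household rate rounds to the quoted 99 percent, while the rates for women and children come out roughly one point above the quoted 91.4 and 96.8 percent, so the published figures may rest on denominators or counts other than those quoted here.

```python
def response_rate(completed: int, eligible: int) -> float:
    """Response rate in percent: completed interviews over eligible units."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return 100.0 * completed / eligible


# (completed, eligible) pairs taken from the counts quoted in the text.
counts = {
    "households": (14778, 15000),
    "women": (17174, 18614),
    "children": (13282, 13587),
}

rates = {unit: round(response_rate(done, total), 1) for unit, (done, total) in counts.items()}
print(rates)  # {'households': 98.5, 'women': 92.3, 'children': 97.8}
```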
To evaluate the efficacy of two different doses (5 and 10 g) of ZS orally administered once daily (qd) vs placebo in maintaining normokalemia in initially hyperkalemic patients having achieved normokalemia following two days of initial ZS therapy (10g TID).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reprocessed counts were generated using our GDC RNA-seq workflow implementation. NA rank changes indicate the DEG cannot be found in the other DEG list. (CSV)
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN
The Department of Statistics (DOS) carried out two rounds of the 2004 Employment and Unemployment Survey (EUS). The survey rounds covered a total sample of about fourteen households Nation-wide. The sampled households were selected using a stratified multi-stage cluster sampling design. It is noteworthy that the sample represents the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.
The importance of this survey lies in that it provides a comprehensive data base on employment and unemployment that serves decision makers, researchers as well as other parties concerned with policies related to the organization of the Jordanian labor market.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been made to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
The sample is representative at the national level (Kingdom), and for the governorates, the three Regions (Central, North and South), and the urban/rural areas.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
Face-to-face [f2f]
The questionnaire is divided into main topics, each containing a clear and consistent group of questions, and designed in a way that facilitates the electronic data entry and verification. The questionnaire includes the characteristics of household members in addition to the identification information, which reflects the administrative as well as the statistical divisions of the Kingdom.
The plan of the tabulation of survey results was guided by former Employment and Unemployment Surveys which were previously prepared and tested. The final survey report was then prepared to include all detailed tabulations as well as the methodology of the survey.
The cleaned and harmonized version of the survey data produced and published by the Economic Research Forum represents 100% of the original survey data collected by the Central Agency for Public Mobilization and Statistics (CAPMAS)
In any society, the human element represents the basis of the work force, which carries out all service and production activities. It is therefore mandatory to produce labor force statistics and studies on the growth and distribution of manpower and on the distribution of the labor force by type and characteristics.
In this context, the Central Agency for Public Mobilization and Statistics conducts "Quarterly Labor Force Survey" which includes data on the size of manpower and labor force (employed and unemployed) and their geographical distribution by their characteristics.
By the end of each year, CAPMAS issues the annual aggregated labor force bulletin publication that includes the results of the quarterly survey rounds that represent the manpower and labor force characteristics during the year.
----> Historical Review of the Labor Force Survey:
1- The first Labor Force Survey was undertaken in 1957. The first round was conducted in November of that year, and the survey has continued to be conducted in successive rounds (quarterly, bi-annually, or annually) until now.
2- Starting with the October 2006 round, the fieldwork of the labor force survey was developed to focus on the following two points: a. The importance of using the panel sample, which is part of the survey sample, to monitor the dynamic changes of the labor market. b. Improving the questionnaire to include more questions that help to better define the relationship of each household member to the labor force (employed, unemployed, out of labor force, etc.), in addition to re-ordering some of the existing questions in a more logical way.
3- Starting with the January 2008 round, the methodology was developed to collect a more representative sample during the survey year. This is done by distributing the sample of each governorate into five groups; questionnaires are collected from each group separately every 15 days for 3 months (in the middle and at the end of the month).
----> The survey aims at covering the following topics:
1- Measuring the size of the Egyptian labor force among civilians (for all governorates of the republic) by their different characteristics.
2- Measuring the employment rate at the national level and in different geographical areas.
3- Measuring the distribution of employed people by the following characteristics: gender, age, educational status, occupation, economic activity, and sector.
4- Measuring the unemployment rate in different geographic areas.
5- Measuring the distribution of unemployed people by the following characteristics: gender, age, educational status, unemployment type "ever employed/never employed", occupation, economic activity, and sector for people who have ever worked.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been made to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a sample of urban and rural areas in all the governorates.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
Sample Design and Selection
The sample of the LFS 2006 survey is a simple systematic random sample.
Sample Size
The sample size varied by quarter (Q1 = 19,429, Q2 = 19,419, Q3 = 19,119 and Q4 = 18,835 households), for a total of 76,802 households annually. These households are distributed at the governorate level (urban/rural).
A more detailed description of the different sampling stages and allocation of sample across governorates is provided in the Methodology document available among external resources in Arabic.
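For readers unfamiliar with the term, a simple systematic random sample picks a random starting unit and then takes every k-th unit from an ordered list. The sketch below illustrates that general technique and checks the annual total against the quarterly figures quoted above; it is not the CAPMAS procedure, and the function name `systematic_sample` and the toy frame of 100 units are assumptions for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(frame: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Draw n units: a random start within the first interval, then every k-th unit."""
    if n <= 0 or n > len(frame):
        raise ValueError("n must be between 1 and the frame size")
    step = len(frame) // n       # sampling interval k (integer version)
    start = rng.randrange(step)  # random start in [0, k)
    # With an integer interval, units beyond n * k are never selected.
    return [frame[start + i * step] for i in range(n)]


# Quarterly sample sizes as quoted in the text.
quarterly_households = {"Q1": 19429, "Q2": 19419, "Q3": 19119, "Q4": 18835}
annual_total = sum(quarterly_households.values())
print(annual_total)  # 76802

# Toy frame of 100 numbered households, sampling 10 of them.
sample = systematic_sample(list(range(100)), 10, random.Random(0))
print(sample)
```

Consecutive sampled units are always exactly one interval apart, which is what distinguishes systematic sampling from drawing the same number of units independently at random.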
Face-to-face [f2f]
The questionnaire design follows the latest International Labor Organization (ILO) concepts and definitions of labor force, employment, and unemployment.
The questionnaire comprises 3 tables in addition to the identification and geographic data of household on the cover page.
----> Table 1- Demographic and employment characteristics and basic data for all household individuals
Including: gender, age, educational status, marital status, residence mobility and current work status
----> Table 2- Employment characteristics table
This table is filled in for individuals who were employed at the time of the survey or who were engaged to work during the reference week, and provides information on:
- Relationship to employer: employer, self-employed, waged worker, and unpaid family worker
- Economic activity
- Sector
- Occupation
- Effective working hours
- Work place
- Average monthly wage
----> Table 3- Unemployment characteristics table
This table is filled in for all unemployed individuals who satisfied the unemployment criteria, and provides information on:
- Type of unemployment (unemployed, unemployed ever worked)
- Economic activity and occupation in the last held job before being unemployed
- Last unemployment duration in months
- Main reason for unemployment
----> Raw Data
Office editing is one of the main stages of the survey. It started once the questionnaires were received from the field and was carried out by the selected work groups. It includes:
a- Editing of coverage and completeness
b- Editing of consistency
----> Harmonized Data