17 datasets found
  1. Additional file 3: of scAlign: a tool for alignment, integration, and rare...

    • springernature.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Nelson Johansen; Gerald Quon (2023). Additional file 3: of scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data [Dataset]. http://doi.org/10.6084/m9.figshare.9631709.v1
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nelson Johansen; Gerald Quon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains supplementary marker gene information. (XLS 117 kb)

  2. Table_1_Streamlining intersectoral provision of real-world health data: a...

    • figshare.com
    xlsx
    Updated Jun 5, 2024
    Cite
    Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr (2024). Table_1_Streamlining intersectoral provision of real-world health data: a service platform for improved clinical research and patient care.XLSX [Dataset]. http://doi.org/10.3389/fmed.2024.1377209.s001
    Available download formats: xlsx
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Frontiers
    Authors
    Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Obtaining real-world data from routine clinical care is of growing interest for scientific research and personalized medicine. Despite the abundance of medical data across various facilities, including hospitals, outpatient clinics, and physician practices, the intersectoral exchange of information remains largely hindered due to differences in data structure, content, and adherence to data protection regulations. In response to this challenge, the Medical Informatics Initiative (MII) was launched in Germany, focusing initially on university hospitals to foster the exchange and utilization of real-world data through the development of standardized methods and tools, including the creation of a common core dataset. Our aim, as part of the Medical Informatics Research Hub in Saxony (MiHUBx), is to extend the MII concepts to non-university healthcare providers in a more seamless manner to enable the exchange of real-world data among intersectoral medical sites.

    Methods: We investigated what services are needed to facilitate the provision of harmonized real-world data for cross-site research. On this basis, we designed a Service Platform Prototype that hosts services for data harmonization, adhering to the globally recognized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) international standard communication format and the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Leveraging these standards, we implemented additional services facilitating data utilization, exchange and analysis. Throughout the development phase, we collaborated with an interdisciplinary team of experts from the fields of system administration, software engineering and technology acceptance to ensure that the solution is sustainable and reusable in the long term.

    Results: We have developed the pre-built packages “ResearchData-to-FHIR,” “FHIR-to-OMOP,” and “Addons,” which provide the services for data harmonization and provision of project-related real-world data in both the FHIR MII Core dataset format (CDS) and the OMOP CDM format as well as utilization and a Service Platform Prototype to streamline data management and use.

    Conclusion: Our development shows a possible approach to extend the MII concepts to non-university healthcare providers to enable cross-site research on real-world data. Our Service Platform Prototype can thus pave the way for intersectoral data sharing, federated analysis, and provision of SMART-on-FHIR applications to support clinical decision making.

  3. Data from: Integration and harmonization of trait data from plant...

    • live.european-language-grid.eu
    • zenodo.org
    • +1 more
    csv
    Updated Dec 13, 2023
    Cite
    (2023). Data from: Integration and harmonization of trait data from plant individuals across heterogeneous sources [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7662
    Available download formats: csv
    Dataset updated
    Dec 13, 2023
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Trait data represent the basis for ecological and evolutionary research and have relevance for biodiversity conservation, ecosystem management and earth system modelling. The collection and mobilization of trait data has strongly increased over the last decade, but many trait databases still provide only species-level, aggregated trait values (e.g. ranges, means) and lack the direct observations on which those data are based. Thus, the vast majority of trait data measured directly from individuals remains hidden and highly heterogeneous, impeding their discoverability, semantic interoperability, digital accessibility and (re-)use. Here, we integrate quantitative measurements of verbatim trait information from plant individuals (e.g. lengths, widths, counts and angles of stems, leaves, fruits and inflorescence parts) from multiple sources such as field observations and herbarium collections. We develop a workflow to harmonize heterogeneous trait measurements (e.g. trait names and their values and units) as well as additional information related to taxonomy, measurement or fact and occurrence. This data integration and harmonization builds on vocabularies and terminology from existing metadata standards and ontologies such as the Ecological Trait-data Standard (ETS), the Darwin Core (DwC), the Thesaurus Of Plant characteristics (TOP) and the Plant Trait Ontology (TO). A metadata form filled out by data providers enables the automated integration of trait information from heterogeneous datasets. We illustrate our tools with data from palms (family Arecaceae), a globally distributed (pantropical), diverse plant family that is considered a good model system for understanding the ecology and evolution of tropical rainforests. We mobilize nearly 140,000 individual palm trait measurements in an interoperable format, identify semantic gaps in existing plant trait terminology and provide suggestions for the future development of a thesaurus of plant characteristics. 
Our work thereby promotes the semantic integration of plant trait data in a machine-readable way and shows how large amounts of small trait data sets and their metadata can be integrated into standardized data products.
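
    The harmonization of trait names, values and units described above can be illustrated with a small sketch. This is not the authors' workflow: the verbatim trait names, the standardized names, and the choice of centimetres as the target unit are all hypothetical and serve only to show the idea of mapping heterogeneous measurements onto one vocabulary and one unit.

```python
# Hypothetical lookup tables; a real workflow would derive these from the
# metadata form filled out by each data provider.
TRAIT_NAMES = {
    "leaf len": "leaf_length",
    "Leaf length": "leaf_length",
    "stem diam.": "stem_diameter",
}
TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}  # conversion factors to centimetres


def harmonize_measurement(verbatim_name, value, unit):
    """Map one verbatim measurement to a standardized trait name and unit.

    Returns None when the trait name or unit is unknown, so that gaps in the
    vocabulary surface explicitly instead of being silently guessed.
    """
    name = TRAIT_NAMES.get(verbatim_name)
    if name is None or unit not in TO_CM:
        return None
    return {
        "trait": name,
        "value_cm": value * TO_CM[unit],
        # Keep the verbatim information alongside the harmonized values.
        "verbatim_trait": verbatim_name,
        "verbatim_unit": unit,
    }


example = harmonize_measurement("Leaf length", 1.5, "m")
unknown = harmonize_measurement("petal colour", 3, "cm")
```

    Keeping the verbatim name and unit next to the harmonized value mirrors the description's emphasis on preserving the direct observations.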

  4. Labor Force Survey 2000, Economic Research Forum (ERF) Harmonization Data -...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Economic Research Forum (2017). Labor Force Survey 2000, Economic Research Forum (ERF) Harmonization Data - West Bank and Gaza [Dataset]. https://catalog.ihsn.org/index.php/catalog/6991
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Economic Research Forum
    Time period covered
    2000
    Area covered
    West Bank, Gaza Strip, Gaza
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

    The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2000 (LFS).

    The importance of this survey lies in that it focuses mainly on labour force key indicators, main characteristics of the employed, unemployed, underemployed and persons outside labour force, labour force according to level of education, distribution of the employed population by occupation, economic activity, place of work, employment status, hours and days worked and average daily wage in NIS for the employees.

    The survey's main objectives are:

    • To estimate the labor force and its size as a percentage of the population.
    • To estimate the number of employed individuals.
    • To analyze the labour force according to gender, employment status, educational level, occupation and economic activity.
    • To provide information about the main changes in the labour market structure and its socio-economic characteristics.
    • To estimate the number of unemployed individuals and analyze their general characteristics.
    • To estimate the rate of working hours and wages for employed individuals, in addition to analyzing other characteristics.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts have been made to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    Covering a representative sample on the region level (West Bank, Gaza Strip), the locality type (urban, rural, camp) and the governorates.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.

    Target Population:

    All Palestinians aged 10 years or older living in the Palestinian Territory, excluding those living in institutions such as prisons or shelters.

    Sampling Frame:

    The sampling frame consisted of a master sample of Enumeration Areas (EAs) selected from the 1997 population, housing and establishment census. The master sample consists of area units of relatively equal size (number of households); these units have been used as Primary Sampling Units (PSUs).

    Sample Design:

    The sample is a two-stage stratified cluster random sample.

    Stratification: Four levels of stratification were made:

    1. Stratification by Governorates.
    2. Stratification by type of locality which comprises: (a) Urban, (b) Rural, and (c) Refugee Camps
    3. Stratification by classifying localities, excluding governorate centers, into three strata based on the ownership of households of durable goods within these localities.
    4. Stratification by size of locality (number of households).
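
    As an illustration of the design described above, a two-stage stratified cluster sample can be sketched as follows. This is a simplified sketch, not the PCBS procedure: it uses simple random selection at both stages, and the strata, EA identifiers and household identifiers in the toy frame are invented. The value of 16 households per PSU is borrowed from the average cell size reported for this survey.

```python
import random
from collections import defaultdict


def two_stage_sample(frame, psus_per_stratum, households_per_psu, seed=0):
    """Select PSUs within each stratum, then households within each PSU."""
    rng = random.Random(seed)

    # Group the enumeration areas (EAs) by stratum.
    by_stratum = defaultdict(list)
    for ea in frame:
        by_stratum[ea["stratum"]].append(ea)

    sample = []
    for stratum in sorted(by_stratum):
        eas = by_stratum[stratum]
        # Stage 1: draw EAs (the PSUs) inside the stratum.
        for ea in rng.sample(eas, min(psus_per_stratum, len(eas))):
            # Stage 2: draw households inside each selected PSU.
            k = min(households_per_psu, len(ea["households"]))
            for household in rng.sample(ea["households"], k):
                sample.append((stratum, ea["ea_id"], household))
    return sample


# Toy frame: 2 strata x 5 EAs x 40 households (all identifiers invented).
toy_frame = [
    {
        "stratum": s,
        "ea_id": f"{s}-EA{i}",
        "households": [f"{s}-EA{i}-HH{j}" for j in range(40)],
    }
    for s in ("urban", "rural")
    for i in range(5)
]
toy_sample = two_stage_sample(toy_frame, psus_per_stratum=2, households_per_psu=16)
```

    A production design would typically combine several stratification variables into one stratum key and might select PSUs with probability proportional to size; both are left out here for brevity.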

    Sample Size:

    In the first round the sample consisted of 7,559 households, which amounts to a sample of around 29,650 persons aged 10 years and over (including 23,677 aged 15 years and over). In the second round the sample consisted of 7,559 households, which amounts to a sample of around 29,894 persons aged 10 years and over (including 23,890 aged 15 years and over). In the third round the sample consisted of 7,559 households, which amounts to a sample of around 29,709 persons aged 10 years and over (including 23,670 aged 15 years and over). In the fourth round the sample consisted of 7,559 households; of these only 7,349 households were interviewed due to the Israeli comprehensive closure and aggression against the Palestinian people, which amounts to 28,380 persons aged 10 years and over (including 22,495 aged 15 years and over).

    The sample size allowed for non-response and related losses. In addition, the average number of households selected in each cell was 16.

    Sample Rotation:

    Each round of the Labor Force Survey covers all 481 master sample areas. The areas remain fixed over time, but households in 50% of the EAs are replaced each round. A household remains in the sample for two consecutive rounds, rests for the next two rounds, and is included again for a final two consecutive rounds before it is dropped from the sample. A 50% overlap is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes). In earlier applications of the LFS (rounds 1 to 11), a different rotation pattern was used, requiring a household to remain in the sample for six consecutive rounds before being dropped. The objective of that pattern was to increase the overlap between consecutive rounds. The new rotation pattern was introduced to reduce the burden on households of being visited six consecutive times.
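
    The overlap figures implied by the rotation pattern can be checked with a small sketch. It assumes, as a simplification not stated in the source, that one equally sized cohort of households enters the sample each round and that there are four rounds per year; the names `NEW_PATTERN`, `OLD_PATTERN`, `cohorts_in_round` and `overlap` are illustrative only.

```python
# Offsets (in rounds since entry) at which a cohort is interviewed.
NEW_PATTERN = (0, 1, 4, 5)        # in for 2 rounds, rest for 2, in for 2
OLD_PATTERN = (0, 1, 2, 3, 4, 5)  # in for 6 consecutive rounds


def cohorts_in_round(t, pattern):
    """Return the entry rounds of all cohorts interviewed in round t."""
    return {t - offset for offset in pattern}


def overlap(t1, t2, pattern):
    """Share of the round-t1 sample that is also interviewed in round t2."""
    a = cohorts_in_round(t1, pattern)
    b = cohorts_in_round(t2, pattern)
    return len(a & b) / len(a)


consecutive_new = overlap(10, 11, NEW_PATTERN)  # next round
yearly_new = overlap(10, 14, NEW_PATTERN)       # same round, one year later
consecutive_old = overlap(10, 11, OLD_PATTERN)  # earlier six-round pattern
```

    Under these assumptions the 2-2-2 pattern gives a 50% overlap both between consecutive rounds and between the same round in consecutive years, while the earlier six-round pattern gives a larger overlap between consecutive rounds (five of six cohorts).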

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    One of the main survey tools is the questionnaire, which was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:

    1. Identification Data:

    The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.

    2. Quality Control:

    This part involves groups of controlling standards to monitor the field and office operations and to keep the questionnaire stages in sequence (data collection, field and office coding, data entry, editing after entry, and storing the data).

    3. Household Roster:

    This part involves demographic characteristics of the household, such as number of persons in the household, date of birth, sex, educational level, etc.

    4. Employment Part:

    This part involves the major research indicators. One questionnaire was answered by every household member aged 15 years and over, in order to explore their labour force status and identify their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.

    Cleaning operations

    Raw Data

    The data processing stage consisted of the following operations:

    1. Editing before data entry: All questionnaires were then edited in the main office using the same instructions adopted for editing in the field.

    2. Coding: At this stage, the Economic Activity variable underwent coding according to West Bank and Gaza Strip Standard commodities Classification, based on the United Nations ISIC-3. The Economic Activity for all employed and ever employed individuals was classified at the fourth-digit level. The occupations were coded on the basis of the International Standard Occupational Classification of 1988 at the third-digit level (ISCO-88).

    3. Data entry: In this stage, data were entered into the computer using a data entry template in BLAISE. The data entry program was prepared in order to satisfy the following requirements:

    • Duplication of the questionnaire on the computer screen.
    • Logical and consistency checks of data entered.
    • Possibility for internal editing of questionnaire answers.
    • Maintaining a minimum of errors in digital data entry and fieldwork.
    • User-friendly handling

    Accordingly, data editing took place at a number of stages through the processing, including:

    1. Office editing and coding
    2. During data entry
    3. Structure checking and completeness
    4. Structural checking of SPSS data files

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
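
    The merge, rename and recode steps listed above can be sketched in a few lines. The source states that the actual work was done in SPSS; the Python below is only an illustration of the same idea, and every variable name, code value and identifier in it (`person_id`, `sex_raw`, `emp_stat`, the recode table) is hypothetical.

```python
def merge_by_id(files, key="person_id"):
    """Merge several cleaned files into one record per individual."""
    merged = {}
    for records in files:
        for rec in records:
            merged.setdefault(rec[key], {}).update(rec)
    return list(merged.values())


def harmonize(records, rename, recode):
    """Apply a country-specific rename map, then recode categorical values."""
    out = []
    for rec in records:
        row = {rename.get(name, name): value for name, value in rec.items()}
        for variable, mapping in recode.items():
            if variable in row:
                # Values without a mapping are kept unchanged for later review.
                row[variable] = mapping.get(row[variable], row[variable])
        out.append(row)
    return out


# Hypothetical raw files and country-specific program.
demographics = [{"person_id": 1, "sex_raw": 1}, {"person_id": 2, "sex_raw": 2}]
employment = [{"person_id": 1, "emp_stat": 2}, {"person_id": 2, "emp_stat": 1}]
RENAME = {"sex_raw": "sex", "emp_stat": "employment_status"}
RECODE = {"sex": {1: "male", 2: "female"}}

harmonized = harmonize(merge_by_id([demographics, employment]), RENAME, RECODE)
```

    Keeping the rename and recode tables as data rather than code is one way to realize the "country-specific program" per dataset while sharing the same processing functions.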

    Response rate

    The overall response rate for the survey was 89.5%.

    More information on the distribution of response rates by different survey rounds is available in Page 11 of the data user guide provided among the disseminated survey materials under a file named "Palestine 2000- Data User Guide (English).pdf".

    Sampling error estimates

    Since the data reported here are based on a sample survey and not on a complete enumeration, they are subject to sampling errors as well as non-sampling errors. Sampling errors are random outcomes of the sample design and are, therefore, in principle measurable by the statistical concept of standard error.
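
    As a reminder of what a standard error looks like in practice, the sketch below computes the textbook standard error of an estimated proportion under simple random sampling, optionally inflated by a design effect to account for clustering. It is a generic illustration, not the estimator used for this survey, and the inputs in the example are made up.

```python
import math


def proportion_standard_error(p, n, design_effect=1.0):
    """Standard error of a proportion p estimated from n observations.

    Under simple random sampling the variance is p * (1 - p) / n; a design
    effect greater than 1 scales that variance for a clustered design.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0 or design_effect <= 0:
        raise ValueError("n and design_effect must be positive")
    return math.sqrt(design_effect * p * (1.0 - p) / n)


# Illustrative inputs only, not survey results.
se_srs = proportion_standard_error(0.5, 10_000)
se_clustered = proportion_standard_error(0.5, 10_000, design_effect=4.0)
```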

    A

  5. Fundamental Data Record for Atmospheric Composition [ATMOS_L1B]

    • earth.esa.int
    Updated Sep 12, 2024
    Cite
    European Space Agency (2024). Fundamental Data Record for Atmospheric Composition [ATMOS_L1B] [Dataset]. https://earth.esa.int/eogateway/catalog/e
    Dataset updated
    Sep 12, 2024
    Dataset authored and provided by
    European Space Agency (http://www.esa.int/)
    License

    https://earth.esa.int/eogateway/documents/20142/1564626/Terms-and-Conditions-for-the-use-of-ESA-Data.pdf

    Time period covered
    Jun 28, 1995 - Apr 7, 2012
    Description

    The Fundamental Data Record (FDR) for Atmospheric Composition UVN v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project. The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period ranging from 1995 to 2012. The data record offers harmonised cross-calibrated spectra with focus on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents like ozone (O3), sulphur dioxide (SO2), nitrogen dioxide (NO2) column densities, alongside cloud parameters. The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities. The FDR V1 is currently being extended to include the MetOp GOME-2 series.

    Product format

    For many aspects, the FDR product has improved compared to the existing individual mission datasets:

    • GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data.
    • Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites.
    • SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra, which were split in a complex cluster structure (with own integration time) in the original Level 1b data.
    • The harmonization process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data.
    • Uncertainties are provided.

    Each FDR product provides, within the same file, irradiance/reflectance data for UV-VIS-NIR spectral regions across all orbits on a single day, including therein information from the individual ERS-2 GOME and Envisat SCIAMACHY measurements. FDR has been generated in two formats: Level 1A and Level 1B, targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp. Please refer to the README file for essential guidance before using the data.

    All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply, can be used to read NetCDF data. Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page.

    Uncertainty characterisation

    One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices. The following documents are provided:

    • General guidance on a metrological approach to Fundamental Data Records (FDR)
    • Uncertainty Characterisation document
    • Effect tables
    • NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scene for 2003 and 2010) and GOME (Atlantic scene for 2003): reflectance_uncertainty_example_FDR4ATMOS_GOME.nc, reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc

  6. Guiding national soil information providers towards INSPIRE compliance

    • explore.openaire.eu
    Updated Oct 22, 2024
    Cite
    Paul Van Genuchten; Jandirk Bulens; Luis Moreira de Sousa; Katharina Schleidt; Fenny Van Egmond (2024). Guiding national soil information providers towards INSPIRE compliance [Dataset]. http://doi.org/10.5281/zenodo.13970776
    Dataset updated
    Oct 22, 2024
    Authors
    Paul Van Genuchten; Jandirk Bulens; Luis Moreira de Sousa; Katharina Schleidt; Fenny Van Egmond
    Description

    Open Access training materials for a training on INSPIRE good practices in the Soil domain. Soil experts, data scientists and IT operators from across Europe joined for a 3-day training on INSPIRE Good Practices around Soil Data at Wageningen University and Research in April 2022.

    Learning goals

    • What is INSPIRE and its goals
    • Experiences with INSPIRE Soil data harmonisation from EJP partners
    • The INSPIRE Soil data model, a conceptual model to facilitate data interoperability
    • Implementation options for the conceptual soil model
    • Data transfer protocols for data exchange
    • Formats and protocols for metadata discovery
    • Vocabularies and Code lists in the Soil domain
    • Upcoming developments

    Intended audience

    • Technical background (data bases, web services)
    • National or regional institutes which produce or consume soil data

    Contents

    • Why is the INSPIRE directive relevant for EJP, why this training. (Maria Fantappiè, CREA)
    • The reasoning behind INSPIRE: why do we need a directive? (Joeri Robbrecht, European Commission)
    • Why do we need to understand INSPIRE and share data? (Jandirk Bulens, WENR)
    • Experiences of Implementing SOIL in INSPIRE. (Maria Fantappiè (CREA), Florian Hoedt (Thünen) and Dries Luts (AGIV))
    • Conceptual Framework (Luis de Sousa, ISRIC)
    • Data discovery (Paul van Genuchten, ISRIC)
    • Interoperability; O&M, Sensorthings API, Web Coverage Services (Katharina Schleidt, Datacove)
    • Extending INSPIRE for the Air Quality directive (Katharina Schleidt, Datacove)
    • INSPIRE Soil: an overview and relations with other standards, the conceptual model of the soil theme as a common base (Kathi Schleidt, Datacove)
    • Harmonize, map, transform: what does it mean? (Paul van Genuchten, ISRIC)
    • Code lists in INSPIRE (Paul van Genuchten, ISRIC)
    • Implementation, operation, reporting. How do you keep track on progress? (Paul van Genuchten, ISRIC)
    • Technical aspects of view (WMS)-, download (WFS, Atom) services and data harmonization (Paul van Genuchten, ISRIC)
    • Adapting to evolved developments, specifically WCS and SensorThings (Katharina Schleidt, Datacove)
    • Zooming in on INSPIRE and GloSIS mapping. What about tools and software to be used (Luis de Sousa, ISRIC)
    • Emerging data exchange technologies: OGC API; RDF/SPARQL, Gaia-x. Why, what and how? (Paul van Genuchten, ISRIC)

    EJP Soil

    The Horizon Europe EJP SOIL research project facilitated EU member states to share Soil data following the INSPIRE Directive with training on Soil Data good practices. The training was organized by EJP SOIL to raise awareness and to facilitate member states in publishing harmonized Soil Data in the scope of current directives. EJP Soil has a work package dedicated to facilitating member states to publish harmonized soil data. In an inventory from 2021 it appeared that member states in general have a low awareness of the Soil Data Harmonization and Publication requirements of the various soil related directives, such as INSPIRE. Also, a lack of experience with technologies to facilitate these processes was identified. For this reason, a training has been prepared dedicated specifically to harmonization and publication of Soil Data, based on the experiences from the wider INSPIRE community.

    About INSPIRE

    The INSPIRE Directive aims to create a European Union Spatial Data Infrastructure (SDI) for the purposes of EU environmental policies and policies or activities which may have an impact on the environment. This European Spatial Data Infrastructure will enable the sharing of environmental spatial information among public sector organisations, facilitate public access to spatial information across Europe and assist in policy-making across boundaries. INSPIRE is based on the infrastructures for spatial information established and operated by the Member States of the European Union. The Directive addresses 34 spatial data themes needed for environmental applications.

  7. Data from: PHYTOBASE: A global synthesis of open ocean phytoplankton...

    • doi.pangaea.de
    zip
    Updated Aug 1, 2019
    Cite
    Damiano Righetti; Meike Vogt; Niklaus E Zimmermann; Nicolas Gruber; Michael D Guiry (2019). PHYTOBASE: A global synthesis of open ocean phytoplankton occurrences [Dataset]. http://doi.org/10.1594/PANGAEA.904397
    Available download formats: zip
    Dataset updated
    Aug 1, 2019
    Dataset provided by
    PANGAEA
    Authors
    Damiano Righetti; Meike Vogt; Niklaus E Zimmermann; Nicolas Gruber; Michael D Guiry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Marine phytoplankton are responsible for half of the global net primary production and perform multiple other ecological functions and services of the global ocean. These photosynthetic organisms comprise more than 4300 marine species, but their biogeographic patterns and the resulting species diversity are poorly known, mostly owing to severe data limitations. Here, we compile, synthesize, and harmonize marine phytoplankton occurrence data from the two largest biological occurrence archives (Ocean Biogeographic Information System; OBIS, and Global Biodiversity Information Facility; GBIF) and three recent data collections. The resulting PhytoBase data set contains over 1.36 million marine phytoplankton occurrence records (1.28 million at the level of species) for a total of 1704 species, spanning the principal groups of the Bacillariophyceae, Dinoflagellata, and Haptophyta as well as several other groups. This data compilation increases the amount of phytoplankton occurrence data available through the single largest contributing archive (OBIS) by 65%. Data span all ocean basins, latitudes and most seasons. Analyzing the oceanic inventory of sampled phytoplankton species richness at the broadest spatial scales possible, using a resampling procedure, we find that richness tends to saturate in the pantropics at ~93% of all species in our database, at ~64% in temperate waters, and at ~35% in the cold Northern Hemisphere, while the Southern Hemisphere remains underexplored. We provide metadata on the cruise, research institution, depth, and date for each occurrence record. Cell-counts for 193 763 records are also included. We strongly recommend consideration of global spatiotemporal biases in sampling intensity and varying taxonomic sampling scopes between research programs when analyzing the occurrence database. 
Including such information into statistical analysis tools, such as species distribution models, may serve to project the diversity, niches, and distribution of species in the contemporary and future ocean, opening the door for a quantification of macro-ecological phytoplankton patterns.
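
    The idea of examining how species richness saturates under resampling can be illustrated with a generic species-accumulation sketch. This is not the authors' resampling procedure, which is not specified here; the function name and the toy occurrence records are hypothetical.

```python
import random


def accumulation_curve(occurrences, n_draws, seed=0):
    """Count distinct species seen after each of n_draws random records."""
    rng = random.Random(seed)
    shuffled = list(occurrences)
    rng.shuffle(shuffled)

    seen = set()
    curve = []
    for species in shuffled[:n_draws]:
        seen.add(species)
        curve.append(len(seen))
    return curve


# Toy occurrence records: a few common species and some rare ones.
toy_records = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 4 + ["E"]
curve = accumulation_curve(toy_records, n_draws=len(toy_records))
```

    Averaging such curves over many random seeds gives a smoother picture of where richness levels off; uneven sampling effort, which the description warns about, would bias that picture.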

  8. Harmonized Tariff Schedule of the United States (2025)

    • catalog.data.gov
    Updated Feb 14, 2025
    Cite
    Office of Tariff Affairs and Trade Agreements (2025). Harmonized Tariff Schedule of the United States (2025) [Dataset]. https://catalog.data.gov/dataset/harmonized-tariff-schedule-of-the-united-states-2024
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Office of Tariff Affairs and Trade Agreements
    Description

    This dataset is the current 2025 Harmonized Tariff Schedule plus all revisions for the current year. It provides the applicable tariff rates and statistical categories for all merchandise imported into the United States; it is based on the international Harmonized System, the global system of nomenclature that is used to describe most world trade in goods.

  9. Harmonized 2021 census geography for Quebec

    • open.canada.ca
    • datasets.ai
    • +1 more
    fgdb/gdb, html, pdf +1
    Updated Feb 5, 2025
    Cite
    Government and Municipalities of Québec (2025). Harmonized 2021 census geography for Quebec [Dataset]. https://open.canada.ca/data/dataset/1c54bc07-3fd7-489b-bcb5-3f318ec22255
    Available download formats: fgdb/gdb, shp, html, pdf
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Government and Municipalities of Québec
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Quebec, Québec City
    Description

    Geographic database containing the result of Statistics Canada's 2021 Census geography harmonization process. Harmonization is a process of editing geographic boundaries in order to adjust them spatially to the official geographic base. This work was carried out in 2022 by the Institut de la Statistique du Québec (ISQ) according to the official provincial geographic base. A metadata sheet shows the process of harmonizing this geography. The content of the resulting database is detailed there (page 2), as well as the description of the attributes in the appendix (pages 7 to 11).

    This third party metadata element was translated using an automated translation tool (Amazon Translate).

  10. Labor Force Survey, LFS 2005 - Palestine

    • erfdataportal.com
    Updated Oct 11, 2016
    Cite
    Economic Research Forum (2016). Labor Force Survey, LFS 2005 - Palestine [Dataset]. http://www.erfdataportal.com/index.php/catalog/84
    Dataset updated
    Oct 11, 2016
    Dataset provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Economic Research Forum
    Time period covered
    2005
    Area covered
    Palestine
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

    The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2005 (LFS). The survey rounds covered a total sample of about 30,252 households, and the number of completed questionnaires was 26,595, which amounts to a sample of around 92,384 individuals aged 15 years and over.

    The survey matters mainly because it focuses on key labour force indicators: the main characteristics of the employed, unemployed, underemployed and persons outside the labour force; the labour force by level of education; the distribution of the employed population by occupation, economic activity, place of work, employment status, and hours and days worked; and the average daily wage in NIS for employees.

    The survey's main objectives are:

    • To estimate the labor force and its percentage of the population.
    • To estimate the number of employed individuals.
    • To analyze the labour force according to gender, employment status, educational level, occupation and economic activity.
    • To provide information about the main changes in the labour market structure and its socio-economic characteristics.
    • To estimate the number of unemployed individuals and analyze their general characteristics.
    • To estimate the rate of working hours and wages for employed individuals, and to analyze their other characteristics.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been made to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    The sample is representative at the region level (West Bank, Gaza Strip), by locality type (urban, rural, camp) and at the governorate level.

    Analysis unit

    1. Household/family
    2. Individual/person

    Universe

    The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.

    Target Population:

    All Palestinians aged 10 years or older living in the Palestinian Territory, excluding those living in institutions such as prisons or shelters.

    Sampling Frame:

    The sampling frame consisted of a master sample of Enumeration Areas (EAs) selected from the 1997 population, housing and establishment census. The master sample consists of area units of relatively equal size (number of households); these units were used as Primary Sampling Units (PSUs).

    Sample Design:

    The sample is a two-stage stratified cluster random sample.

    Stratification: Four levels of stratification were made:

    1. Stratification by Governorates.
    2. Stratification by type of locality which comprises: (a) Urban, (b) Rural, and (c) Refugee Camps
    3. Stratification by classifying localities, excluding governorate centers, into three strata based on households' ownership of durable goods within these localities.
    4. Stratification by size of locality (number of households).

    Sample Size

    The sample size in the first round consisted of 7,563 households, which amounts to a sample of around 22,759 persons aged 15 years and over. In the second round the sample consisted of 7,563 households, which amounts to a sample of around 23,104 persons aged 15 years and over. In the third round the sample consisted of 7,563 households, which amounts to a sample of around 23,123 persons aged 15 years and over. In the fourth round the sample consisted of 7,563 households, which amounts to a sample of around 23,398 persons aged 15 years and over.

    The sample size allowed for non-response and related losses. In addition, the average number of households selected in each cell was 16.
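
    The per-round figures above can be checked against the totals quoted in the abstract. The short Python sketch below only adds up the numbers stated in this description; the completion share it prints is a plain ratio of completed questionnaires to sampled households and is an illustration, not necessarily the definition used for the published response rate.

```python
# Per-round figures as stated in the "Sample Size" paragraph.
HOUSEHOLDS_PER_ROUND = [7563, 7563, 7563, 7563]
PERSONS_15_PLUS_PER_ROUND = [22759, 23104, 23123, 23398]

# Completed questionnaires, as stated in the abstract.
COMPLETED_QUESTIONNAIRES = 26595

total_households = sum(HOUSEHOLDS_PER_ROUND)
total_persons = sum(PERSONS_15_PLUS_PER_ROUND)

# Assumption: a simple ratio, used here only for illustration; the survey's
# published response rate may be based on a different denominator.
completion_share = COMPLETED_QUESTIONNAIRES / total_households

print(total_households)                  # households over the four rounds
print(total_persons)                     # persons aged 15+ over the four rounds
print(round(100 * completion_share, 1))  # completed questionnaires, in percent
```

    Both totals agree with the figures given in the abstract (about 30,252 households and 92,384 individuals).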

    Sample Rotation:

    Each round of the Labor Force Survey covers all the 481 master sample areas. Basically, the areas remain fixed over time, but households in 50% of the EAs are replaced each round. The same household remains in the sample over two consecutive rounds, rests for the next two rounds, and is then included again for a final two consecutive rounds before it is dropped from the sample. A 50% overlap is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes). In earlier applications of the LFS (rounds 1 to 11), the rotation pattern used was different, requiring a household to remain in the sample for six consecutive rounds before being dropped. The objective of such a pattern was to increase the overlap between consecutive rounds. The new rotation pattern was introduced to reduce the burden on households resulting from visiting the same household six consecutive times.
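
    The two-in, two-out, two-in rotation can be sketched in a few lines of Python to see where the 50% overlap comes from. This is a minimal illustration under stated assumptions: rotation groups are of equal size, the scheme has reached a steady state, and there are four rounds per year (so "one year later" means four rounds later). The function names are hypothetical.

```python
def rounds_in_sample(entry_round):
    """Rounds in which a group of households entering at entry_round is interviewed.

    Pattern from the text: in for two rounds, out for two, in for two more.
    """
    return {entry_round, entry_round + 1, entry_round + 4, entry_round + 5}


def groups_in_round(t):
    """Entry rounds of all rotation groups interviewed in round t."""
    # A group is interviewed in round t if it entered 0, 1, 4 or 5 rounds ago.
    return {t, t - 1, t - 4, t - 5}


def overlap(t1, t2):
    """Fraction of round-t1 rotation groups that are also interviewed in round t2."""
    g1, g2 = groups_in_round(t1), groups_in_round(t2)
    return len(g1 & g2) / len(g1)


consecutive_rounds = overlap(10, 11)
# Assumption: four rounds per year.
consecutive_years = overlap(10, 14)
print(consecutive_rounds, consecutive_years)
```

    Under these assumptions, half of the rotation groups are shared both between neighbouring rounds and between the same round in neighbouring years.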

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    One of the main survey tools is the questionnaire, which was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:

    1. Identification Data: The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.

    2. Quality Control: This part involves groups of controlling standards to monitor the field and office operation, and to keep the questionnaire stages in sequence (data collection, field and office coding, data entry, editing after entry and storing the data).

    3. Household Roster: This part involves demographic characteristics of the household, such as the number of persons in the household, date of birth, sex, educational level, etc.

    4. Employment Part: This part involves the major research indicators. One questionnaire was completed for every household member aged 15 years and over, in order to explore their labour force status and identify their main characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.

    Cleaning operations

    Raw Data

    Data editing took place at a number of stages through the processing, including:

    1. office editing and coding
    2. during data entry
    3. structure checking and completeness
    4. structural checking of SPSS data files

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
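
    The harmonization steps listed above can be illustrated with a small Python sketch. It is a stand-in for the SPSS programs mentioned in the text, not a reproduction of them: the records, variable names (`hh`, `pid`, `SEX`, `AGE`, `EMPST`) and code lists are all hypothetical.

```python
# Hypothetical raw records; real files and variable names will differ.
raw_roster = [
    {"hh": "001", "pid": "1", "SEX": "1", "AGE": "34"},
    {"hh": "001", "pid": "2", "SEX": "2", "AGE": " 29 "},
    {"hh": "002", "pid": "1", "SEX": "1", "AGE": ""},
]
raw_employment = [
    {"hh": "001", "pid": "1", "EMPST": "1"},
    {"hh": "001", "pid": "2", "EMPST": "3"},
    {"hh": "002", "pid": "1", "EMPST": "2"},
]

SEX_LABELS = {"1": "male", "2": "female"}  # hypothetical codebook
STATUS_LABELS = {"1": "employed", "2": "unemployed", "3": "out of labor force"}


def clean(rows):
    """Cleaning: strip whitespace and turn empty strings into None."""
    return [{k: (v.strip() or None) for k, v in row.items()} for row in rows]


def merge(left, right, keys=("hh", "pid")):
    """Merging: combine files into one individual-level file on the key variables."""
    index = {tuple(r[k] for k in keys): r for r in right}
    return [{**row, **index.get(tuple(row[k] for k in keys), {})} for row in left]


def harmonize(row):
    """Harmonizing: rename, recode and label variables under common names."""
    return {
        "hhid": row["hh"],
        "indid": f'{row["hh"]}-{row["pid"]}',
        "sex": SEX_LABELS.get(row["SEX"]),
        "age": int(row["AGE"]) if row["AGE"] else None,
        "empstat": STATUS_LABELS.get(row.get("EMPST")),
    }


individuals = [harmonize(r) for r in merge(clean(raw_roster), clean(raw_employment))]

# Post-harmonization cleaning, here reduced to one plausibility rule on age.
assert all(p["age"] is None or 0 <= p["age"] <= 120 for p in individuals)

# Household-level file derived from the individual-level file.
households = {}
for p in individuals:
    households.setdefault(p["hhid"], {"hhid": p["hhid"], "hhsize": 0})["hhsize"] += 1

print(individuals[1])
print(sorted(households.values(), key=lambda h: h["hhid"]))
```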

    Response rate

    The overall response rate for the survey was 93.2%

    More information on the distribution of response rates by survey round is available on page 12 of the data user guide provided among the disseminated survey materials under a file named "Palestine 2005- Data User Guide (English).pdf".

    Sampling error estimates

    Since the data reported here are based on a sample survey and not on a complete enumeration, they are subject to sampling errors as well as non-sampling errors. Sampling errors are random outcomes of the sample design, and are, therefore, in principle measurable by the statistical concept of standard error. A description of the estimated standard errors and the effects of the sample design on sampling errors is provided in the annual report provided among the disseminated survey materials under a file named "Palestine 2005- LFS Annual Report (Arabic).pdf".

    Data appraisal

    Non-sampling errors can occur at the various stages of survey implementation, whether in data collection or in data processing. They are generally difficult to evaluate statistically. They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent and interviewer-related). Effective training and supervision and the careful design of questions have a direct bearing on limiting the magnitude of non-sampling errors, and hence on enhancing the quality of the resulting data. The following are possible sources of non-sampling errors:

    • Errors due to non-response because households were away from home or refused to participate. The overall non-response rate amounted to almost 12.1%, which is relatively low; much higher rates are common in an international perspective. The refusal rate was only 0.8%. It is difficult

  11. COVID 19 MENA Monitor Enterprise Survey, CMMENT – Wave 1 - Tunisia

    • erfdataportal.com
    Updated Oct 14, 2021
    Cite
    Economics Research Forum (2021). COVID 19 MENA Monitor Enterprise Survey, CMMENT – Wave 1 - Tunisia [Dataset]. http://www.erfdataportal.com/index.php/catalog/219
    Dataset updated
    Oct 14, 2021
    Dataset authored and provided by
    Economics Research Forum
    Time period covered
    2021
    Area covered
    Tunisia
    Description

    Abstract

    To better understand the impact of the shock induced by the COVID-19 pandemic on micro and small enterprises in Tunisia and assess the policy responses in a rapidly changing context, reliable data is imperative, and the need to resort to a dynamic data collection tool at a time when countries in the region are in a state of flux cannot be overstated. The COVID-19 MENA Monitor Survey was led by the Economic Research Forum (ERF) to provide data for researchers and policy makers on the economic and labor market impact of the global COVID-19 pandemic on enterprises.

    The ERF COVID-19 MENA Monitor Survey is constructed using a series of short panel phone surveys, that are conducted approximately every two months, and it will cover business closure (temporary/permanent) due to lockdowns, ability to telework/deliver the service, disruptions to supply chains (for inputs and outputs), loss of product markets, increased cost of supplies, worker layoffs, salary adjustments, access to lines of credit and delays in transportation. Understanding the strategies of enterprises (particularly micro and small enterprises) to cope with the crisis is one of the main objectives of this survey. Specific constraints such as weak access to the internet in some areas or laws constraining goods' delivery will be analyzed. Enterprise owners will also be asked about prospects for the future, including ability to stay open, and whether they benefited from any measures to support their businesses.

    The ERF COVID-19 MENA Monitor Survey is a wide-ranging, nationally representative panel survey. The baseline wave of this dataset was collected in February 2021 and harmonized by the Economic Research Forum (ERF) and is featured as wave 1 for enterprise data. The survey is in the process of further expansion to include other waves. The harmonization was designed to create comparable data that can facilitate cross-country and comparative research between other Arab countries (Morocco, Egypt and Jordan). All the COVID-19 MENA Monitor surveys incorporate similar survey designs, with data on enterprises within Arab countries (Egypt, Jordan, Tunisia and Morocco).

    Geographic coverage

    National

    Analysis unit

    Enterprises

    Universe

    The sample universe for the enterprise survey was enterprises that had 6-199 workers pre-COVID-19

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Use the National Institute of Statistics (INS) and Agency for the Promotion of Industry and Innovation (APII) databases as follows:

    • Tunisia did not have a Yellow Pages or similar database, so administrative/statistics data sources had to be used.
    • The sample started with the INS frame with 1,238 enterprises with 6-200 wage employees.
      - Enterprises were stratified into: (1) Agriculture (2) Industry (3) Construction (4) Trade (5) Accommodation (6) Service.
      - Enterprises were also stratified by size in terms of 6-49 versus 50-200 employees.
      - A random stratified sample (order) was selected.
      - Further restricted to enterprises with 6-199 workers in February 2020 based on an eligibility question during the phone interview.
      - This sample frame was eventually exhausted.
    • After the INS sample was exhausted, the APII sample was used.
      - APII only covered enterprises with 10+ workers.
      - APII only covered (1) services & transport, and (2) industry.
    • Weights are based on the underlying data on all enterprises from INS, specifically: Entreprises privées selon l'activité principale et la tranche de salariés (RNE 2019).
      - We ultimately stratify the Tunisia weights by industry and enterprises sized: 6-9 employees (since APII only covered 10+), 10-49, and 50-199.
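
    The weighting strata described above (industry crossed with the size classes 6-9, 10-49 and 50-199) can be sketched in Python as follows. The description does not spell out the weight formula, so the simple ratio used here (population count divided by sample count in each stratum) is an assumption, and all counts are invented for illustration rather than taken from RNE 2019.

```python
from collections import Counter


def size_class(workers):
    """Size classes used for the weights: 6-9, 10-49 and 50-199 workers."""
    if 6 <= workers <= 9:
        return "6-9"
    if 10 <= workers <= 49:
        return "10-49"
    if 50 <= workers <= 199:
        return "50-199"
    return None  # outside the survey universe


def stratum_weights(population_counts, sample):
    """Assumed weight per stratum: enterprises in the population / enterprises in the sample."""
    sample_counts = Counter((industry, size_class(w)) for industry, w in sample)
    return {s: population_counts[s] / n for s, n in sample_counts.items()}


# Hypothetical numbers for illustration only.
population_counts = {
    ("Industry", "6-9"): 400,
    ("Industry", "10-49"): 300,
    ("Trade", "6-9"): 500,
}
sample = [("Industry", 7), ("Industry", 8), ("Industry", 25), ("Trade", 6), ("Trade", 9)]

weights = stratum_weights(population_counts, sample)
print(weights)
```

    With weights of this form, the weights of the sampled enterprises in a stratum add up to that stratum's population count.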

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The enterprise questionnaire is designed to capture the strategies enterprises, particularly micro and small enterprises, use to cope with the crisis, as well as related constraints and prospects for the future. It includes questions on business closure (temporary/permanent) due to lockdowns, ability to telework/deliver the service, disruptions to supply chains (for inputs and outputs), loss of product markets, increased cost of supplies, worker layoffs, salary adjustments, access to lines of credit and delays in transportation.

    Note: The questionnaire can be seen in the documentation materials tab.

  12. DataSheet11_Uniformly shaped harmonization combines human transcriptomic...

    • frontiersin.figshare.com
    xlsx
    Updated Sep 6, 2023
    Cite
    Nicolas Borisov; Victor Tkachev; Alexander Simonov; Maxim Sorokin; Ella Kim; Denis Kuzmin; Betul Karademir-Yilmaz; Anton Buzdin (2023). DataSheet11_Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns.xlsx [Dataset]. http://doi.org/10.3389/fmolb.2023.1237129.s003
    Available download formats: xlsx
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicolas Borisov; Victor Tkachev; Alexander Simonov; Maxim Sorokin; Ella Kim; Denis Kuzmin; Betul Karademir-Yilmaz; Anton Buzdin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens avenue for comprehensive comparison of relevant features like differentially expressed genes associated with disease. Currently, most of bioinformatic tools enable normalization in a flexible format that depends on the individual datasets under analysis. Thus, the output data of such normalizations will be poorly compatible with each other. Recently we proposed a new approach to gene expression data normalization termed Shambhala which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced.

    Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores.

    Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria for different versions of Shambhala and other methods of transcriptomic harmonization with flexible output data format. Such criteria dealt with the biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for using machine learning classifiers.

    Discussion: Shambhala-2 harmonizer demonstrated the best results with the close to 1 correlation and linear regression coefficients for the comparison of training vs validation datasets and more than two times lesser instability for calculation of drug efficiency scores compared to other methods.

  13. ARCH ontologies and terminologies vs OMOP.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Jeffrey G. Klann; Matthew A. H. Joss; Kevin Embree; Shawn N. Murphy (2023). ARCH ontologies and terminologies vs OMOP. [Dataset]. http://doi.org/10.1371/journal.pone.0212463.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Jeffrey G. Klann; Matthew A. H. Joss; Kevin Embree; Shawn N. Murphy
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    ARCH ontologies and terminologies vs OMOP.

  14. Data from: Antiphospholipid IgG Certified Reference Material ERM®-...

    • zenodo.org
    Updated Mar 11, 2025
    Cite
    Claudia Grossi; Liesbet Deprez; Caterina Bodio; Maria Orietta Borghi; Suresh Kumar; Nicola Pozzi; Paolo Macor; Silvia Piantoni; Angela Tincani; Massimo Radin; Savino Sciascia; Gustavo Martos; Evanthia Monogioudi; Ingrid Zegers; Joanna Sheldon; Rohan Willis; Pier Luigi Meroni (2025). Antiphospholipid IgG Certified Reference Material ERM®- DA477/IFCC: a tool for aPL harmonization? [Dataset]. http://doi.org/10.5281/zenodo.13849410
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Claudia Grossi; Liesbet Deprez; Caterina Bodio; Maria Orietta Borghi; Suresh Kumar; Nicola Pozzi; Paolo Macor; Silvia Piantoni; Angela Tincani; Massimo Radin; Savino Sciascia; Gustavo Martos; Evanthia Monogioudi; Ingrid Zegers; Joanna Sheldon; Rohan Willis; Pier Luigi Meroni
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2023 - 2024
    Description

    Dataset associated with the following paper DOI: 10.1515/cclm-2025-0032

    This set of raw data cannot be made publicly available because it includes sensitive information, but it might be available to interested researchers upon reasonable request. Please forward your request to: c.grossi@auxologico.it

  15. DataSheet2_Uniformly shaped harmonization combines human transcriptomic data...

    • frontiersin.figshare.com
    docx
    Updated Sep 6, 2023
    Cite
    Nicolas Borisov; Victor Tkachev; Alexander Simonov; Maxim Sorokin; Ella Kim; Denis Kuzmin; Betul Karademir-Yilmaz; Anton Buzdin (2023). DataSheet2_Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns.docx [Dataset]. http://doi.org/10.3389/fmolb.2023.1237129.s004
    Available download formats: docx
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicolas Borisov; Victor Tkachev; Alexander Simonov; Maxim Sorokin; Ella Kim; Denis Kuzmin; Betul Karademir-Yilmaz; Anton Buzdin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens avenue for comprehensive comparison of relevant features like differentially expressed genes associated with disease. Currently, most of bioinformatic tools enable normalization in a flexible format that depends on the individual datasets under analysis. Thus, the output data of such normalizations will be poorly compatible with each other. Recently we proposed a new approach to gene expression data normalization termed Shambhala which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced.

    Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores.

    Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria for different versions of Shambhala and other methods of transcriptomic harmonization with flexible output data format. Such criteria dealt with the biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for using machine learning classifiers.

    Discussion: Shambhala-2 harmonizer demonstrated the best results with the close to 1 correlation and linear regression coefficients for the comparison of training vs validation datasets and more than two times lesser instability for calculation of drug efficiency scores compared to other methods.

  16. Household Income, Expenditure, and Consumption Survey 2008 - Egypt, Arab...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Central Agency for Public Mobilization & Statistics (2019). Household Income, Expenditure, and Consumption Survey 2008 - Egypt, Arab Rep. [Dataset]. https://datacatalog.ihsn.org/catalog/5868
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Central Agency for Public Mobilization & Statistics
    Time period covered
    2008 - 2009
    Area covered
    Egypt, Arab Rep.
    Description

    Abstract

    The Household Income, Expenditure and Consumption Survey (HIECS) is of great importance among other household surveys conducted by statistical agencies in various countries around the world. This survey provides a large amount of data to rely on in measuring the living standards of households and individuals, as well as establishing databases that serve in measuring poverty, designing social assistance programs, and providing necessary weights to compile consumer price indices, considered to be an important indicator to assess inflation.

    The survey's main objectives are:

    • To identify expenditure levels and patterns of the population, as well as socio-economic and demographic differentials.
    • To estimate the quantities and values of commodities and services consumed by households during the survey period, in order to determine the levels of consumption and estimate current demand, which is important for predicting future demand.
    • To measure mean household and per-capita expenditure for various expenditure items, along with socio-economic correlates.
    • To define the percentage distribution of expenditure for various items used in compiling consumer price indices, which are considered an important indicator for measuring inflation.
    • To define mean household and per-capita income from different sources.
    • To provide data necessary to measure the standard of living of households and individuals. Poverty analysis and setting up a basis for social welfare assistance are highly dependent on the results of this survey.
    • To provide essential data to measure elasticity, which reflects the percentage change in expenditure for various commodity and service groups against the percentage change in total expenditure, for the purpose of predicting the levels of expenditure and consumption for different commodity and service items in urban and rural areas.
    • To provide data essential for comparing change in expenditure against change in income to measure income elasticity of expenditure.
    • To study the relationships between demographic, geographical and housing characteristics of households and their income and expenditure for commodities and services.
    • To provide data necessary for national accounts, especially in compiling input and output tables.
    • To identify changes in consumer behavior among socio-economic groups in urban and rural areas.
    • To identify per capita food consumption and its main components of calories, proteins and fats according to its sources and the levels of expenditure in both urban and rural areas.
    • To identify the value of expenditure for food according to sources, either from household production or not, in addition to household expenditure for non-food commodities and services.
    • To identify the distribution of households according to the possession of some appliances and equipment (such as cars, satellites, mobiles ...) in urban and rural areas.
    • To identify the percentage distribution of income recipients according to some background variables such as housing conditions, size of household and characteristics of head of household.

    Geographic coverage

    Covering a sample of urban and rural areas in all the governorates.

    Analysis unit

    • Household/families
    • Individuals

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 2008/2009 HIECS is a two-stage stratified cluster sample, approximately self-weighted, of nearly 48,000 households in urban and rural areas. The main elements of the sampling design are described below.

    • Sample Size: It has been deemed important to retain the same sample size as in the previous two HIECS rounds. Thus, a sample of about 48,000 households has been considered. The justification for maintaining the sample size at this level is to have estimates with levels of precision similar to those of the previous two rounds, so that trend analysis with the previous two surveys will not be distorted by substantial changes in sampling errors from one round to another. In addition, this relatively large national sample implies proportional samples of reasonable sizes for smaller governorates. Nonetheless, oversampling has been introduced to raise the sample size of small governorates to about 1,000 households. As a result, reasonably precise estimates could be extracted for those governorates. The oversampling has resulted in a slight increase in the national sample to 48,658 households.

    • Cluster size: An important lesson learned from the previous two HIECS rounds is that the cluster size applied in both surveys was too large to yield acceptable design effect estimates. The cluster size was 40 households in the 2004-2005 round, down from 80 households in the 1999-2000 round. The estimates of the design effect (deft) for most survey measures of the latest round were extraordinarily large. As a result, the cluster size was decreased to only 19 households (20 households in urban governorates, to account for anticipated non-response in those governorates; in view of past experience, non-response is almost nil in rural governorates).

    A more detailed description of the different sampling stages and the allocation of the sample across governorates is provided in the Methodology document, which is available as an external resource in both Arabic and English.
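
    The link between cluster size and design effect mentioned above can be illustrated with Kish's well-known approximation, deff = 1 + (m - 1) * rho, where m is the number of households per cluster and rho is the intra-cluster correlation; deft is the square root of deff. The Python sketch below is only an illustration: the value of rho is a hypothetical assumption, not a figure from the survey documentation.

```python
import math


def design_effect(cluster_size, icc):
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


ICC = 0.05  # hypothetical intra-cluster correlation, chosen for illustration

# Cluster sizes quoted in the text for the 1999-2000, 2004-2005 and 2008-2009 rounds.
results = {m: design_effect(m, ICC) for m in (80, 40, 19)}

for m, deff in results.items():
    # deft is the square root of deff.
    print(m, round(deff, 2), round(math.sqrt(deff), 2))
```

    For any positive rho, a smaller cluster size gives a smaller design effect, which is the rationale given for moving from 80 to 40 and then to 19 households per cluster.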

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Three different questionnaires were used:

    1. Expenditure and consumption questionnaire
    2. Diary questionnaire for expenditure and consumption
    3. Income questionnaire

    Cleaning operations

    Harmonized Data

    • The Statistical Package for Social Science (SPSS) is used to clean and harmonize the datasets.
    • The harmonization process starts with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files are then all merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process is run on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and converted to STATA format.

    Response rate

    For the total sample, the response rate was 96.3% (93.95% in urban areas and 98.4% in rural areas).

    Sampling error estimates

    The sampling error of major survey estimates has been derived using the Ultimate Cluster Method as applied in the CENVAR Module of the Integrated Microcomputer Processing System (IMPS) Package. In addition to the estimate of sampling error, the output includes estimates of coefficient of variation, design effect (DEFF) and 95% confidence intervals.
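As a rough illustration of the quantities listed above, the sketch below estimates a weighted mean and derives its standard error with an ultimate cluster (between-PSU) variance estimator, then reports the coefficient of variation, DEFF and a 95% confidence interval. It is a minimal reimplementation under stated assumptions (Taylor linearization of the mean, no finite population correction, at least two clusters per stratum); it is not the CENVAR code, and the record layout and function name are hypothetical.

```python
import math
from collections import defaultdict


def ultimate_cluster_mean(records):
    """records: iterable of (stratum, cluster, weight, value) tuples."""
    records = list(records)
    n = len(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized contribution of each record to the weighted mean,
    # summed to the ultimate cluster (PSU) level within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for h, c, w, y in records:
        psu_totals[h][c] += w * (y - mean) / total_w

    variance = 0.0
    for clusters in psu_totals.values():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError("each stratum needs at least two clusters")
        z = list(clusters.values())
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in z)

    se = math.sqrt(variance)

    # Variance the same estimate would have under simple random sampling.
    s2 = sum(w * (y - mean) ** 2 for _, _, w, y in records) / total_w * n / (n - 1)
    srs_variance = s2 / n

    return {
        "mean": mean,
        "se": se,
        "cv": se / mean if mean != 0 else float("nan"),
        "deff": variance / srs_variance if srs_variance > 0 else float("nan"),
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),
    }


# Toy data: one stratum, two clusters of two households each, equal weights.
toy = [
    ("A", 1, 1.0, 10.0), ("A", 1, 1.0, 12.0),
    ("A", 2, 1.0, 20.0), ("A", 2, 1.0, 22.0),
]
result = ultimate_cluster_mean(toy)
print(result)
```

In the toy data the households within a cluster resemble each other far more than households in different clusters, so the design-based variance exceeds the simple random sampling variance and DEFF comes out above 1.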

    Data appraisal

    The precision of survey results depends to a large extent on how well the survey has been prepared. It was therefore deemed crucial to invest considerable effort in rigorous preparation for the present survey. The preparatory activities, which extended over 3 months, included forming a Technical Committee. The Committee set up the general framework of survey implementation, including:

    1- Applying the recent international recommendations on concepts and definitions of income and expenditure, while maintaining consistency with the previous surveys so that changes in pertinent indicators can be compared and studied.

    2- Evaluating the quality of data at all implementation stages to avoid errors or minimize them to the lowest extent possible, through:
    - Implementing field editing once data collection for households in each governorate is finished, so that errors are caught in time.
    - Setting up a program for Survey Technical Committee members and survey staff to visit the field work in all governorates (every 15 days) to solve any problem promptly.
    - Re-interviewing a sample of households by the Quality Control Department and examining the differences from the original responses.
    - Generating tables for each survey round, for quality assurance purposes, on which internal consistency checks were performed to study the plausibility of mean household expenditure on major expenditure commodity groups and its variability across major geographic regions.

  17. Data_Sheet_1_Protocol for the Cultural Translation and Adaptation of the...

    • frontiersin.figshare.com
    • figshare.com
    docx
    Updated Jun 7, 2023
    Cise Mis; Gokcen Kofali; Bethan Swift; Pinar Yalcin Bahat; Gamze Senocak; Bahar Taneri; Lone Hummelshoj; Stacey A. Missmer; Christian M. Becker; Krina T. Zondervan; Bahar Yuksel Ozgor; Engin Oral; Umit Inceboz; Mevhibe B. Hocaoglu; Nilufer Rahmioglu (2023). Data_Sheet_1_Protocol for the Cultural Translation and Adaptation of the World Endometriosis Research Foundation Endometriosis Phenome and Biobanking Harmonization Project Endometriosis Participant Questionnaire (EPHect).DOCX [Dataset]. http://doi.org/10.3389/fgwh.2021.644609.s001
    Available download formats: docx
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Frontiers
    Authors
    Cise Mis; Gokcen Kofali; Bethan Swift; Pinar Yalcin Bahat; Gamze Senocak; Bahar Taneri; Lone Hummelshoj; Stacey A. Missmer; Christian M. Becker; Krina T. Zondervan; Bahar Yuksel Ozgor; Engin Oral; Umit Inceboz; Mevhibe B. Hocaoglu; Nilufer Rahmioglu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Endometriosis affects 10% of women worldwide and is one of the most common causes of chronic pelvic pain and infertility. However, causal mechanisms of this disease remain unknown due to its heterogeneous presentation. In order to successfully study its phenotypic variation, large sample sizes are needed. Pooling of data across sites is not always feasible given the large variation in the complexity and quality of the data collected. The World Endometriosis Research Foundation (WERF) Endometriosis Phenome and Biobanking Harmonization Project (EPHect) has developed an endometriosis participant questionnaire (EPQ) to harmonize non-surgical clinical participant characteristic data relevant to endometriosis research, allowing for large-scale collaborations in English-speaking populations. Although the WERF EPHect EPQs have been translated into different languages, no study has examined the cross-cultural translation and adaptation for content and face validity. In order to investigate this, we followed the standard guidelines for cross-cultural adaptation and translation of the minimum version of the EPQ (EPQ-M) using 40 patients who underwent laparoscopic surgery in Turkey and 40 women in Northern Cyprus, aged between 18 and 55. We assessed consistency using cognitive testing and found the EPHect EPQ-M to be comprehensive, informative, and feasible in these two Turkish-speaking populations. The translated and adapted questionnaire was found to be epidemiologically robust, taking around 30–60 min to complete; furthermore, participants reported a similar understanding of the questions, showing that common perspectives were explored. Results from the cognitive testing process led to minor additions to some items, such as further descriptions and/or visuals, in order to clarify medical terminology. This paper illustrates the first successful cross-cultural translation and adaptation of the EPHect EPQ-M and should act as a tool for further studies that wish to use this questionnaire in different languages. Standardized tools like this should be adopted by researchers worldwide to facilitate collaboration and aid in the design and conduct of global studies, to ultimately help those affected by endometriosis and its associated symptoms.

