50 datasets found

PanTool – software for data harmonization and conversion, Version 1
dataone.org
doi.pangaea.de
Updated Apr 15, 2018
Cite
Sieger, Rainer; Grobe, Hannes; Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research, Bremerhaven (2018). PanTool – software for data harmonization and conversion, Version 1 [Dataset]. http://doi.org/10.1594/PANGAEA.510701
Explore at:
Unique identifier
https://doi.org/10.1594/PANGAEA.510701
Dataset updated
Apr 15, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Sieger, Rainer; Grobe, Hannes; Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research, Bremerhaven
Description
The program PanTool was developed as a toolbox, like a Swiss Army knife, for data conversion and recalculation, written to harmonize individual data collections to the standard import format used by PANGAEA. The input format PanTool requires is a table saved in plain ASCII. The user can create these files with a spreadsheet program like MS-Excel or with the system text editor. PanTool is distributed as freeware for the operating systems Microsoft Windows, Apple OS X and Linux.
Harp: Data Harmonization for Computational Tissue Deconvolution across...
zenodo.org
bin
Updated Jun 13, 2025
Cite
Zahra Nozari; Paul Hüttl; Jakob Simeth; Marian Schön; James A. Hutchinson; Rainer Spang (2025). Harp: Data Harmonization for Computational Tissue Deconvolution across Diverse Transcriptomics Platforms [Dataset]. http://doi.org/10.5281/zenodo.15650057
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15650057
Dataset updated
Jun 13, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zahra Nozari; Paul Hüttl; Jakob Simeth; Marian Schön; James A. Hutchinson; Rainer Spang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 26, 2025
Description
Harp is a tool that estimates reference profiles and cell compositions for deconvolution of bulk transcriptomic data.

To evaluate the performance of Harp against other deconvolution tools, we employed real bulk expression data (RNA-seq and microarray), along with their corresponding cell compositions from flow cytometry experiments, as well as cell expression profiles measured through sorted RNA-seq and microarray technology.

These datasets contain combined processed RNA-seq, flow cytometry and microarray expression data that were utilized in the Harplication package, which applies the Harp algorithm alongside other deconvolution tools.

The original datasets are derived from the following studies:

Bulk RNA-seq expression with paired flow cytometry from Zimmermann et al., 2016. The datasets were received via SDY67.

Sorted RNA-seq expression data from Monaco et al., 2019, available on NCBI under GEO accession number GSE107011.

Microarray gene expression data from Newman et al., 2015, available on NCBI under GEO accession number GSE65133.

Flow cytometry data from Newman et al., 2015, was received from the analysis of Vallania et al., 2018, also available on GSE65133.

Microarray based reference from Newman et al., 2015, available on CIBERSORTx.

This project has received funding from

The European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 860003.

Bundesministerium für Bildung und Forschung (BMBF, German Federal Ministry of Education and Research) [031L0173].
Description and harmonization strategy for the predictor variables.
figshare.com
xlsx
Updated Apr 23, 2025
Cite
Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan (2025). Description and harmonization strategy for the predictor variables. [Dataset]. http://doi.org/10.1371/journal.pone.0309572.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0309572.s001
Dataset updated
Apr 23, 2025
Dataset provided by
PLOS ONE
Authors
Xin Wu; Jeran Stratford; Karen Kesler; Cataia Ives; Tabitha Hendershot; Barbara Kroner; Ying Qin; Huaqin Pan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description and harmonization strategy for the predictor variables.
Additional file 3: of scAlign: a tool for alignment, integration, and rare...
springernature.figshare.com
xls
Updated Jun 3, 2023
+ more versions
Cite
Nelson Johansen; Gerald Quon (2023). Additional file 3: of scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data [Dataset]. http://doi.org/10.6084/m9.figshare.9631709.v1
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.9631709.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nelson Johansen; Gerald Quon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Contains supplementary marker gene information. (XLS 117 kb)
Evaluation of item matching strategies to harmonize assessment tools for...
osf.io
Updated Mar 29, 2022
Cite
Mauricio Scopel Hoffmann; Tyler Moore; Michael Milham; Theodore Sattherwaite; Giovanni Salum (2022). Evaluation of item matching strategies to harmonize assessment tools for psychopathology in children and adolescents [Dataset]. http://doi.org/10.17605/OSF.IO/WNRP4
Explore at:
Unique identifier
https://doi.org/10.17605/OSF.IO/WNRP4
Dataset updated
Mar 29, 2022
Dataset provided by
Center For Open Science
Authors
Mauricio Scopel Hoffmann; Tyler Moore; Michael Milham; Theodore Sattherwaite; Giovanni Salum
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Reproducible Brain Charts initiative aims to aggregate and harmonize phenotypic and neuroimage data to delineate novel mechanisms regarding the developmental basis of psychopathology in youth and yield reproducible growth charts of brain development. To reach this objective, the second step of our project is to test item-wise matching strategies of phenotypic harmonization between studies using bifactor models of psychopathology. We focused on this model because general and specific aspects of mental health problems can be dissociated, so more specific relationships with the brain could be established. In the current study, we benchmarked six item matching strategies for harmonizing the Child Behavior Checklist (CBCL) and the Strengths and Difficulties Questionnaire (SDQ) within a bifactor model framework in two samples that were assessed with both instruments. The study proceeded in the following steps: 1) harmonized items according to the six strategies, 2) estimated bifactor models with harmonized items for each sample separately, 3) estimated factor score correlation between assessment tools in each sample, 4) estimated factor reliability, 5) tested the assessment's invariance according to each strategy and 6) calculated the root expected mean square difference (REMSD) to estimate the factor score difference of using a proxy measure instead of a target measure while integrating the two samples. We expect that the results of this study can encourage the use of the best strategy to date to increase reproducibility in the field while aggregating data from different contexts and instruments in the context of the bifactor model of psychopathology.
Table_1_Streamlining intersectoral provision of real-world health data: a...
frontiersin.figshare.com
xlsx
Updated Jun 5, 2024
+ more versions
Cite
Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr (2024). Table_1_Streamlining intersectoral provision of real-world health data: a service platform for improved clinical research and patient care.XLSX [Dataset]. http://doi.org/10.3389/fmed.2024.1377209.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fmed.2024.1377209.s001
Dataset updated
Jun 5, 2024
Dataset provided by
Frontiers
Authors
Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Obtaining real-world data from routine clinical care is of growing interest for scientific research and personalized medicine. Despite the abundance of medical data across various facilities, including hospitals, outpatient clinics, and physician practices, the intersectoral exchange of information remains largely hindered due to differences in data structure, content, and adherence to data protection regulations. In response to this challenge, the Medical Informatics Initiative (MII) was launched in Germany, focusing initially on university hospitals to foster the exchange and utilization of real-world data through the development of standardized methods and tools, including the creation of a common core dataset. Our aim, as part of the Medical Informatics Research Hub in Saxony (MiHUBx), is to extend the MII concepts to non-university healthcare providers in a more seamless manner to enable the exchange of real-world data among intersectoral medical sites.
Methods: We investigated what services are needed to facilitate the provision of harmonized real-world data for cross-site research. On this basis, we designed a Service Platform Prototype that hosts services for data harmonization, adhering to the globally recognized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) international standard communication format and the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Leveraging these standards, we implemented additional services facilitating data utilization, exchange and analysis. Throughout the development phase, we collaborated with an interdisciplinary team of experts from the fields of system administration, software engineering and technology acceptance to ensure that the solution is sustainable and reusable in the long term.
Results: We have developed the pre-built packages “ResearchData-to-FHIR,” “FHIR-to-OMOP,” and “Addons,” which provide the services for data harmonization and provision of project-related real-world data in both the FHIR MII Core dataset format (CDS) and the OMOP CDM format as well as utilization and a Service Platform Prototype to streamline data management and use.
Conclusion: Our development shows a possible approach to extend the MII concepts to non-university healthcare providers to enable cross-site research on real-world data. Our Service Platform Prototype can thus pave the way for intersectoral data sharing, federated analysis, and provision of SMART-on-FHIR applications to support clinical decision making.
Data from: Integration and harmonization of trait data from plant...
live.european-language-grid.eu
zenodo.org
+1more
csv
Updated Dec 13, 2023
Cite
(2023). Data from: Integration and harmonization of trait data from plant individuals across heterogeneous sources [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7662
Explore at:
Available download formats: csv
Dataset updated
Dec 13, 2023
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Trait data represent the basis for ecological and evolutionary research and have relevance for biodiversity conservation, ecosystem management and earth system modelling. The collection and mobilization of trait data has strongly increased over the last decade, but many trait databases still provide only species-level, aggregated trait values (e.g. ranges, means) and lack the direct observations on which those data are based. Thus, the vast majority of trait data measured directly from individuals remains hidden and highly heterogeneous, impeding their discoverability, semantic interoperability, digital accessibility and (re-)use. Here, we integrate quantitative measurements of verbatim trait information from plant individuals (e.g. lengths, widths, counts and angles of stems, leaves, fruits and inflorescence parts) from multiple sources such as field observations and herbarium collections. We develop a workflow to harmonize heterogeneous trait measurements (e.g. trait names and their values and units) as well as additional information related to taxonomy, measurement or fact and occurrence. This data integration and harmonization builds on vocabularies and terminology from existing metadata standards and ontologies such as the Ecological Trait-data Standard (ETS), the Darwin Core (DwC), the Thesaurus Of Plant characteristics (TOP) and the Plant Trait Ontology (TO). A metadata form filled out by data providers enables the automated integration of trait information from heterogeneous datasets. We illustrate our tools with data from palms (family Arecaceae), a globally distributed (pantropical), diverse plant family that is considered a good model system for understanding the ecology and evolution of tropical rainforests. We mobilize nearly 140,000 individual palm trait measurements in an interoperable format, identify semantic gaps in existing plant trait terminology and provide suggestions for the future development of a thesaurus of plant characteristics. 
Our work thereby promotes the semantic integration of plant trait data in a machine-readable way and shows how large amounts of small trait data sets and their metadata can be integrated into standardized data products.
Data from: Harmonizing oer metadata in etl processes with skohub in the...
service.tib.eu
Updated May 16, 2025
Cite
(2025). Harmonizing oer metadata in etl processes with skohub in the project “wirlernenonline” [Dataset]. https://service.tib.eu/ldmservice/dataset/goe-doi-10-25625-8mzswb
Explore at:
Dataset updated
May 16, 2025
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The metadata for Open Educational Resources (OER) are often made available in repositories without recourse to uniform value lists and corresponding standards for their attributes. This circumstance complicates data harmonization when OERs from different sources are to be merged in one search environment. With the help of the RDF standard SKOS and the tool SkoHub-Vocabs, the project "WirLernenOnline" has found an innovative, reusable and standards-based solution to this challenge. This involves the creation of SKOS vocabularies that are used during the ETL process to standardize different terms (for example, "math" and "mathematics"). This then forms the basis for providing users with consistent filtering options and a good search experience. The created and open licensed vocabularies can then easily be reused and linked to overcome this challenge in the future.
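As a rough illustration of the term standardization described above, the following Python sketch maps variant subject labels onto one preferred label during an ETL step. The vocabulary, the record fields and the function names are hypothetical and are not taken from SkoHub-Vocabs or the WirLernenOnline code; the dictionary only mimics the relationship between a SKOS preferred label and its alternative labels.

```python
from typing import Dict, List, Optional

# Hypothetical mini-vocabulary: preferred label -> alternative labels.
# In SKOS terms this mimics skos:prefLabel and skos:altLabel.
VOCABULARY: Dict[str, List[str]] = {
    "mathematics": ["math", "maths"],
    "computer science": ["informatics", "cs"],
}


def build_lookup(vocabulary: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every known label (case-insensitively) to its preferred label."""
    lookup: Dict[str, str] = {}
    for preferred, alternatives in vocabulary.items():
        lookup[preferred.casefold()] = preferred
        for alternative in alternatives:
            lookup[alternative.casefold()] = preferred
    return lookup


def harmonize_term(term: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the preferred label for a term, or None if it is unknown."""
    return lookup.get(term.strip().casefold())


LOOKUP = build_lookup(VOCABULARY)

# Illustrative records as they might arrive from two different sources.
records = [
    {"title": "Algebra basics", "subject": "Math"},
    {"title": "Proof techniques", "subject": "mathematics"},
]

# Transform step: replace each subject with its preferred label,
# keeping the original value when the term is not in the vocabulary.
harmonized = [
    dict(record, subject=harmonize_term(record["subject"], LOOKUP) or record["subject"])
    for record in records
]
```

Keeping unknown terms unchanged, rather than dropping them, is a design choice of this sketch so that unmapped values stay visible for later vocabulary curation.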
Data from: HarDWR - Harmonized Water Rights Records
osti.gov
Updated Oct 31, 2024
Cite
Caccese, Robert; Fisher-Vanden, Karen; Fowler, Lara; Grogan, Danielle; Lammers, Richard; Lisk, Matthew; Olmstead, Sheila; Peklak, Darrah; Zheng, Jiameng; Zuidema, Shan (2024). HarDWR - Harmonized Water Rights Records [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2475306
Explore at:
Dataset updated
Oct 31, 2024
Dataset provided by
USDOE Office of Science (SC), Biological and Environmental Research (BER)
MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
Authors
Caccese, Robert; Fisher-Vanden, Karen; Fowler, Lara; Grogan, Danielle; Lammers, Richard; Lisk, Matthew; Olmstead, Sheila; Peklak, Darrah; Zheng, Jiameng; Zuidema, Shan
Description
A dataset within the Harmonized Database of Western U.S. Water Rights (HarDWR). For a detailed description of the database, please see the meta-record v2.0.
Changelog
v2.0
- Recalculated based on data sourced from WestDAAT
- Changed from using a Site ID column to identify unique records to using a combination of Site ID and Allocation ID
- Removed the Water Management Area (WMA) column from the harmonized records. The replacement is a separate file which stores the relationship between allocations and WMAs. This allows allocations to contribute water right amounts to multiple WMAs during the subsequent cumulative process.
- Added a column describing a water right's legal status
- Added "Unspecified" as a water source category
- Added an acre-foot (AF) column
- Added a column for the classification of the right's owner
v1.02
- Added a .RData file to the dataset as a convenience for anyone exploring our code. This is an internal file, and the one referenced in analysis scripts as the data objects are already in R data objects.
v1.01
- Updated the names of each file with an ID number less than 3 digits to include leading 0s
v1.0
- Initial public release
Description
Here we present an updated database of Western U.S. water right records. This database provides consistent unique identifiers for each water right record, and a consistent categorization scheme that puts each water right record into one of seven broad use categories. These data were instrumental in conducting a study of the multi-sector dynamics of inter-sectoral water allocation changes through water markets (Grogan et al., in review). Specifically, the data were formatted for use as input to a process-based hydrologic model, Water Balance Model (WBM), with a water rights module (Grogan et al., in review). While this specific study motivated the development of the database presented here, water management in the U.S.
West is a rich area of study (e.g., Anderson and Woosly, 2005; Tidwell, 2014; Null and Prudencio, 2016; Carney et al., 2021) so releasing this database publicly with documentation and usage notes will enable other researchers to do further work on water management in the U.S. West. We produced the water rights database presented here in four main steps: (1) data collection, (2) data quality control, (3) data harmonization, and (4) generation of cumulative water rights curves. Each of steps (1)-(3) had to be completed in order to produce (4), the final product that was used in the modeling exercise in Grogan et al. (in review). All data in each step is associated with a spatial unit called a Water Management Area (WMA), which is the unit of water right administration utilized by the state from which the right came. Steps (2) and (3) required us to make assumptions and interpretations, and to remove records from the raw data collection. We describe each of these assumptions and interpretations below so that other researchers can choose to implement alternative assumptions and interpretations as fits their research aims.
Motivation for Changing Data Sources
The most significant change has been a switch from collecting the raw water rights directly from each state to using the water rights records presented in WestDAAT, a product of the Water Data Exchange (WaDE) Program under the Western States Water Council (WSWC). One of the main reasons for this is that each state of interest is a member of the WSWC, meaning that WaDE is partially funded by these states, as well as many universities. As WestDAAT is also a database with consistent categorization, it has allowed us to spend less time on data collection and quality control and more time on answering research questions. This has included records from water right sources we had previously not known about when creating v1.0 of this database.
The only major downside to utilizing the WestDAAT records as our raw data is that further updates are tied to when WestDAAT is updated, as some states update their public water right records daily. However, as our focus is on cumulative water amounts at the regional scale, it is unlikely most record updates would have a significant effect on our results. The structure of WestDAAT led to several important changes to how HarDWR is formatted. The most significant change is that WaDE has calculated a field known as SiteUUID, which is a unique identifier for the Point of Diversion (POD), or where the water is drawn from. This is separate from AllocationNativeID, which is the identifier for the allocation of water, or the amount of water associated with the water right. It should be noted that it is possible for a single site to have multiple allocations associated with it and for an allocation to be extracted from multiple sites. The site-allocation structure has allowed us to adopt a more consistent, and hopefully more realistic, approach in organizing the water right records than we had with HarDWR v1.0. This was incredibly helpful as the raw data from many states had multiple water uses within a single field within a single row of their raw data, and it was not always clear if the first water use was the most important, or simply first alphabetically. WestDAAT has already addressed this data quality issue. Furthermore, with v1.0, when there were multiple records with the same water right ID, we selected the largest volume or flow amount and disregarded the rest. As WestDAAT was already a common structure for disparate data formats, we were better able to identify sites with multiple allocations and, perhaps more importantly, allocations with multiple sites.
This is particularly helpful when an allocation has sites which cross WMA boundaries: instead of just assigning the full water amount to a single WMA, we are now able to divide the amount of water between the relevant WMAs. As it is now possible to identify allocations with water used in multiple WMAs, it is no longer practical to store this information within a single column. Instead the stAllocationToWMATab.csv file was created, which is an allocation by WMA matrix containing the percent Place of Use area overlap with each WMA. We then use this percentage to divide the allocation's flow amount between the given WMAs during the cumulation process to hopefully provide more realistic totals of water use in each area. However, not every state provides areas of water use, so like HarDWR v1.0, a hierarchical decision tree was used to assign each allocation to a WMA. First, if a WMA could be identified based on the allocation ID, then that WMA was used; typically, when available, this applied to the entire state and no further steps were needed. Second was the spatial analysis of Place of Use to WMAs. Third was a spatial analysis of the POD locations to WMAs, with the assumption that an allocation's POD is within the WMA it should belong to; if an allocation still had multiple WMAs based on its POD locations, then the allocation's flow amount would be divided equally between all WMAs. The fourth, and final, process was to include water allocations which spatially fell outside of the state WMA boundaries. This could be due to several reasons, such as coordinate errors or imprecision in the POD location, imprecision in the WMA boundaries, or rights attached to features, such as a reservoir, which cross state boundaries. To include these records, we decided that any POD within one kilometer of the state's edge would be assigned to the nearest WMA.
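The following Python sketch illustrates how an allocation's flow could be divided between WMAs along the lines of the second and third steps described above: by percent Place of Use overlap where available, otherwise equally between the WMAs containing its PODs. It is not the authors' code. The function name, argument names and the rescaling of percentages by their sum are assumptions made for illustration, and the first step (WMA from the allocation ID) and fourth step (nearest WMA within one kilometer) are not covered.

```python
from typing import Dict, Iterable, Optional


def split_allocation(
    flow: float,
    overlap_pct: Optional[Dict[str, float]] = None,
    pod_wmas: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Divide one allocation's flow amount between WMAs.

    overlap_pct: hypothetical mapping of WMA id -> percent of the Place of
        Use area overlapping that WMA (one row of an allocation-by-WMA matrix).
    pod_wmas: hypothetical list of WMA ids containing the allocation's PODs,
        used only when no Place of Use overlap is available.
    """
    if overlap_pct:
        total = sum(overlap_pct.values())
        if total > 0:
            # Assumption: rescale by the sum so the shares always add up
            # to the full flow, even if the percentages do not sum to 100.
            return {
                wma: flow * pct / total
                for wma, pct in overlap_pct.items()
                if pct > 0
            }
    if pod_wmas:
        # Fallback: equal division between the distinct WMAs of the PODs.
        unique_wmas = sorted(set(pod_wmas))
        share = flow / len(unique_wmas)
        return {wma: share for wma in unique_wmas}
    # No spatial information: leave the allocation unassigned.
    return {}


by_overlap = split_allocation(100.0, overlap_pct={"WMA_A": 75.0, "WMA_B": 25.0})
by_pod = split_allocation(90.0, pod_wmas=["WMA_X", "WMA_Y", "WMA_X"])
```

Here `by_overlap` assigns three quarters of the flow to `WMA_A`, while `by_pod` splits the flow evenly between the two distinct WMAs.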
Other Changes WestDAAT has Allowed
In addition to a more nuanced and consistent method of assigning water rights data to WMAs, there are other benefits gained from using the WestDAAT dataset. Among those is a consistent categorization of a water right's legal status. In HarDWR v1.0, legal status was effectively ignored, which led to many valid concerns about the quality of the database related to the amounts of water the rights allowed to be claimed. The main issue was that rights with legal statuses such as "application withdrawn", "non-active", or "cancelled" were included within HarDWR v1.0. These, and other water right statuses which were deemed to not be in use, have been removed from this version of the database. Another major change has been the addition of the "unspecified" water source category. This is water that can come from either surface water or groundwater, or the source of which is unknown. The addition of this source category brings the total number of categories to three. Due to reviewer feedback, we decided to add the acre-foot (AF) column so that the data may be more applicable to a wider audience. We added the ownerClassification column so that the data may be more applicable to a wider audience.
File Descriptions
The dataset is a series of various files organized by state sub-directories. In addition, each file begins with the state's name, in case the file is separated from its sub-directory for some reason. After the state name is the text which describes the contents of the file. Each file is described in detail here. Note that st is a placeholder for the state's name.
stFullRecords_HarmonizedRights.csv: A file of the complete water records for each state. The column headers for this type of file are:
state - The name of the state to which the allocations belong.
FIPS - The two digit numeric state ID code.
siteID - The site location ID for POD locations.
A site may have multiple allocations, which are the actual amounts of water which can be drawn. In a simplified hypothetical, a farmstead may have an allocation for "irrigation" and an allocation for "domestic" water use, but the water is drawn from the same pumping equipment. It should be noted that many of the site IDs appear to have been added by WaDE, and therefore may not be recognized by a given state's water rights database.
allocationID - The allocation ID for the water right. For most states this is the water right ID, and what is
GWAS summary statistics imputation support data and integration with...
zenodo.org
explore.openaire.eu
tar
Updated Feb 7, 2020
Cite
Alvaro Numa Barbeira; Hae Kyung Im (2020). GWAS summary statistics imputation support data and integration with PrediXcan MASHR [Dataset]. http://doi.org/10.5281/zenodo.3569954
Explore at:
Available download formats: tar
Unique identifier
https://doi.org/10.5281/zenodo.3569954
Dataset updated
Feb 7, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alvaro Numa Barbeira; Hae Kyung Im
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
# GWAS summary statistics imputation, integration with PrediXcan MASHR-M

The file `sample_data.tar` contains all necessary files to perform imputation of GWAS summary statistics to the GTEx v8 QTL data set.

It includes 1000 Genomes individuals' genotypes as reference panel.

The `.tar` archive, once extracted, contains the following:

```
data/
├── eur_ld.bed.gz
├── gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz
├── coordinate_map
├── gwas
├── liftover
├── models
│   ├── eqtl
│   │   └── mashr
│   └── sqtl
│       └── mashr
└── reference_panel_1000G
```

`data/eur_ld.bed.gz` contains definitions of approximately independent LD regions in hg38 (Berisa-Pickrell regions, lifted over).

`data/gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz` is a SNP annotation file, listing all GTEx v8 variants with MAF>0.01 in Europeans.

`data/coordinate_map` contains precomputed mapping tables that MetaXcan tools can use to convert a GWAS's genomic coordinates between genome assemblies.

`data/gwas` contains a sample GWAS file for the purposes of a tutorial (data obtained from Nikpay et al. (Nat Gen 2016), https://www.ncbi.nlm.nih.gov/pubmed/26343387).

`data/liftover` contains Liftover chains to map coordinates between human genome assemblies (used by full harmonization tools)

`data/models` contains PrediXcan MASHR-M models, and cross-tissue S-MultiXcan LD compilation, from eQTL and sQTL.

`data/reference_panel_1000G` contains 1000G hg38 genotypes, in parquet format, to be used by imputation tools.
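
To check that a downloaded archive matches the layout listed above, a minimal sketch using Python's standard `tarfile` module could list the entries directly under `data/`. The helper name `top_level_entries` is hypothetical, and the sketch assumes the archive sits in the current working directory under the name `sample_data.tar`.

```python
import tarfile
from pathlib import Path


def top_level_entries(archive_path):
    """Return the sorted names found directly under data/ in the archive."""
    entries = set()
    with tarfile.open(archive_path) as archive:
        for name in archive.getnames():
            parts = Path(name).parts
            # Keep only members of the form data/<entry>/...
            if len(parts) >= 2 and parts[0] == "data":
                entries.add(parts[1])
    return sorted(entries)


# Runs only if the archive is actually present in the working directory.
if __name__ == "__main__" and Path("sample_data.tar").exists():
    print(top_level_entries("sample_data.tar"))
```

Listing member names this way reads only the archive headers, so the layout can be inspected before extracting the full reference panel.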
Knowledge Management - Raw Source Data
datasetcatalog.nlm.nih.gov
dataverse.harvard.edu
Updated May 6, 2025
Cite
Anez, Diomar; Anez, Dimar (2025). Knowledge Management - Raw Source Data [Dataset]. http://doi.org/10.7910/DVN/8ATSMJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/8ATSMJ
Dataset updated
May 6, 2025
Authors
Anez, Diomar; Anez, Dimar
Description
This dataset contains raw, unprocessed data files pertaining to the management tool 'Knowledge Management' (KM), including related concepts like Intellectual Capital Management and Knowledge Transfer. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization. Data Sources & File Details: Google Trends File (Prefix: GT_): Metric: Relative Search Interest (RSI) Index (0-100 scale). Keywords Used: "knowledge management" + "knowledge management organizational" Time Period: January 2004 - January 2025 (Native Monthly Resolution). Scope: Global Web Search, broad categorization. Extraction Date: Data extracted January 2025. Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling. Source URL: Google Trends Query Google Books Ngram Viewer File (Prefix: GB_): Metric: Annual Relative Frequency (% of total n-grams in the corpus). Keywords Used: Knowledge Management + Intellectual Capital Management + Knowledge Transfer Time Period: 1950 - 2022 (Annual Resolution). Corpus: English. Parameters: Case Insensitive OFF, Smoothing 0. Extraction Date: Data extracted January 2025. Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage). Source URL: Ngram Viewer Query Crossref.org File (Prefix: CR_): Metric: Absolute count of publications per month matching keywords. Keywords Used: ("knowledge management" OR "intellectual capital management" OR "knowledge transfer") AND ("organizational" OR "management" OR "learning" OR "innovation" OR "sharing" OR "system") Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata). Search Fields: Title, Abstract. Extraction Date: Data extracted January 2025. 
Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted. Source URL: Crossref Search Query Bain & Co. Survey - Usability File (Prefix: BU_): Metric: Original Percentage (%) of executives reporting tool usage. Tool Names/Years Included: Knowledge Management (1999, 2000, 2002, 2004, 2006, 2008, 2010). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 2001, 2003, 2005, 2007, 2009, 2011). Note: Tool potentially not surveyed or reported after 2010 under this specific name. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230. Bain & Co. Survey - Satisfaction File (Prefix: BS_): Metric: Original Average Satisfaction Score (Scale 0-5). Tool Names/Years Included: Knowledge Management (1999, 2000, 2002, 2004, 2006, 2008, 2010). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 2001, 2003, 2005, 2007, 2009, 2011). Note: Tool potentially not surveyed or reported after 2010 under this specific name. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230. Reflects subjective executive perception of utility. 
File Naming Convention: Files generally follow the pattern: PREFIX_Tool.csv, where the PREFIX indicates the data source: GT_: Google Trends GB_: Google Books Ngram CR_: Crossref.org (Count Data for this Raw Dataset) BU_: Bain & Company Survey (Usability) BS_: Bain & Company Survey (Satisfaction) The essential identification comes from the PREFIX and the Tool Name segment. This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
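The naming convention above can be sketched as a small parser. This is only an illustration: the prefix table is copied from the description, while the function name `parse_source_file` and the example filename `GT_KnowledgeManagement.csv` are hypothetical, since the listing does not show actual file names.

```python
from pathlib import Path

# Prefix-to-source mapping as stated in the dataset description.
SOURCES = {
    "GT": "Google Trends",
    "GB": "Google Books Ngram",
    "CR": "Crossref.org",
    "BU": "Bain & Company Survey (Usability)",
    "BS": "Bain & Company Survey (Satisfaction)",
}


def parse_source_file(filename):
    """Split a PREFIX_Tool.csv name into (source description, tool segment).

    Hypothetical helper: assumes the prefix is everything before the
    first underscore and the tool segment is the rest of the stem.
    """
    stem = Path(filename).stem
    prefix, sep, tool = stem.partition("_")
    if not sep or prefix not in SOURCES or not tool:
        raise ValueError(f"unrecognised file name: {filename!r}")
    return SOURCES[prefix], tool


# Hypothetical example file name, not taken from the dataset itself.
source, tool = parse_source_file("GT_KnowledgeManagement.csv")
```

Because the description says files only "generally" follow the pattern, the sketch raises an error on anything it cannot classify instead of guessing.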
Fundamental Data Record for Atmospheric Composition [ATMOS_L1B]
eocat.esa.int
fedeo.ceos.org
Updated Jun 27, 2025
Cite
ESA/ESRIN (2025). Fundamental Data Record for Atmospheric Composition [ATMOS_L1B] [Dataset]. http://doi.org/10.5270/ESA-852456e
Explore at:
Available download formats: application/x-binary
Unique identifier
https://doi.org/10.5270/ESA-852456e
Dataset updated
Jun 27, 2025
Dataset provided by
ESA/ESRIN
License
https://hm-atmos-ds.eo.esa.int/oads/access/collection
https://hm-atmos-ds.eo.esa.int/oads/access/collection/FDR_for_Atmospheric_Composition_ATMOS_L1B
Time period covered
Jun 28, 1995 - Apr 7, 2012
Area covered
Earth
Measurement technique
Spectrometers
Description
The Fundamental Data Record (FDR) for Atmospheric Composition UVN Level 1b v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 by the ESA FDR4ATMOS project (https://atmos.eoc.dlr.de/FDR4ATMOS/). The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period from 1995 to 2012. The data record offers harmonised, cross-calibrated spectra, essential for subsequent trace gas retrieval. The focus lies on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents such as ozone (O3), sulphur dioxide (SO2) and nitrogen dioxide (NO2) column densities, alongside cloud parameters in the NIR spectrum. In many respects, the FDR product improves on the existing individual mission datasets:
• GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data;
• Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites;
• SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra, which were split into a complex cluster structure (each cluster with its own integration time) in the original Level 1b data;
• The harmonization process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data;
• Uncertainties are provided.

Each FDR product covers three FDRs (irradiance/reflectance for UV-VIS-NIR) for a single day within the same product including information from the individual ERS-2 GOME and Envisat SCIAMACHY orbits therein.

The FDR has been generated in two formats, Level 1A and Level 1B, targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp (http://esatellus.service-now.com/csp?id=esa_simple_request&sys_id=f27b38f9dbdffe40e3cedb11ce961958).

The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities.

One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices. The following documents are provided:

General guidance on a metrological approach to Fundamental Data Records (FDR) -> link TBC

Uncertainty Characterisation document -> link TBC

Effect tables -> link TBC

NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scene for 2003 and 2010) and GOME (Atlantic scene for 2003) -> links TBC: reflectance_uncertainty_example_FDR4ATMOS_GOME.nc, reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc

The FDR V1 is currently being extended to include the MetOp GOME-2 series.

All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply (https://www.giss.nasa.gov/tools/panoply/), can be used to read NetCDF data.

Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page (https://earth.esa.int/eogateway/terms-and-conditions).
Trade Promotion Management and Optimization for the Consumer Goods Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 21, 2025
Cite
Market Research Forecast (2025). Trade Promotion Management and Optimization for the Consumer Goods Report [Dataset]. https://www.marketresearchforecast.com/reports/trade-promotion-management-and-optimization-for-the-consumer-goods-44435
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Mar 21, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Trade Promotion Management (TPM) and Optimization market within the consumer goods sector is experiencing robust growth, projected to reach $618.4 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 7.9% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing complexity of modern retail channels, encompassing both brick-and-mortar and e-commerce, necessitates sophisticated TPM solutions to optimize promotional campaigns and maximize ROI. Data-driven decision-making is becoming paramount, with companies increasingly leveraging advanced analytics and data harmonization capabilities to understand consumer behavior and refine their promotional strategies. Furthermore, the rise of omnichannel strategies demands integrated TPM systems capable of managing promotions across all touchpoints, ensuring consistent messaging and maximizing customer engagement. The Food and Beverage sector, particularly within retail and e-commerce, is a significant driver of market growth, reflecting the intense competition and need for effective promotional tools in these saturated markets. Companies are actively seeking TPM solutions to improve forecast accuracy, streamline operations, and ultimately enhance profitability. The market segmentation highlights the diverse applications of TPM solutions. Data harmonization is crucial for integrating data from disparate sources, providing a holistic view of promotional performance. Order management functionalities streamline the process of handling promotional orders and optimizing inventory levels. Head office planning tools enable centralized control and strategic decision-making regarding promotional activities. While the Food and Beverage sector dominates, other sectors are also showing increasing adoption, reflecting the broad applicability of effective promotion management techniques. 
The competitive landscape is dynamic, with established players like SAP and Oracle competing alongside specialized providers like Blueshift and IRI Worldwide, indicating a healthy ecosystem of innovation and competition. Geographic expansion is also driving growth, with North America and Europe currently holding significant market share, while Asia-Pacific presents a high-growth potential market due to increasing retail modernization and consumer spending.
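The growth figures quoted above can be turned into a simple projection. The sketch below assumes that the $618.4 million figure is the 2025 base value and that the 7.9% CAGR compounds once per year through 2033; the report summary does not spell out either convention, and the function name `project_market_size` is made up for illustration.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward by `years` annual periods.

    Standard compound-growth formula: value = base * (1 + rate) ** years.
    """
    return base_value * (1.0 + cagr) ** years


BASE_YEAR = 2025
BASE_VALUE_MUSD = 618.4   # USD millions, as quoted in the description
CAGR = 0.079              # 7.9% per year, as quoted in the description

# Year-by-year projection under the stated assumptions.
projection = {
    year: project_market_size(BASE_VALUE_MUSD, CAGR, year - BASE_YEAR)
    for year in range(BASE_YEAR, 2034)
}

for year, value in projection.items():
    print(f"{year}: {value:,.1f} million USD")
```

The same function applies to any of the market summaries in this listing by swapping in the base value and rate they quote.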
Data from: Assay Harmonization Study To Measure Immune Response to...
data.niaid.nih.gov
url
Updated Jan 30, 2025
Cite
Ligia Pinto (2025). Assay Harmonization Study To Measure Immune Response to SARS-CoV-2 Infection and Vaccines: a Serology Methods Study [Dataset]. http://doi.org/10.21430/M3OL8R66OB
Explore at:
Available download formats: url
Unique identifier
https://doi.org/10.21430/M3OL8R66OB
Dataset updated
Jan 30, 2025
Dataset provided by
National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
Authors
Ligia Pinto
License
https://www.immport.org/agreement
Description
The Coronavirus disease 2019 (COVID-19) pandemic presented the scientific community with an immediate need for accurate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) serology assays, resulting in an expansion of assay development, some without following a rigorous quality control and validation, and with a wide range of performance characteristics. Vast amounts of data have been gathered on SARS-CoV-2 antibody response; however, performance and ability to compare the results have been challenging. This study seeks to analyze the reliability, sensitivity, specificity, and reproducibility of a set of widely used commercial, in-house, and neutralization serology assays, as well as provide evidence for the feasibility of using the World Health Organization (WHO) International Standard (IS) as a harmonization tool. This study also seeks to demonstrate that binding immunoassays may serve as a practical alternative for the serological study of large sample sets in lieu of expensive, complex, and less reproducible neutralization assays. In this study, commercial assays demonstrated the highest specificity, while in-house assays excelled in antibody sensitivity. As expected, neutralization assays demonstrated high levels of variability but overall good correlations with binding immunoassays, suggesting that binding may be reasonably accurate as well as practical for the study of SARS-CoV-2 serology. All three assay types performed well after WHO IS standardization. The results of this study demonstrate there are high performing serology assays available to the scientific community to rigorously dissect antibody responses to infection and vaccination. IMPORTANCE Previous studies have shown significant variability in SARS-CoV-2 antibody serology assays, highlighting the need for evaluation and comparison of these assays using the same set of samples covering a wide range of antibody responses induced by infection or vaccination. 
This study demonstrated that there are high performing assays that can be used reliably to evaluate immune responses to SARS-CoV-2 in the context of infection and vaccination. This study also demonstrated the feasibility of harmonizing these assays against the International Standard and provided evidence that the binding immunoassays may have high enough correlation with the neutralization assays to serve as a practical proxy. These results represent an important step in standardizing and harmonizing the many different serological assays used to evaluate COVID-19 immune responses in the population.
National Institute of Mental Health Data Archive
nda.nih.gov
Updated Sep 13, 2019
Cite
National Institutes of Health (2019). National Institute of Mental Health Data Archive [Dataset]. https://nda.nih.gov
Explore at:
Dataset updated
Sep 13, 2019
Dataset provided by
National Institutes of Health
Description
The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. NDA provides infrastructure for sharing research data, tools, methods, and analyses enabling collaborative science and discovery. De-identified human subjects data, harmonized to a common standard, are available to qualified researchers. Summary data are available to all.

The NDA mission is to accelerate scientific research and discovery through data sharing, data harmonization, and the reporting of research results.
Customer Segmentation - Raw Source Data
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Customer Segmentation - Raw Source Data [Dataset]. http://doi.org/10.7910/DVN/0NS2KB
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/0NS2KB
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset contains raw, unprocessed data files pertaining to the management tool 'Customer Segmentation', including the closely related concept of Market Segmentation. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization. Data Sources & File Details: Google Trends File (Prefix: GT_): Metric: Relative Search Interest (RSI) Index (0-100 scale). Keywords Used: "customer segmentation" + "market segmentation" + "customer segmentation marketing" Time Period: January 2004 - January 2025 (Native Monthly Resolution). Scope: Global Web Search, broad categorization. Extraction Date: Data extracted January 2025. Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling. Source URL: Google Trends Query Google Books Ngram Viewer File (Prefix: GB_): Metric: Annual Relative Frequency (% of total n-grams in the corpus). Keywords Used: Customer Segmentation + Market Segmentation Time Period: 1950 - 2022 (Annual Resolution). Corpus: English. Parameters: Case Insensitive OFF, Smoothing 0. Extraction Date: Data extracted January 2025. Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage). Source URL: Ngram Viewer Query Crossref.org File (Prefix: CR_): Metric: Absolute count of publications per month matching keywords. Keywords Used: ("customer segmentation" OR "market segmentation") AND ("marketing" OR "strategy" OR "management" OR "targeting" OR "analysis" OR "approach" OR "practice") Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata). Search Fields: Title, Abstract. Extraction Date: Data extracted January 2025. Notes: Reflects volume of relevant academic publications indexed by Crossref. 
Deduplicated using DOIs; records without DOIs omitted. Source URL: Crossref Search Query Bain & Co. Survey - Usability File (Prefix: BU_): Metric: Original Percentage (%) of executives reporting tool usage. Tool Names/Years Included: Customer Segmentation (1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017). Note: Tool not included in the 2022 survey data. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268. Bain & Co. Survey - Satisfaction File (Prefix: BS_): Metric: Original Average Satisfaction Score (Scale 0-5). Tool Names/Years Included: Customer Segmentation (1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017). Note: Tool not included in the 2022 survey data. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268. Reflects subjective executive perception of utility. 
File Naming Convention: Files generally follow the pattern: PREFIX_Tool.csv, where the PREFIX indicates the data source: GT_: Google Trends GB_: Google Books Ngram CR_: Crossref.org (Count Data for this Raw Dataset) BU_: Bain & Company Survey (Usability) BS_: Bain & Company Survey (Satisfaction) The essential identification comes from the PREFIX and the Tool Name segment. This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
Scenario Planning - Raw Source Data
datasetcatalog.nlm.nih.gov
dataverse.harvard.edu
Updated May 6, 2025
Cite
Anez, Diomar; Anez, Dimar (2025). Scenario Planning - Raw Source Data [Dataset]. http://doi.org/10.7910/DVN/PXRVDS
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/PXRVDS
Dataset updated
May 6, 2025
Authors
Anez, Diomar; Anez, Dimar
Description
This dataset contains raw, unprocessed data files pertaining to the management tool group 'Scenario Planning', including related concepts like Scenario Analysis and Contingency Planning. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization. Data Sources & File Details: Google Trends File (Prefix: GT_): Metric: Relative Search Interest (RSI) Index (0-100 scale). Keywords Used: "scenario planning" + "scenario analysis" + "contingency planning" + "scenario planning business" Time Period: January 2004 - January 2025 (Native Monthly Resolution). Scope: Global Web Search, broad categorization. Extraction Date: Data extracted January 2025. Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling. Source URL: Google Trends Query Google Books Ngram Viewer File (Prefix: GB_): Metric: Annual Relative Frequency (% of total n-grams in the corpus). Keywords Used: Scenario Planning + Scenario Analysis + Contingency Planning + Scenario and Contingency Planning Time Period: 1950 - 2022 (Annual Resolution). Corpus: English. Parameters: Case Insensitive OFF, Smoothing 0. Extraction Date: Data extracted January 2025. Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage). Source URL: Ngram Viewer Query Crossref.org File (Prefix: CR_): Metric: Absolute count of publications per month matching keywords. Keywords Used: ("scenario planning" OR "scenario analysis" OR "contingency planning" OR "scenario and contingency planning") AND ("management" OR "strategic" OR "business" OR "planning" OR "implementation" OR "approach" OR "framework") Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata). 
Search Fields: Title, Abstract. Extraction Date: Data extracted January 2025. Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted. Source URL: Crossref Search Query Bain & Co. Survey - Usability File (Prefix: BU_): Metric: Original Percentage (%) of executives reporting tool usage. Tool Names/Years Included: Scenario Planning (1993, 1999, 2000); Scenario and Contingency Planning (2004, 2006, 2008, 2010, 2012, 2014, 2017); Scenario Analysis and Contingency Planning (2022). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., Ronan C. et al., various years: 1994, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2023). Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1999/475; 2000/214; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268; 2022/1068. Bain & Co. Survey - Satisfaction File (Prefix: BS_): Metric: Original Average Satisfaction Score (Scale 0-5). Tool Names/Years Included: Scenario Planning (1993, 1999, 2000); Scenario and Contingency Planning (2004, 2006, 2008, 2010, 2012, 2014, 2017); Scenario Analysis and Contingency Planning (2022). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., Ronan C. et al., various years: 1994, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2023). Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1999/475; 2000/214; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268; 2022/1068. Reflects subjective executive perception of utility. 
File Naming Convention: Files generally follow the pattern: PREFIX_Tool.csv, where the PREFIX indicates the data source: GT_: Google Trends GB_: Google Books Ngram CR_: Crossref.org (Count Data for this Raw Dataset) BU_: Bain & Company Survey (Usability) BS_: Bain & Company Survey (Satisfaction) The essential identification comes from the PREFIX and the Tool Name segment. This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
Labor Force Survey 2001, Economic Research Forum (ERF) Harmonization Data -...
catalog.ihsn.org
Updated Jun 26, 2017
Cite
Palestinian Central Bureau of Statistics (2017). Labor Force Survey 2001, Economic Research Forum (ERF) Harmonization Data - West Bank and Gaza [Dataset]. https://catalog.ihsn.org/index.php/catalog/6990
Explore at:
Dataset updated
Jun 26, 2017
Dataset provided by
Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
Economic Research Forum
Time period covered
2001
Area covered
West Bank, Gaza Strip, Gaza
Description
Abstract

THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2001 (LFS).

The importance of this survey lies in that it focuses mainly on labour force key indicators, main characteristics of the employed, unemployed, underemployed and persons outside labour force, labour force according to level of education, distribution of the employed population by occupation, economic activity, place of work, employment status, hours and days worked and average daily wage in NIS for the employees.

The main objectives of the survey are:
- To estimate the labor force and its percentage of the population.
- To estimate the number of employed individuals.
- To analyze the labour force according to gender, employment status, educational level, occupation and economic activity.
- To provide information about the main changes in the labour market structure and its socio-economic characteristics.
- To estimate the number of unemployed individuals and analyze their general characteristics.
- To estimate the rate of working hours and wages for employed individuals, and to analyze other characteristics.

The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

Geographic coverage

The Data are representative at region level (West Bank, Gaza Strip), locality type (urban, rural, camp) and governorates

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.

Kind of data

Sample survey data [ssd]

Sampling procedure

The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.

Target Population:

All Palestinians aged 10 years or older living in the Palestinian Territory, excluding those living in institutions such as prisons or shelters.

Sampling Frame:

The sampling frame consisted of a master sample of Enumeration Areas (EAs) selected from the 1997 population, housing and establishment census. The master sample consists of area units of relatively equal size (number of households); these units were used as Primary Sampling Units (PSUs).

Sample Design:

The sample is a two-stage stratified cluster random sample.

Stratification: Four levels of stratification were made:

Stratification by Governorates.

Stratification by type of locality which comprises: (a) Urban, (b) Rural, and (c) Refugee Camps

Stratification by classifying localities, excluding governorate centers, into three strata based on the ownership of households of durable goods within these localities.

Stratification by size of locality (number of households).

Sample Size:

The sample in the first quarter consisted of 7,559 households, which amounts to a sample of around 28,959 persons aged 10 years and over (including 22,874 aged 15 years and over). In the second round the sample consisted of 7,559 households, which amounts to a sample of around 28,922 persons aged 10 years and over (including 22,762 aged 15 years and over). In the third round the sample consisted of 7,559 households, which amounts to a sample of around 28,380 persons aged 10 years and over (including 22,495 aged 15 years and over). which amount to a sample of around 26974 persons aged 10 years and over (including 21240 aged 15 years and over). In the fourth round the sample consisted of 7,559 households, which amounts to a sample of around 27,870 persons aged 10 years and over (including 21,868 aged 15 years and over).

The sample size allowed for non-response and related losses. In addition, the average number of households selected in each cell was 16.

Sample Rotation:

Each round of the Labor Force Survey covers all 481 master sample areas. The areas remain fixed over time, but households in 50% of the EAs are replaced each round. A household remains in the sample for two consecutive rounds, rests for the next two rounds, and is then included again for a final two consecutive rounds before being dropped from the sample. A 50% overlap is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes). In earlier applications of the LFS (rounds 1 to 11), a different rotation pattern was used, requiring a household to remain in the sample for six consecutive rounds before being dropped. The objective of that pattern was to increase the overlap between consecutive rounds. The new rotation pattern was introduced to reduce the burden on households of being visited six consecutive times.
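The overlap claim for the two-in, two-out, two-in rotation described above can be checked with a short simulation. This is a simplified sketch, not the PCBS procedure: it assumes one equally sized cohort of households enters the sample each round and that a year is four rounds, and the function names are invented for illustration.

```python
def rounds_interviewed(entry_round):
    """Rounds in which a cohort entering at `entry_round` is in the sample.

    Two rounds in, two rounds of rest, two final rounds in.
    """
    return {entry_round, entry_round + 1, entry_round + 4, entry_round + 5}


def cohorts_in_round(t):
    """Entry rounds of all cohorts interviewed in round `t`."""
    return {e for e in range(t - 5, t + 1) if t in rounds_interviewed(e)}


def overlap(t, lag):
    """Share of round-`t` cohorts that are interviewed again in round `t + lag`."""
    current = cohorts_in_round(t)
    later = cohorts_in_round(t + lag)
    return len(current & later) / len(current)


quarter_overlap = overlap(10, 1)   # consecutive rounds
year_overlap = overlap(10, 4)      # same round one year later (4 rounds)
half_year_overlap = overlap(10, 2) # two rounds apart

print(quarter_overlap, year_overlap, half_year_overlap)
```

Under these assumptions half of the cohorts are shared between consecutive rounds and between rounds one year apart, while rounds two apart share none, which is the trade-off the rotation design accepts to limit repeat visits.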

Mode of data collection

Face-to-face [f2f]

Research instrument

One of the main survey tools is the questionnaire, which was designed according to International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:

1. Identification Data:

The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.

2. Quality Control:

This part involves groups of control standards to monitor field and office operations and to keep the questionnaire stages in sequence (data collection, field and office coding, data entry, editing after entry, and data storage).

3. Household Roster:

This part covers demographic characteristics of the household, such as number of persons in the household, date of birth, sex, educational level, etc.

4. Employment Part:

This part covers the major research indicators. One questionnaire was answered by every household member aged 15 years and over, in order to explore their labour force status and identify their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.

Cleaning operations

Raw Data

The data processing stage consisted of the following operations:

1. Editing before data entry: All questionnaires were edited in the main office using the same instructions adopted for editing in the field.

2. Coding: At this stage, the Economic Activity variable underwent coding according to West Bank and Gaza Strip Standard commodities Classification, based on the United Nations ISIC-3. The Economic Activity for all employed and ever employed individuals was classified at the fourth-digit level. The occupations were coded on the basis of the International Standard Occupational Classification of 1988 at the third-digit level (ISCO-88).

3. Data entry: In this stage data were entered into the computer using a BLAISE data entry template. The data entry program was prepared in order to satisfy the following requirements:

Duplication of the questionnaire on the computer screen.

Logical and consistency checks of data entered.

Possibility for internal editing of questionnaire answers.

Maintaining a minimum of errors in digital data entry and fieldwork.

User-friendly handling

Accordingly, data editing took place at a number of stages throughout the processing, including:
1. office editing and coding
2. during data entry
3. structure checking and completeness
4. structural checking of SPSS data files

Harmonized Data

The SPSS package is used to clean and harmonize the datasets.

The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.

All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.

A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.

A post-harmonization cleaning process is then conducted on the data.

Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

The overall response rate for the survey was 84.2%

More information on the distribution of response rates by different survey rounds is available in Page 11 of the data user guide provided among the disseminated survey materials under a file named "Palestine 2001- Data User Guide (English).pdf".

Sampling error estimates

Since the data reported here are based on a sample survey and not on a complete enumeration, they are subject to sampling errors as well as non-sampling errors. Sampling errors are random outcomes of the sample design and are therefore, in principle, measurable by the statistical concept of standard error.

A description of the
Real-World Evidence Solutions Market Report
marketreportanalytics.com
doc, pdf, ppt
Updated Jul 8, 2025
Cite
Market Report Analytics (2025). Real-World Evidence Solutions Market Report [Dataset]. https://www.marketreportanalytics.com/reports/real-world-evidence-solutions-market-2402
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Real-World Evidence (RWE) Solutions market is experiencing robust growth, projected to reach $828.46 million in 2025 and expand at a compound annual growth rate (CAGR) of 13% from 2025 to 2033. This significant expansion is driven by several key factors. The increasing adoption of RWE in regulatory decision-making, fueled by the need for more efficient and cost-effective drug development, is a primary driver. Furthermore, the rising availability of large, diverse datasets from electronic health records (EHRs), claims databases, and wearable devices provides rich sources of real-world data for analysis. Pharmaceutical companies and healthcare providers are actively investing in RWE solutions to improve clinical trial design, enhance post-market surveillance, and optimize treatment strategies, further bolstering market growth. The market is segmented by type (e.g., software, services) and application (e.g., drug development, post-market surveillance), each exhibiting unique growth trajectories influenced by specific technological advancements and regulatory landscapes. Competitive strategies among leading companies, such as Clinigen Group Plc, ICON Plc, and IQVIA Inc., focus on strategic partnerships, technological innovation, and expansion into new geographical markets. These companies are engaged in developing advanced analytical tools and data integration platforms to cater to growing demands for comprehensive RWE solutions. The North American market currently holds a substantial share, driven by robust regulatory frameworks and advanced healthcare infrastructure. However, other regions, particularly Asia Pacific, are expected to witness significant growth in the coming years due to increasing healthcare expenditure and technological advancements. The restraints on market growth are primarily related to data privacy concerns, regulatory hurdles in accessing and utilizing real-world data, and the need for robust data standardization across different sources. 
However, proactive measures like developing better data security protocols, clarifying regulatory guidelines, and investing in data harmonization initiatives are mitigating these challenges. The future of the RWE Solutions market hinges on continuous technological innovation, particularly in areas like artificial intelligence (AI) and machine learning (ML), which can enhance data analysis and generate valuable insights from complex datasets. Further growth will depend on fostering collaboration among stakeholders, including regulatory bodies, healthcare providers, and technology companies, to create a more conducive environment for RWE adoption.
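The growth figures in the description can be turned into a year-by-year projection with the usual compound growth formula. This is only a sketch: it assumes the 13% rate is applied annually to the 2025 base of $828.46 million, which the description implies but does not spell out, and the function name `project` is introduced here for illustration.

```python
def project(base_value, rate, years):
    """Compound growth: value after `years` years at annual growth `rate`."""
    return base_value * (1.0 + rate) ** years


base_2025 = 828.46  # USD million, as stated in the description
cagr = 0.13         # 13% compound annual growth rate, as stated

# Print the implied market size for each year of the 2025-2033 period.
for year in range(2025, 2034):
    print(year, round(project(base_2025, cagr, year - 2025), 2))
```

Under this assumption the implied 2033 market size is roughly $2.2 billion, a figure the description itself does not state.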
Centro Nazionale Agritech Deliverable 4.3.1 data grids for agriculture
zenodo.org
Updated Jul 18, 2025
Cite
MARCO PIRAGNOLO; Francesco Pirotti; SAMUELE TRESTINI (2025). Centro Nazionale Agritech Deliverable 4.3.1 data grids for agriculture [Dataset]. http://doi.org/10.5281/zenodo.16032115
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.16032115
Dataset updated
Jul 18, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
MARCO PIRAGNOLO; Francesco Pirotti; SAMUELE TRESTINI
Description
The review of existing literature and the analysis of data sources have provided valuable insights into the relationships among the various data sources, while summarizing aspects related to accessibility, data formats, and metadata. Given the considerable heterogeneity of these sources, the need for data harmonization becomes evident, encompassing accessibility, data formats, and metadata standards.

Upon deconstructing the table and analyzing it in detail, we identified a total of 79 distinct information layers, as reported in the spreadsheet in this dataset. The data formats are notably heterogeneous: spreadsheets, NetCDF, raster, vector, and online services. The table format, which includes both structured and unstructured data, is widely used and requires harmonization. The NetCDF format is widely used in climatology and meteorology, with 21 data sources employing it. NetCDF is a platform-independent format consisting of arrays of data together with their metadata. Access to and manipulation of this format are supported in many languages through libraries that allow the data to be integrated into GIS applications and visualized as regular grids. The fundamental difference between NetCDF and raster is that NetCDF structures the information as arrays, whereas a raster is a rectangular matrix. Raster data are present in 19 of the sources; this category includes remote sensing data, derived indices, and maps available as Web Map Services (WMS). In contrast, vector data are present in only two sources. Finally, the text format is represented by six sources. As for data accessibility, the majority of sources require manual downloads, while only a few are accessible through online services such as WMS, Web Coverage Services (WCS), dedicated tools, or an Application Programming Interface (API), and in some instances they require registration on the respective websites.

