21 datasets found
  1. Table_1_A scalable and transparent data pipeline for AI-enabled health data...

    • figshare.com
    docx
    Updated Jul 30, 2024
    Tuncay Namli; Ali Anıl Sınacı; Suat Gönül; Cristina Ruiz Herguido; Patricia Garcia-Canadilla; Adriana Modrego Muñoz; Arnau Valls Esteve; Gökçe Banu Laleci Ertürkmen (2024). Table_1_A scalable and transparent data pipeline for AI-enabled health data ecosystems.docx [Dataset]. http://doi.org/10.3389/fmed.2024.1393123.s001
    Available download formats
    docx
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Frontiers
    Authors
    Tuncay Namli; Ali Anıl Sınacı; Suat Gönül; Cristina Ruiz Herguido; Patricia Garcia-Canadilla; Adriana Modrego Muñoz; Arnau Valls Esteve; Gökçe Banu Laleci Ertürkmen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Transparency and traceability are essential for establishing trustworthy artificial intelligence (AI). The lack of transparency in the data preparation process is a significant obstacle to developing reliable AI systems and can lead to issues with reproducibility, debugging of AI models, bias and fairness, and compliance and regulation. We introduce a formal data preparation pipeline specification to improve upon the manual and error-prone data extraction processes used in AI and data analytics applications, with a focus on traceability.

    Methods: We propose a declarative language to define the extraction of AI-ready datasets from health data adhering to a common data model, particularly data conforming to HL7 Fast Healthcare Interoperability Resources (FHIR). We use FHIR profiling to develop a common data model tailored to an AI use case, enabling the explicit declaration of the needed information such as phenotype and AI feature definitions. In our pipeline model, we convert complex, high-dimensional electronic health record data with irregular time series sampling into a flat structure by defining a target population, feature groups, and final datasets. Our design considers the requirements of various AI use cases from different projects, which led to the implementation of many feature types exhibiting intricate temporal relations.

    Results: We implemented a scalable and high-performance feature repository to execute the data preparation pipeline definitions. This software not only provides reliable, fault-tolerant distributed processing to produce AI-ready datasets and their metadata, including accompanying statistics, but also serves as a pluggable component of a decision support application based on a trained AI model, automatically preparing feature values of individual entities during online prediction. We deployed and tested the proposed methodology and implementation in three different research projects. We present the developed FHIR profiles as a common data model, together with feature group and feature definitions within a data preparation pipeline, while training an AI model for “predicting complications after cardiac surgeries”.

    Discussion: Implementation across various pilot use cases demonstrated that our framework has the necessary breadth and flexibility to define a diverse array of features, each tailored to specific temporal and contextual criteria.
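
    The declarative pipeline language itself is defined in the paper and is not reproduced here; purely as a rough illustration of the flattening step it describes (turning FHIR resources with irregular time series sampling into one feature row per entity), the Python sketch below uses hard-coded example Observation resources and hypothetical feature codes.

    ```python
    # Illustrative sketch only: flatten FHIR Observation resources (plain dicts)
    # into one feature row per patient, the kind of AI-ready flat structure the
    # pipeline description refers to. The example values are made up.
    from collections import defaultdict

    observations = [
        {"resourceType": "Observation", "subject": {"reference": "Patient/1"},
         "code": {"coding": [{"code": "8867-4", "display": "Heart rate"}]},
         "valueQuantity": {"value": 72, "unit": "/min"},
         "effectiveDateTime": "2024-01-02T08:00:00Z"},
        {"resourceType": "Observation", "subject": {"reference": "Patient/1"},
         "code": {"coding": [{"code": "8480-6", "display": "Systolic blood pressure"}]},
         "valueQuantity": {"value": 131, "unit": "mmHg"},
         "effectiveDateTime": "2024-01-02T08:05:00Z"},
    ]

    # Keep the latest value per (patient, code) as a flat feature vector.
    features = defaultdict(dict)
    for obs in sorted(observations, key=lambda o: o["effectiveDateTime"]):
        patient = obs["subject"]["reference"]
        code = obs["code"]["coding"][0]["code"]
        features[patient][code] = obs["valueQuantity"]["value"]

    print(dict(features))  # {'Patient/1': {'8867-4': 72, '8480-6': 131}}
    ```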

  2. Data Reference Standard on Countries, Territories and Geographic areas

    • open.canada.ca
    csv
    Updated Oct 28, 2025
    Global Affairs Canada (2025). Data Reference Standard on Countries, Territories and Geographic areas [Dataset]. https://open.canada.ca/data/dataset/cac6fd9f-594a-4bcd-bf17-10295812d4c5
    Available download formats
    csv
    Dataset updated
    Oct 28, 2025
    Dataset provided by
    Global Affairs Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    This reference data provides a standard list of values for all Countries, Territories and Geographic areas. This list is intended to standardize the way Countries, Territories and Geographic areas are described in datasets to enable data interoperability and improve data quality. The data dictionary explains what each column means in the list.

  3. FHIR-Profiles-Resources

    • kaggle.com
    zip
    Updated Aug 1, 2023
    fhirfly (2023). FHIR-Profiles-Resources [Dataset]. https://www.kaggle.com/datasets/fhirfly/fhirr4
    Available download formats
    zip (3709939 bytes)
    Dataset updated
    Aug 1, 2023
    Authors
    fhirfly
    Description

    Kaggle Card: FHIR Profiles-Resources JSON File

    Overview

    Fast Healthcare Interoperability Resources (FHIR, pronounced "fire") is a standard developed by Health Level Seven International (HL7) for transferring electronic health records. The FHIR Profiles-Resources JSON file is an essential part of this standard. It provides a schema that defines the structure of FHIR resource types, including their properties and attributes.

    Dataset Structure

    This file is structured in the JSON format, known for its versatility and human-readable nature. Each JSON object corresponds to a unique FHIR resource type, outlining its structure and providing a blueprint for the properties and attributes each resource type should contain.

    Fields Description

    While the precise properties and attributes differ for each FHIR resource type, the typical elements you may encounter in this file include:

    • Id: The unique identifier for the resource type.
    • Url: A global identifier URI for the resource type.
    • Version: The business version of the resource.
    • Name: The human-readable name for the resource type.
    • Status: The publication status of the resource (draft, active, retired).
    • Experimental: A boolean value indicating whether this resource type is experimental.
    • Date: The date of the resource type's last change.
    • Publisher: The individual or organization that published the resource type.
    • Contact: Contact details for the publishers.
    • Description: A natural language description of the resource type.
    • UseContext: A list outlining the usability context for the resource type.
    • Jurisdiction: Identifies the region/country where the resource type is defined.
    • Purpose: An explanation of why the resource type is necessary.
    • Element: A list defining the structure of the properties for the resource type, including data types and relationships with other resource types.

    Potential Use Cases

    • Schema Validation: Use the schema to validate FHIR data and ensure it aligns with the defined structure and types for each resource.
    • Interoperability: Facilitate the exchange of healthcare information with other FHIR-compatible systems by providing a standardized structure.
    • Data Mapping: Utilize the schema to map data from other formats into the FHIR format, or vice versa.
    • System Design: Aid the design and development of healthcare systems by offering a template for data structure.
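
    As a quick way to get a feel for the file, the sketch below loads it with Python's standard library, assuming the download is a FHIR Bundle whose entries are StructureDefinition resources (as the field list above suggests); adjust the key access if the file turns out to be a plain JSON array.

    ```python
    # Hedged sketch: list resource types and their element counts from the
    # Profiles-Resources file. The file name and Bundle layout are assumptions.
    import json

    with open("profiles-resources.json", encoding="utf-8") as fh:
        doc = json.load(fh)

    entries = doc if isinstance(doc, list) else doc.get("entry", [])
    for entry in entries:
        res = entry.get("resource", entry)
        if res.get("resourceType") == "StructureDefinition":
            elements = res.get("snapshot", {}).get("element", [])
            print(res.get("name"), res.get("status"), f"{len(elements)} elements")
    ```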

  4. Definitions of Interoperability

    • data.europa.eu
    unknown
    Updated Jan 23, 2022
    The int:net Consortium (2022). Definitions of Interoperability [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-15724891?locale=nl
    Available download formats
    unknown (39296)
    Dataset updated
    Jan 23, 2022
    Dataset authored and provided by
    The int:net Consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides a list of different definitions of interoperability from the literature.

  5. SeaLiT Knowledge Graphs - Maritime History Data in RDF using a CIDOC-CRM...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    Updated Jul 4, 2022
    Kritsotaki, Athina; Marketakis, Yannis; Fafalios, Pavlos (2022). SeaLiT Knowledge Graphs - Maritime History Data in RDF using a CIDOC-CRM extension (SeaLiT Ontology) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6460840
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    Institute of Computer Science - FORTH
    Authors
    Kritsotaki, Athina; Marketakis, Yannis; Fafalios, Pavlos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SeaLiT Knowledge Graphs is an RDF dataset of maritime history data that has been transcribed (and then transformed) from original archival sources in the context of the SeaLiT Project (Seafaring Lives in Transition, Mediterranean Maritime Labour and Shipping, 1850s-1920s). The underlying data model is the SeaLiT Ontology, an extension of the ISO standard CIDOC-CRM (ISO 21127:2014) for the modelling and integration of maritime history information.

    The knowledge graphs integrate data from a total of 16 different types of archival sources:

    Crew Lists

    Crew and displacement list (Roll)

    Crew List (Ruoli di Equipaggio)

    General Spanish Crew List

    Registers / Lists

    Students Register

    Civil Register

    Register of Maritime Personnel

    Register of Maritime Workers (Matricole della gente di mare)

    Sailors Register (Libro de registro de marineros)

    Naval Ship Register List

    Seagoing Personnel

    Lists of ships

    Censuses

    Census La Ciotat

    First National all-Russian Census of the Russian Empire

    Payrolls

    Payrolls of private archives and libraries in Greece

    Payrolls of Russian Steam Navigation and Trading Company

    Employment records

    Shipyards of Messageries Maritimes, La Ciotat

    More information about the archival sources is available through the SeaLiT website. Data exploration applications over these sources are also publicly available (SeaLiT Catalogues, SeaLiT ResearchSpace).

    Data from these archival sources has been transcribed in tabular form and then curated by historians of SeaLiT using the FAST CAT system. The transcripts (records), together with the curated vocabulary terms and entity instances (ships, persons, locations, organizations), are then transformed to RDF using the SeaLiT Ontology as the target (domain) model. To this end, the corresponding schema mappings between the original schemata and the ontology were defined using the X3ML mapping definition language and were subsequently used for delivering the RDF datasets.

    More information about the FAST CAT system and the data transcription, curation and transformation processes can be found in the following paper:

    P. Fafalios, K. Petrakis, G. Samaritakis, K. Doerr, A. Kritsotaki, Y. Tzitzikas, M. Doerr, "FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities", ACM Journal on Computing and Cultural Heritage, 2021. https://doi.org/10.1145/3461460 [pdf, bib]

    The RDF dataset is provided as a set of TriG files per record per archival source. For each record, the dataset provides: i) one TriG file for the record's data (records.trig), ii) one TriG file for the record's (curated) vocabulary terms (vocabularies.trig), and iii) four TriG files for the record's (curated) entity instances (ships.trig, persons.trig, locations.trig, organizations.trig).

    We also provide the RDFS files of the used ontologies (SeaLiT Ontology version 1.0, CIDOC-CRM version 7.1.1).
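
    A minimal sketch for exploring one of the per-record TriG files with the rdflib Python package (the file name is a placeholder following the layout described above):

    ```python
    # Load a TriG file into an rdflib ConjunctiveGraph and count instances per
    # class across its named graphs. Requires: pip install rdflib
    from rdflib import ConjunctiveGraph

    g = ConjunctiveGraph()
    g.parse("records.trig", format="trig")

    query = """
        SELECT ?cls (COUNT(?s) AS ?n)
        WHERE { GRAPH ?graph { ?s a ?cls } }
        GROUP BY ?cls
        ORDER BY DESC(?n)
    """
    for cls, n in g.query(query):
        print(cls, n)
    ```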

  6. Data Reference Standard on Countries, Territories and Geographic areas -...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    (2025). Data Reference Standard on Countries, Territories and Geographic areas - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-cac6fd9f-594a-4bcd-bf17-10295812d4c5
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    This reference data provides a standard list of values for all Countries, Territories and Geographic areas. This list is intended to standardize the way Countries, Territories and Geographic areas are described in datasets to enable data interoperability and improve data quality. The data dictionary explains what each column means in the list.

  7. Element Definition

    • johnsnowlabs.com
    csv
    Updated Sep 20, 2018
    John Snow Labs (2018). Element Definition [Dataset]. https://www.johnsnowlabs.com/marketplace/element-definition/
    Available download formats
    csv
    Dataset updated
    Sep 20, 2018
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset has the Fast Healthcare Interoperability Resources (FHIR) definition of an element in a resource or an extension. The definition includes:

    • Path (name), Cardinality, and data type
    • Definitions, usage notes, and requirements
    • Default or fixed values
    • Constraints, Length limits, and other usage rules
    • Terminology Binding
    • Mappings to other specifications
    • Structural Usage Information
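
    A rough pandas sketch of working with the table, assuming hypothetical column names such as "Path", "Min", "Max" and "Type" (substitute the actual headers from the downloaded CSV):

    ```python
    # Filter required elements (minimum cardinality >= 1); the column names are
    # assumptions, not the dataset's documented schema.
    import pandas as pd

    elements = pd.read_csv("element_definition.csv")
    required = elements[pd.to_numeric(elements["Min"], errors="coerce") >= 1]
    print(required[["Path", "Min", "Max", "Type"]].head())
    ```
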
  8. Large Scale International Boundaries

    • catalog.data.gov
    • geodata.state.gov
    • +1 more
    Updated Aug 30, 2025
    U.S. Department of State (Point of Contact) (2025). Large Scale International Boundaries [Dataset]. https://catalog.data.gov/dataset/large-scale-international-boundaries
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    United States Department of State (http://state.gov/)
    Description

    Overview

    The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.

    National Geospatial Data Asset

    This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.

    Dataset Source Details

    Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

    Cartographic Visualization

    The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below. Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://data.geodata.state.gov/guidance/index.html

    Contact

    Direct inquiries to internationalboundaries@state.gov. Direct download: https://data.geodata.state.gov/LSIB.zip

    Attribute Structure

    The dataset uses the following attributes, divided into two categories:

    ATTRIBUTE NAME | ATTRIBUTE STATUS
    CC1 | Core
    CC1_GENC3 | Extension
    CC1_WPID | Extension
    COUNTRY1 | Core
    CC2 | Core
    CC2_GENC3 | Extension
    CC2_WPID | Extension
    COUNTRY2 | Core
    RANK | Core
    LABEL | Core
    STATUS | Core
    NOTES | Core
    LSIB_ID | Extension
    ANTECIDS | Extension
    PREVIDS | Extension
    PARENTID | Extension
    PARENTSEG | Extension

    These attributes have external data sources that update separately from the LSIB:

    ATTRIBUTE NAME | EXTERNAL SOURCE
    CC1 | GENC
    CC1_GENC3 | GENC
    CC1_WPID | World Polygons
    COUNTRY1 | DoS Lists
    CC2 | GENC
    CC2_GENC3 | GENC
    CC2_WPID | World Polygons
    COUNTRY2 | DoS Lists
    LSIB_ID | BASE
    ANTECIDS | BASE
    PREVIDS | BASE
    PARENTID | BASE
    PARENTSEG | BASE

    The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape”; these are components of the shapefile format and do not form an intrinsic part of the LSIB.
    Core Attributes

    The eight core attributes listed above contain unique information which, when combined with the line geometry, comprises the LSIB dataset. These core attributes are further divided into Country Code and Name Fields and Descriptive Fields.

    Country Code and Country Name Fields

    The “CC1” and “CC2” fields are machine-readable fields that contain political entity codes. These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. The “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes “Q2” or “QX2” denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.

    The “COUNTRY1” and “COUNTRY2” fields contain the names of the corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.

    Descriptive Fields

    The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:

    ATTRIBUTE NAME | CONTAINS NULLS
    RANK | No
    STATUS | No
    LABEL | Yes
    NOTES | Yes

    Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.

    RANK | STATUS
    1 | International Boundary
    2 | Other Line of International Separation
    3 | Special Line

    A value of “1” in the “RANK” field corresponds to an "International Boundary" value in the “STATUS” field. Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively. The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps. The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.

    Use of Core Attributes in Cartographic Visualization

    Several of the core attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between: International Boundaries (Rank 1); Other Lines of International Separation (Rank 2); and Special Lines (Rank 3). Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “LABEL” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.

    The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label. Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend, but is otherwise not to be used for labeling. Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic labeling purposes is prohibited.

    Extension Attributes

    Certain elements of the attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets. The fields “CC1_GENC3” and “CC2_GENC3” contain the three-character GENC codes corresponding to the “CC1” and “CC2” attributes. The code “QX2” is the three-character counterpart of the code “Q2,” which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard.

    To allow for linkage between individual lines in the LSIB and the World Polygons dataset, the “CC1_WPID” and “CC2_WPID” fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset.

    Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature. The “LSIB_ID” attribute is a UUID value that defines a specific instance of a feature; any change to the feature in a lineset requires a new “LSIB_ID.” The “ANTECIDS,” or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature; this is generally used to reference countries that have dissolved. The “PREVIDS,” or previous ID, is a UUID field that contains old versions of a line; this is an additive field that houses all previous IDs. A new version of a feature is defined by any change to the
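
    The RANK-based differentiation described above can be prototyped outside the supplied style files; the sketch below uses geopandas and matplotlib, with the shapefile name assumed from the download package.

    ```python
    # Draw LSIB lines with Rank 1 most prominent and Ranks 2/3 progressively
    # subordinate, per the cartographic guidance above. Requires geopandas.
    import geopandas as gpd
    import matplotlib.pyplot as plt

    lsib = gpd.read_file("LSIB.shp")   # path inside LSIB.zip is an assumption

    styles = {
        1: {"linewidth": 1.2, "linestyle": "solid"},
        2: {"linewidth": 0.8, "linestyle": "dashed"},
        3: {"linewidth": 0.5, "linestyle": "dotted"},
    }

    fig, ax = plt.subplots(figsize=(12, 6))
    for rank, group in lsib.groupby("RANK"):
        group.plot(ax=ax, color="black", **styles.get(int(rank), styles[3]))
    ax.set_axis_off()
    plt.show()
    ```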

  9. New Zealand Regional Councils

    • resources-gisinschools-nz.hub.arcgis.com
    • gisinschools.eagle.co.nz
    Updated Nov 10, 2016
    GIS in Schools - Teaching Materials - New Zealand (2016). New Zealand Regional Councils [Dataset]. https://resources-gisinschools-nz.hub.arcgis.com/datasets/new-zealand-regional-councils
    Dataset updated
    Nov 10, 2016
    Dataset authored and provided by
    GIS in Schools - Teaching Materials - New Zealand
    Area covered
    New Zealand,
    Description

    The region is the top tier of local government in New Zealand. There are 16 regions of New Zealand (Part 1 of Schedule 2 of the Local Government Act 2002). Eleven are governed by an elected regional council, while five are governed by territorial authorities (the second tier of local government) that also perform the functions of a regional council and are therefore known as unitary authorities. These unitary authorities are Auckland Council, Nelson City Council, and the Gisborne, Tasman, and Marlborough District Councils. The Chatham Islands Council also performs some of the functions of a regional council but is not strictly a unitary authority. Unitary authorities act as regional councils for the purposes of a wide range of Acts and regulations. Regional council areas are based on water catchment areas. Regional councils are responsible for the administration of many environmental and public transport matters.

    Regional councils were established in 1989 after the abolition of the 22 local government regions. The Local Government Act 2002 requires the boundaries of regions to conform as far as possible to one or more water catchments. When determining regional boundaries, the Local Government Commission gave consideration to regional communities of interest when selecting the water catchments to be included in a region. It also considered factors such as natural resource management, land use planning and environmental matters. Some regional boundaries are conterminous with territorial authority boundaries, but there are many exceptions. An example is Taupo District, which is split between four regions, although most of its area falls within the Waikato Region. Where territorial local authorities straddle regional council boundaries, the affected areas have been statistically defined in complete area units. Generally, regional councils contain complete territorial authorities. The unitary authority of the Auckland Council was formed in 2010, under the Local Government (Tamaki Makaurau Reorganisation) Act 2009, replacing the Auckland Regional Council and seven territorial authorities. The seaward boundary of any coastal regional council is the twelve-mile New Zealand territorial limit. Regional councils are defined at meshblock and area unit level.

    Regional councils included in the 2013 digital pattern are:

    Regional Council Code | Regional Council Name
    01 | Northland Region
    02 | Auckland Region
    03 | Waikato Region
    04 | Bay of Plenty Region
    05 | Gisborne Region
    06 | Hawke's Bay Region
    07 | Taranaki Region
    08 | Manawatu-Wanganui Region
    09 | Wellington Region
    12 | West Coast Region
    13 | Canterbury Region
    14 | Otago Region
    15 | Southland Region
    16 | Tasman Region
    17 | Nelson Region
    18 | Marlborough Region
    99 | Area Outside Region

    As at 1 July 2007, digital boundary data became freely available.

    Deriving of Output Files

    The original vertices delineating the meshblock boundary pattern were digitised in 1991 from 1:5,000 scale urban maps and 1:50,000 scale rural maps. The magnitude of error of the original digital points would have been in the range of +/- 10 metres in urban areas and +/- 25 metres in rural areas. Where meshblock boundaries coincide with cadastral boundaries, the magnitude of error will be within the range of 1-5 metres in urban areas and 5-20 metres in rural areas, this being the estimated magnitude of error of Landonline.

    The creation of high definition and generalised meshblock boundaries for the 2013 digital pattern, and the dissolving of these meshblocks into other geographies/boundaries, were completed within Statistics New Zealand using ESRI's ArcGIS desktop suite and the Data Interoperability extension with the following process:

    1. Import data and all attribute fields into an ESRI File Geodatabase from LINZ as a shapefile.
    2. Run geometry checks and repairs.
    3. Run topology checks on all data (Must Not Have Gaps, Must Not Overlap), detailed below.
    4. Generalise the meshblock layers to a 1 m tolerance to create the generalised dataset.
    5. Clip the high definition and generalised meshblock layers to the coastline using land-water codes.
    6. Dissolve all four meshblock datasets (clipped and unclipped, for both generalised and high definition versions) to higher geographies to create the following output data layers: Area Unit, Territorial Authorities, Regional Council, Urban Areas, Community Boards, Territorial Authority Subdivisions, Wards, Constituencies and Maori Constituencies for the four datasets.
    7. Complete a frequency analysis to determine that each code only has a single record.
    8. Re-run topology checks for overlaps and gaps.
    9. Export all created datasets into MapInfo and Shapefile format using the Data Interoperability extension to create three output formats for each file.
    10. Quality assurance and rechecking of delivery files.

    The High Definition version is similar to how the layer exists in Landonline, with a couple of changes to fix topology errors identified in topology checking. The following quality checks and steps were applied to the meshblock pattern.

    Translation of ESRI Shapefiles to ESRI geodatabase dataset: The meshblock dataset was imported into the ESRI File Geodatabase format, which is required to run the ESRI topology checks. Topology rules were set for each of the layers.

    Topology checks: A tolerance of 0.1 cm was applied to the data, which meant that the topology engine validating the data saw any vertex closer than this distance as the same location. A default topology rule of “Must Be Larger than Cluster Tolerance” is applied to all data; this would highlight where any features with a width less than 0.1 cm exist. No errors were found for this rule. Three additional topology rules were applied specifically within each of the layers in the ESRI geodatabase, namely “Must Not Overlap”, “Must Not Have Gaps” and “Area Boundary Must Be Covered By Boundary Of (Meshblock)”. These check that a layer forms a continuous coverage over a surface, that any given point on that surface is only assigned to a single category, and that the dissolved boundaries are identical to the parent meshblock boundaries. Topology check results: there were no errors in either the gap or overlap checks.

    Generalising: To create the generalised meshblock layer, the “Simplify Polygon” geoprocessing tool was used in ArcGIS with the following parameters: Simplification Algorithm: POINT_REMOVE; Maximum Allowable Offset: 1 metre; Minimum Area: 1 square metre; Handling Topological Errors: RESOLVE_ERRORS.

    Clipping of layers to coastline: The processed feature class was then clipped to the coastline. The coastline was defined as features within the supplied Land2013 with codes and descriptions as follows: 11 - Island (included); 12 - Mainland (included); 21 - Inland Water (included); 22 - Inlet (excluded); 23 - Oceanic (excluded); 33 - Other (included). Features were clipped using the Data Interoperability extension attribute filter tool. The attribute filter was used on both the generalised and high definition meshblock datasets, creating four meshblock layers. Each meshblock dataset also contained all higher geographies and land-water data as attributes. Note: Meshblock 0017001, which is classified as island, was excluded from the clipped meshblock layers, as most of this meshblock is oceanic.

    Dissolve meshblocks to higher geographies: Statistics New Zealand then dissolved the ESRI meshblock feature classes to the higher geographies, for both the full and clipped datasets, generalised and high definition. To dissolve the higher geographies, a model was built using the dissolver, aggregator and sorter tools within the Data Interoperability extension, with each output set to include geography code and names.

    Export to MapInfo format and shapefiles: The data was exported to MapInfo and Shapefile format using ESRI's Data Interoperability extension Translation tool.

    Quality assurance and rechecking of delivery files: The feature counts of all files were checked to ensure all layers had the correct number of features. This included checking that all multipart features had translated correctly in the new file.
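
    The production workflow above is ArcGIS-based; purely as an illustration of the dissolve step (step 6), a geopandas equivalent might look like the following, with file and column names assumed.

    ```python
    # Dissolve 2013 meshblock polygons up to regional councils; "REGC2013" and
    # "REGC2013_N" are hypothetical attribute names for the council code and name.
    import geopandas as gpd

    meshblocks = gpd.read_file("meshblock_2013.shp")
    regions = meshblocks.dissolve(by="REGC2013", as_index=False, aggfunc="first")
    regions[["REGC2013", "REGC2013_N", "geometry"]].to_file("regional_council_2013.shp")
    print(len(regions), "regional council polygons")
    ```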

  10. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 9, 2022
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Available download formats
    zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Massachusetts General Hospital
    Harvard Medical School
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    CC0 1.0 Universal (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

    Methods

    eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or institutional EDWs such as the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
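
    eLAB itself is R code (linked above); the remapping idea it describes, collapsing EHR lab subtypes onto a single data-dictionary code via a key-value lookup table, can be illustrated in Python as follows (the lookup entries and DD code here are hypothetical).

    ```python
    # Not eLAB's R code: a small illustration of key-value remapping of lab
    # subtypes onto one data-dictionary (DD) code, then dropping labs the DD
    # does not define. Example values are made up.
    import pandas as pd

    lookup = {
        "Potassium": "lab_potassium",
        "Potassium-External": "lab_potassium",
        "Potassium(POC)": "lab_potassium",
        "Potassium,whole-bld": "lab_potassium",
    }

    labs = pd.DataFrame({
        "record_id": [101, 101, 102],
        "lab_name": ["Potassium", "Potassium(POC)", "Potassium-External"],
        "value": [4.1, 4.4, 3.9],
        "unit": ["mmol/L", "mmol/L", "mmol/L"],
    })

    labs["dd_code"] = labs["lab_name"].map(lookup)
    labs = labs.dropna(subset=["dd_code"])   # keep only labs defined in the DD
    print(labs)
    ```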

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.

  11. The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich...

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Mar 14, 2025
    National Institute of Standards and Technology (2025). The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich description of data resources [Dataset]. https://catalog.data.gov/dataset/the-nist-extensible-resource-data-model-nerdm-json-schemas-for-rich-description-of-data-re
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The NIST Extensible Resource Data Model (NERDm) is a set of schemas for encoding, in JSON format, metadata that describe digital resources. The variety of digital resources it can describe includes not only digital data sets and collections, but also software, digital services, web sites and portals, and digital twins. It was created to serve as the internal metadata format used by the NIST Public Data Repository and Science Portal to drive rich presentations on the web and to enable discovery; however, it was also designed to enable programmatic access to resources and their metadata by external users. Interoperability was also a key design aim: the schemas are defined using the JSON Schema standard, metadata are encoded as JSON-LD, and their semantics are tied to community ontologies, with an emphasis on DCAT and the US federal Project Open Data (POD) models. Finally, extensibility is also central to its design: the schemas are composed of a central core schema and various extension schemas. New extensions to support richer metadata concepts can be added over time without breaking existing applications.

    Validation is central to NERDm's extensibility model. Consuming applications should be able to choose which metadata extensions they care to support and ignore terms and extensions they don't support. Furthermore, they should not fail when a NERDm document leverages extensions they don't recognize, even when on-the-fly validation is required. To support this flexibility, the NERDm framework allows documents to declare what extensions are being used and where. We have developed an optional extension to the standard JSON Schema validation (see ejsonschema below) to support flexible validation: while a standard JSON Schema validator can validate a NERDm document against the NERDm core schema, our extension will validate a NERDm document against any recognized extensions and ignore those that are not recognized.

    The NERDm data model is based around the concept of a resource, semantically equivalent to a schema.org Resource, and as in schema.org, there can be different types of resources, such as data sets and software. A NERDm document indicates what types the resource qualifies as via the JSON-LD "@type" property. All NERDm Resources are described by metadata terms from the core NERDm schema; however, different resource types can be described by additional metadata properties (often drawing on particular NERDm extension schemas). A Resource contains Components of various types (including DCAT-defined Distributions) that are considered part of the Resource; specifically, these can include downloadable data files, hierarchical data collections, links to web sites (like software repositories), software tools, or other NERDm Resources. Through the NERDm extension system, domain-specific metadata can be included at either the resource or component level. The direct semantic and syntactic connections to the DCAT, POD, and schema.org schemas are intended to ensure unambiguous conversion of NERDm documents into those schemas.

    As of this writing, the core NERDm schema and its framework stands at version 0.7 and is compatible with the "draft-04" version of JSON Schema. Version 1.0 is projected to be released in 2025. In that release, the NERDm schemas will be updated to the "draft2020" version of JSON Schema. Other improvements will include stronger support for RDF and the Linked Data Platform through its support of JSON-LD.
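
    A minimal sketch of validating a NERDm record against the core schema with the standard jsonschema package (the extension-aware behaviour described above comes from ejsonschema; the file names here are placeholders):

    ```python
    # Validate a NERDm document against the core schema (JSON Schema draft-04,
    # matching NERDm 0.7). Requires: pip install jsonschema
    import json
    from jsonschema import Draft4Validator

    with open("nerdm-core-schema.json", encoding="utf-8") as fh:
        core_schema = json.load(fh)
    with open("nerdm-record.json", encoding="utf-8") as fh:
        record = json.load(fh)

    validator = Draft4Validator(core_schema)
    errors = sorted(validator.iter_errors(record), key=lambda e: list(e.path))
    for err in errors:
        print("/".join(str(p) for p in err.path) or "<root>", "-", err.message)
    print("valid" if not errors else f"{len(errors)} validation error(s)")
    ```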

  12. Design of an Ontology-Driven Constraint Tester (ODCT) and Application to...

    • zenodo.org
    bin, json, mp4, pdf
    Updated Oct 19, 2024
    Tareq Md Rabiul Hossain Chy; Tareq Md Rabiul Hossain Chy (2024). Design of an Ontology-Driven Constraint Tester (ODCT) and Application to SAREF & Smart Energy Appliances: Datasets, SHACL Shapes, Demo Video of Web Application, and Detailed Performance Reports [Dataset]. http://doi.org/10.5281/zenodo.13955566
    Available download formats
    pdf, bin, json, mp4
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tareq Md Rabiul Hossain Chy; Tareq Md Rabiul Hossain Chy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 19, 2024
    Description

    This repository presents the resources used for validating the compliance of smart energy appliances against the Smart Appliances REFerence (SAREF) ontology and its extension SAREF4ENER, as part of the Ontology-Driven Constraint Tester (ODCT) project. The ODCT tool is specifically designed to ensure semantic interoperability and adherence to standardized ontological frameworks, which are crucial for integrating smart devices into modern energy management systems.

    ODCT Overview

    The Ontology-Driven Constraint Tester (ODCT) is a robust framework created to validate datasets against ontologies defined by SAREF and SAREF4ENER, both established under ETSI SmartM2M. This tool has been applied to the Flexible Start use case from the Joint Research Centre’s (JRC) Code of Conduct for Energy Smart Appliances. The ODCT tool ensures that smart devices like energy-efficient washing machines, thermostats, and connected lighting operate in compliance with established ontologies, thereby enhancing their interoperability within energy management systems and smart grids.

    Repository Contents

    This repository contains essential resources used in the ODCT compliance testing process:

    • Compliant Dataset: This dataset represents a fully compliant scenario where no errors are present in the smart energy appliances’ profiles, demonstrating the ODCT’s accuracy under ideal conditions.

    • Modified Datasets: These datasets introduce various types of errors to showcase ODCT’s ability to handle diverse compliance scenarios:

      1. Modified Dataset 1: Introduces type mismatches and spelling errors in key attributes.
      2. Modified Dataset 2: Contains extraneous properties and missing required properties, including details about energy consumption and efficiency class.
      3. Modified Dataset 3: Includes both extraneous and missing properties, and additional priority levels for energy profiles.
    • SHACL Shapes: The SHACL shapes used in the compliance testing for both SAREF and SAREF4ENER ontologies are included in this repository to allow reproducibility of the validation process.

    • Error Detection Results and Performance Reports: After the compliance tests were run with ODCT, the resulting reports were collected; the repository includes comprehensive reports detailing the results. These reports highlight the types of errors detected and provide a performance analysis of the tool under various scenarios.

    • Demonstration Video: A video is provided to guide users through the ODCT web application, showcasing how the tool detects errors and generates detailed compliance reports based on smart energy appliance datasets.

    Background

    The integration of smart energy appliances into modern power grids is key to improving energy management and supporting sustainability goals like the European Green Deal. However, ensuring that these devices communicate effectively and conform to standardized protocols is a challenge. The ODCT tool addresses this challenge by providing a rigorous, ontology-based validation framework that is both protocol-agnostic and technology-flexible.

    This work is grounded in the broader context of global warming and the need for energy efficiency and demand-side flexibility in energy systems. By ensuring compliance with SAREF and SAREF4ENER, ODCT supports the EU’s ambitions for carbon neutrality by 2050, contributing to a connected, efficient, and sustainable energy ecosystem.

    Methodology

    ODCT uses a structured methodology that involves:

    1. Generating relevant datasets for validation.
    2. Defining SHACL shape constraints based on ontologies.
    3. Developing a user-friendly web application to facilitate compliance testing.
    4. Performing compliance tests that validate datasets against SHACL shapes, ensuring interoperability and adherence to energy management standards (a sketch of this step follows below).
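
    A rough sketch of step 4, using the pyshacl package with one of the modified datasets and the SHACL shapes from this repository (the file names are placeholders):

    ```python
    # Validate a device dataset against SAREF/SAREF4ENER SHACL shapes.
    # Requires: pip install pyshacl
    from pyshacl import validate

    conforms, _report_graph, report_text = validate(
        data_graph="modified_dataset_1.ttl",
        shacl_graph="saref4ener_shapes.ttl",
        inference="rdfs",
        abort_on_first=False,
    )
    print("Conforms:", conforms)
    print(report_text)
    ```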

    Why It Matters

    Researchers and developers working on smart energy appliances will benefit from ODCT by:

    • Ensuring their devices meet standardized ontological requirements for interoperability.
    • Reducing compliance issues in the development phase, leading to smoother integration into energy management systems.
    • Supporting the sustainability efforts by enhancing device communication in smart grids.

    This repository showcases the potential of ODCT in fostering data accuracy, semantic interoperability, and compliance with essential energy standards. It offers comprehensive resources for furthering research and development in the field of smart energy appliances and energy management.

  13. Dataset for publication "Toward an Autonomous Robotic Battery Materials...

    • zenodo.org
    zip
    Updated Jul 4, 2025
    Enea Svaluto-Ferro; Graham Kimbell; YeonJu Kim; Nukorn Plainpan; Benjamin Kunz; Lina Scholz; Raphael Laeubli; Maximilian Becker; David Reber; Peter Kraus; Ruben-Simon Kühnel; Corsin Battaglia (2025). Dataset for publication "Toward an Autonomous Robotic Battery Materials Research Platform Powered by Automated Workflow and Ontologized Findable, Accessible, Interoperable, and Reusable Data Management" [Dataset]. http://doi.org/10.5281/zenodo.15481956
    Available download formats
    zip
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Enea Svaluto-Ferro; Graham Kimbell; YeonJu Kim; Nukorn Plainpan; Benjamin Kunz; Lina Scholz; Raphael Laeubli; Maximilian Becker; David Reber; Peter Kraus; Ruben-Simon Kühnel; Corsin Battaglia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scope

    This dataset comprises detailed data from 199 coin cell batteries featuring either NMC//graphite or LFP//graphite chemistries cycled for 1000 cycles. All batteries were assembled and cycled using the automated robotic battery materials research platform, Aurora, at Empa, the Swiss Federal Laboratories for Materials Science and Technology, within the Laboratory for Materials for Energy Conversion.

    This dataset accompanies the publication:
    Toward an Autonomous Robotic Battery Materials Research Platform Powered by Automated Workflow and Ontologized Findable, Accessible, Interoperable, and Reusable Data Management.
    Batteries & Supercaps, 2025, https://doi.org/10.1002/batt.202500155

    Data Structure

    The dataset is packaged as a RO-Crate (Research Object Crate), facilitating standardized and machine-readable data sharing.

    For each battery cell, the following files are included:

    Battery cell metadata file:

    • Filename: empa_ccid000XXX.metadata.json

    • Description: Contains semantically annotated battery assembly data along with the cycling protocol applied to the specific battery cell. The file is structured in JSON-LD format and leverages the Battery Interface Ontology (BattINFO) domain to ensure semantic interoperability and rich data description.

    • Reference: Battery Interface Ontology — BattInfo documentation

    Battery cell cycling data files:

    • Filenames:

      • empa_ccid000XXX.bdf.csv

      • empa_ccid000XXX.bdf.parquet

    • Description: Cycling data is provided in two formats: CSV and Parquet. Both contain identical data. The CSV format enables straightforward human inspection, while the Parquet format supports faster parsing and is optimized for automated processing. These files follow the Battery Data Format (BDF) for Time-Series Data, as recommended by the Battery Data Alliance.

    • Reference: Battery Data Alliance - Battery Data Format Definition
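
    A minimal sketch of reading one cell's files from the crate with Python (the cell id is a placeholder; pandas.read_parquet needs pyarrow or fastparquet installed):

    ```python
    # Load one battery cell's BattINFO-annotated metadata and BDF cycling data.
    import json
    import pandas as pd

    cell = "empa_ccid000042"  # hypothetical cell id

    with open(f"{cell}.metadata.json", encoding="utf-8") as fh:
        metadata = json.load(fh)             # JSON-LD assembly data and protocol

    cycling = pd.read_parquet(f"{cell}.bdf.parquet")  # same content as the CSV
    print(metadata.get("@type"), "-", len(cycling), "time-series rows")
    print(cycling.columns.tolist())
    ```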

  14. Data from: Glocal Clinical Registries: Pacemaker Registry Design and...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jul 25, 2013
    de Moraes Albertini, Caio Marcos; Barros, Jacson V.; Costa, Roberto; Crevelari, Elizabeth Sartori; Santana, Jose Eduardo; Vissoci, João Ricardo Nickenig; Filho, Martino Martinelli; Pietrobon, Ricardo; da Silva, Kátia Regina; Lacerda, Marianna Sobral (2013). Glocal Clinical Registries: Pacemaker Registry Design and Implementation for Global and Local Integration – Methodology and Case Study [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001682648
    Dataset updated
    Jul 25, 2013
    Authors
    de Moraes Albertini, Caio Marcos; Barros, Jacson V.; Costa, Roberto; Crevelari, Elizabeth Sartori; Santana, Jose Eduardo; Vissoci, João Ricardo Nickenig; Filho, Martino Martinelli; Pietrobon, Ricardo; da Silva, Kátia Regina; Lacerda, Marianna Sobral
    Description

    Background: The ability to apply standard and interoperable solutions for implementing and managing medical registries, as well as to aggregate, reproduce, and access data sets from legacy formats and platforms to advanced standard formats and operating systems, is crucial for both clinical healthcare and biomedical research settings.

    Purpose: Our study describes a reproducible, highly scalable, standard framework for a device registry implementation addressing both local data quality components and global linking problems.

    Methods and Results: We developed a device registry framework involving the following steps: (1) data standards definition and representation of the research workflow, (2) development of electronic case report forms using REDCap (Research Electronic Data Capture), (3) data collection according to the clinical research workflow, (4) data augmentation by enriching the registry database with local electronic health records, governmental databases and linked open data collections, (5) data quality control, and (6) data dissemination through the registry web site. Our registry adopted all applicable standardized data elements proposed by the American College of Cardiology / American Heart Association Clinical Data Standards, as well as variables derived from cardiac device randomized trials and the Clinical Data Interchange Standards Consortium. Local interoperability was performed between REDCap and data derived from the Electronic Health Record system. The original data set was also augmented by incorporating the reimbursed values paid by the Brazilian government during a hospitalization for pacemaker implantation. By linking our registry to the open data collection repository Linked Clinical Trials (LinkedCT), we found 130 clinical trials which are potentially correlated with our pacemaker registry.

    Conclusion: This study demonstrates how standard and reproducible solutions can be applied in the implementation of medical registries to constitute a reusable framework. Such an approach has the potential to facilitate data integration between healthcare and research settings, and is also a useful framework for other biomedical registries.

  15. Data from: Air Traffic Data International Mobility Indicators for the UK

    • dataverse.harvard.edu
    • dataone.org
    Updated Feb 23, 2023
    Cite
    LAURA POLLACCI; ALINA SIRBU; STEFANO MARIA IACUS (2023). Air Traffic Data International Mobility Indicators for the UK [Dataset]. http://doi.org/10.7910/DVN/AE1PKC
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    LAURA POLLACCI; ALINA SIRBU; STEFANO MARIA IACUS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    The Air Traffic Data International Mobility Indicators for the UK result from an investigation of air passenger data from the Sabre Corporation [1], accessed through a collaboration with the JRC Ispra. Starting from air passenger traffic volumes from each country of origin to the final country of destination, two mobility indicators based on log flow ratios were produced: the Flow Log Ratio (FLR) and the Cumulative Flow Log Ratio (CFLR). These indicators, computed at monthly and yearly resolution, make it possible to filter out short-term trips and observe the general pattern of longer-term mobility. The Flow Log Ratio (FLR) is defined as the logarithm of the ratio between the number of incoming individuals in a given country (e.g., entering the UK) and the number of outgoing individuals in the same observed country (e.g., leaving the UK). Specifically, for each country or set of countries of origin and destination (C1, C2), and over a specified period of time t, we consider the incoming flow FI(t) (from C2 to C1) and the outgoing flow FO(t) (from C1 to C2). The Flow Log Ratio is then defined as FLR(t) = log2(FI(t)/FO(t)). An FLR below 0 means that more individuals moved out of C1 than moved in, while an FLR above 0 shows that C1 is an attractive country with more people coming in. An FLR of 1 means the incoming flows are twice as large as the outgoing flows, while an FLR of -1 means the outgoing flows are twice as large as the incoming flows. The FLR allows trends to be studied point by point in time and point-wise changes in trends to be observed. The Cumulative Flow Log Ratio (CFLR) is defined as the logarithm of the ratio between the cumulative incoming flows and the cumulative outgoing flows up to the current time window t. Compared to the FLR, the CFLR allows cumulative patterns to be evaluated over much longer periods, rather than performing a point-wise analysis. The indicators are provided for the UK versus the rest of the European Union. Further, regional indicators are provided using the division of EU member states into regions proposed by the EuroVoc vocabulary [2]: Northern (Finland, Denmark, Sweden, Estonia, Latvia, Lithuania), Southern (Greece, Italy, Malta, Portugal, Cyprus, Spain), Western (France, Germany, Ireland, Luxembourg, Netherlands, Austria, Belgium), Central and Eastern (Hungary, Poland, Romania, Bulgaria, Croatia, Slovakia, Czechia, Slovenia). Europe-level indicators are also included. The full dataset includes monthly and yearly FLR and CFLR indicators calculated at different spatial and time resolutions; the monthly set also provides the components obtained by applying Seasonal-Trend decomposition (STL) [3] to the regional FLR values, which allows seasonal patterns to be separated from overall trends. The indicators are calculated for the United Kingdom versus (a) the 27 countries of the European Union, (b) the four regions of the European Union, and (c) the entire European Union. Monthly data are provided from February 2011 to October 2021, while yearly data cover 2011-2021. Moreover, the monthly dataset includes the components, i.e., trend, seasonal, and residual signals, obtained by decomposing the regional EU FLRs with the Statsmodels [4] Python library (using an additive model with 12 components). In publishing the dataset, we followed the DEU guidelines for publishing high-quality data. To ensure interoperability and facilitate automatic processing by machines, we used the CSV format with US-ASCII encoding. All country names follow the ISO2 standard, the European subregions follow the EuroVoc vocabulary, dates are standardised, and time series are complete. The CSV files are accompanied by a README that defines all variables included in the data and cross-references publications. References: [1] Sabre. Market intelligence, global demand data. https://www.sabre.com/products/market-intelligence/ 2021. Accessed: 2021-11-15. [2] https://eur-lex.europa.eu/browse/eurovoc.html?params=72,7206 [3] Cleveland, R.B., Cleveland, W.S., McRae, J.E. and Terpenning, I., 1990. STL: A seasonal-trend decomposition. J. Off. Stat, 6(1), pp.3-73. [4] McKinney, W., Perktold, J., & Seabold, S. (2011)....
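    As a worked illustration of the definitions above, both indicators reduce to a few lines of array arithmetic. The monthly flow values below are invented for illustration and are not taken from the dataset.

```python
# Worked illustration of the FLR and CFLR definitions; flow values are invented.
import numpy as np

incoming = np.array([120.0, 150.0, 90.0, 200.0])   # FI(t): passengers entering C1 from C2
outgoing = np.array([100.0, 100.0, 120.0, 100.0])  # FO(t): passengers leaving C1 for C2

flr = np.log2(incoming / outgoing)                         # point-wise indicator FLR(t)
cflr = np.log2(np.cumsum(incoming) / np.cumsum(outgoing))  # cumulative indicator CFLR(t)

# FLR > 0 indicates net inflow in that period; FLR = 1 means incoming flows are
# twice the outgoing flows, FLR = -1 means outgoing flows are twice the incoming.
print(flr.round(3))
print(cflr.round(3))
```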

  16. Australia's Land Borders

    • data.gov.au
    esri mapserver, html +2
    Updated Nov 6, 2020
    + more versions
    Cite
    Commonwealth of Australia (Geoscience Australia) (2020). Australia's Land Borders [Dataset]. https://data.gov.au/dataset/ds-ga-859276f9-b266-4b44-bb3f-29afc591a9b0
    Explore at:
    pdf, html, esri mapserver, wms (available download formats)
    Dataset updated
    Nov 6, 2020
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    Australia's Land Borders is a product within the Foundation Spatial Data Framework (FSDF) suite of datasets. It is endorsed by ANZLIC - the Spatial Information Council and the Intergovernmental Committee on Surveying and Mapping (ICSM) as a nationally consistent and topologically correct representation of the land borders published by the Australian states and territories. The purpose of this product is to provide: (i) a building block which enables development of other national datasets; (ii) integration with other geospatial frameworks in support of data analysis; and (iii) visualisation of these borders as a cartographic depiction on a map. Although this dataset depicts land borders, it is not, nor does it purport to be, a legal definition of those borders; therefore it cannot and must not be used in any legal context. This product is constructed by Geoscience Australia (GA), on behalf of the ICSM, from authoritative open data published by the land mapping agencies in their respective Australian state and territory jurisdictions. Construction of a nationally consistent dataset required harmonisation and mediation of data issues at abutting land borders. In order to make informed and consistent determinations, other datasets were used as a visual aid in determining which elements of the published jurisdictional data to promote into the national product. These datasets include, but are not restricted to: (i) PSMA Australia's commercial products such as the cadastral (property) boundaries (CadLite) and the Geocoded National Address File (GNAF); (ii) Esri's World Imagery and Imagery with Labels base maps; and (iii) Geoscience Australia's GEODATA TOPO 250K Series 3. Where practical, Land Borders do not cross cadastral boundaries and are logically consistent with addressing data in GNAF. It is important to reaffirm that although third-party commercial datasets are used for validation, which is within the remit of the licence agreement between PSMA and GA, no commercially licensed data has been promoted into the product. Australia's Land Borders are constructed exclusively from published open data originating from state, territory and federal agencies. This foundation dataset consists of edges (polylines) representing mediated segments of state and/or territory borders, connected at nodes and terminated at the coastline, defined as the Mean High Water Mark (MHWM) tidal boundary. These polylines are attributed to convey information about the provenance of the source. It is envisaged that the land borders will be topologically interoperable with the future national coastline dataset(s), currently being built through the ICSM coastline capture collaboration program. Topological interoperability will enable closure of the land mass polygon, permitting spatial analysis operations such as vector overlay, intersect, or raster map algebra. In addition to polylines, the product incorporates a number of well-known survey-monumented corners which have historical and cultural significance associated with the place name. This foundation dataset is constructed from the best available data, as published by the relevant custodians in each state and territory jurisdiction. It should be noted that some custodians - in particular the Northern Territory and New South Wales - have opted out or chosen to rely on data from the abutting jurisdiction as an agreed portrayal of their border. The accuracy and precision of land borders as depicted by spatial objects (features) may vary according to custodian specifications, although there is topological coherence across all the objects within this integrated product. The guaranteed minimum nominal scale for all use-cases, applying to the complete spatial coverage of this product, is 1:25 000. In some areas the accuracy is much better and may approach cadastral survey specification; however, this is an artefact of data assembly from disparate sources, rather than the product design. As a principle, no data was generalised or spatially degraded in the process of constructing this product. Some use-cases for this product are: general digital and web map-making applications; a reference dataset for cartographic generalisation in smaller-scale map applications; constraining geometric objects for revision and updates to the Mesh Blocks, the building blocks for the larger regions of the Australian Statistical Geography Standard (ASGS) framework; and rapid resolution of cross-border data issues to enable construction and visual display of a common operating picture. This foundation dataset will be maintained at irregular intervals, for example if a state or territory jurisdiction decides to publish or republish their land borders. If there is a new version of this dataset, the past version will be archived and information about the changes will be made available in the change log.
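    The topological closure anticipated above (combining border polylines with a coastline to obtain a closed land-mass polygon) can be sketched with standard open-source geospatial tooling. The file paths below are placeholders, since the product itself is published through WMS and Esri MapServer endpoints rather than as local files.

```python
# Hedged sketch: assemble closed polygons from border polylines plus a coastline.
# File paths are placeholders; substitute exports of the actual datasets.
import geopandas as gpd
from shapely.ops import polygonize, unary_union

borders = gpd.read_file("australia_land_borders.geojson")  # placeholder path
coastline = gpd.read_file("national_coastline.geojson")    # placeholder path

# Merge every polyline, then build polygons from the closed rings they form.
merged = unary_union(list(borders.geometry) + list(coastline.geometry))
polygons = list(polygonize(merged))
print(f"Assembled {len(polygons)} closed polygons from border and coastline edges")
```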

  17. c

    ckanext-bcgov-schema

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-bcgov-schema [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-bcgov-schema
    Explore at:
    Dataset updated
    Jun 4, 2025
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The bcgov-schema extension provides custom schemas specifically for use within the BC Data Catalogue. These schemas define the structure and validation rules for metadata associated with datasets and other resources within the CKAN instance, ensuring data consistency and quality across the platform. It enhances the data governance capabilities of CKAN, supporting a standardized approach to metadata management. Key Features: Custom Schema Definitions: Provides pre-defined schemas tailored to the BC Data Catalogue, facilitating standardized metadata entry and validation. Consistent Data Governance: Ensures data consistency and quality by enforcing structured metadata requirements across the platform. Integration with Other Extensions: Designed to work in conjunction with ckanext-scheming, ckanext-repeating and ckanext-composite, which allows developers to combine this extension's schemas with other extensions for additional functionality. Technical Integration: Based on the README, the bcgov-schema extension integrates with CKAN through those companion extensions (ckanext-scheming, ckanext-repeating and ckanext-composite), which provide the means to validate metadata against schemas and to handle repeating fields. Benefits & Impact: By implementing the bcgov-schema extension, organizations can enforce standardized metadata practices, which enhances discoverability, improves data quality, and promotes interoperability. The schemas pre-packaged within this extension allow for consistency across the BC Data Catalogue.

  18. Electronic Medical Records Systems in the US - Market Research Report...

    • ibisworld.com
    Updated Sep 9, 2025
    Cite
    IBISWorld (2025). Electronic Medical Records Systems in the US - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/united-states/market-research-reports/electronic-medical-records-systems-industry/
    Explore at:
    Dataset updated
    Sep 9, 2025
    Dataset authored and provided by
    IBISWorld
    License

    https://www.ibisworld.com/about/termsofuse/

    Time period covered
    2015 - 2030
    Description

    Electronic medical and electronic health records vendors (EMR/EHR vendors) provide services while regulatory requirements and persistent technological advancement impact their bottom line and their clients. Federal mandates like the HITECH Act have boosted adoption rates, making digital recordkeeping ubiquitous. As clients consolidate, so do EMR/EHR providers. The trend toward consolidation has defined much of the last decade, with two companies, Epic Systems Corporation and Oracle, controlling roughly half of the US market and presenting significant barriers to entry for new vendors. Interestingly, the industry has seen a marginal drop in overall revenue, partly resulting from healthcare organizations negotiating lower licensing fees and transitioning to more cost-effective cloud-based systems; nonetheless, profits have climbed. Efficiency gains from large-scale client portfolios, high switching costs and consolidation boost operational leverage for providers. Industry revenue has declined at a CAGR of 0.3% to reach $19.4 billion in 2025, with revenue growing 3.6% in 2025 alone and profit continuing to trend upwards. EMR/EHR platforms are embracing advanced technologies such as artificial intelligence and wearable integration. The explosion of data from devices like smartwatches, sensors and continuous glucose monitors is reshaping patient management and supporting the shift toward personalized, holistic care. EHRs now aggregate this real-time health data, granting clinicians and patients actionable insight into chronic and acute conditions. As wearables proliferate and consumers and healthcare professionals call for seamless data integration, EHR and EMR systems with AI will remain central to connected care delivery. The market is forecast to strengthen at a CAGR of 4.0% to reach $23.6 billion by 2030, with profit continuing upward. Consolidation and increased concentration provide economies of scale, allowing dominant vendors to spread costs, innovate and improve profit. Switching to a different provider is extremely challenging after a healthcare organization implements an EMR system because of the considerable expenses and complexities associated with migrating data and integrating new systems. These hurdles lock in vendors, resulting in persistent concentration. However, competitive pressure among the large incumbents and niche providers leads to competitive pricing battles and slower profit growth. The push to innovate from healthcare providers will be strong and supported by regulatory actions that require enhanced interoperability and data privacy. Overall, performance hinges on the healthcare industry's financial stability and the benefits of updating and expanding EMR/EHR systems.

  19. f

    Table_1_The Locare workflow: representing neuroscience data locations as...

    • figshare.com
    • frontiersin.figshare.com
    xlsx
    Updated Feb 9, 2024
    + more versions
    Cite
    Camilla H. Blixhavn; Ingrid Reiten; Heidi Kleven; Martin Øvsthus; Sharon C. Yates; Ulrike Schlegel; Maja A. Puchades; Oliver Schmid; Jan G. Bjaalie; Ingvild E. Bjerke; Trygve B. Leergaard (2024). Table_1_The Locare workflow: representing neuroscience data locations as geometric objects in 3D brain atlases.xlsx [Dataset]. http://doi.org/10.3389/fninf.2024.1284107.s001
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Frontiers
    Authors
    Camilla H. Blixhavn; Ingrid Reiten; Heidi Kleven; Martin Øvsthus; Sharon C. Yates; Ulrike Schlegel; Maja A. Puchades; Oliver Schmid; Jan G. Bjaalie; Ingvild E. Bjerke; Trygve B. Leergaard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Neuroscientists employ a range of methods and generate increasing amounts of data describing brain structure and function. The anatomical locations from which observations or measurements originate represent a common context for data interpretation, and a starting point for identifying data of interest. However, the multimodality and abundance of brain data pose a challenge for efforts to organize, integrate, and analyze data based on anatomical locations. While structured metadata allow faceted data queries, different types of data are not easily represented in a standardized and machine-readable way that allows comparison, analysis, and queries related to anatomical relevance. To this end, three-dimensional (3D) digital brain atlases provide frameworks in which disparate multimodal and multilevel neuroscience data can be spatially represented. We propose to represent the locations of different neuroscience data as geometric objects in 3D brain atlases. Such geometric objects can be specified in a standardized file format and stored as location metadata for use with different computational tools. We here present the Locare workflow developed for defining the anatomical location of data elements from rodent brains as geometric objects. We demonstrate how the workflow can be used to define geometric objects representing multimodal and multilevel experimental neuroscience data in rat or mouse brain atlases. We further propose a collection of JSON schemas (LocareJSON) for specifying geometric objects by atlas coordinates, suitable as a starting point for co-visualization of different data in an anatomical context and for enabling spatial data queries.
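    As a purely illustrative sketch of the idea, and not the published LocareJSON schema, a point-type location object expressed in atlas coordinates might be serialized as follows; every field name and the atlas identifier below are assumptions.

```python
# Illustrative only: a hypothetical point-type location object in atlas coordinates.
# Field names and the atlas identifier are assumptions, not the LocareJSON schema.
import json

location_object = {
    "type": "point",
    "atlas": "WHS_SD_rat_v4",            # assumed atlas identifier (Waxholm Space rat atlas)
    "coordinates_mm": [2.4, -3.1, 5.0],  # assumed x, y, z position in atlas space
    "label": "recording site, dorsal hippocampus",
}
print(json.dumps(location_object, indent=2))
```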

  20. Definition of concept coverage scores for ASSESS CT manual annotation.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Jose Antonio Miñarro-Giménez; Catalina Martínez-Costa; Daniel Karlsson; Stefan Schulz; Kirstine Rosenbeck Gøeg (2023). Definition of concept coverage scores for ASSESS CT manual annotation. [Dataset]. http://doi.org/10.1371/journal.pone.0209547.t002
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jose Antonio Miñarro-Giménez; Catalina Martínez-Costa; Daniel Karlsson; Stefan Schulz; Kirstine Rosenbeck Gøeg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definition of concept coverage scores for ASSESS CT manual annotation.
