https://www.nist.gov/open/license
This dataset provides reference ontologies that were translated from product design and inspection data from the National Institute of Standards and Technology (NIST) Smart Manufacturing Systems (SMS) Test Bed. The examples represent a three-component box assembly, machined from aluminum, with a technical data package available on the SMS Test Bed website. The ontologies aim to integrate product lifecycle data from engineering design, represented in the STEP AP242 format described in the ISO 10303 series, with quality assurance data represented in the Quality Information Framework (QIF) standard.
https://spdx.org/licenses/CC0-1.0.html
The increasing scale and diversity of seismic data, and the growing role of big data in seismology, have raised interest in methods to make data exploration more accessible. This paper presents the use of knowledge graphs (KGs) for representing seismic data and metadata to improve data exploration and analysis, focusing on usability, flexibility, and extensibility. Using constraints derived from domain knowledge in seismology, we define semantic models of seismic station and event information used to construct the KGs. Our approach utilizes the capability of KGs to integrate data across many sources and diverse schema formats. We use schema-diverse, real-world seismic data to construct KGs with millions of nodes, and illustrate potential applications with three big-data examples. Our findings demonstrate the potential of KGs to enhance the efficiency and efficacy of seismological workflows in research and beyond, indicating a promising interdisciplinary future for this technology.
Methods
The data here were collected from the following sources:
- Station metadata, in StationXML format, acquired from the IRIS DMC using the fdsnws-station webservice (https://service.iris.edu/fdsnws/station/1/).
- Earthquake event data, in NDK format, acquired from the Global Centroid-Moment Tensor (GCMT) catalog webservice (https://www.globalcmt.org) [1,2].
- Earthquake event data, in CSV format, acquired from the USGS earthquake catalog webservice (https://doi.org/10.5066/F7MS3QZH) [3].
The format of the data is described in the README. In addition, complete descriptions of the StationXML, NDK, and USGS file formats can be found at https://www.fdsn.org/xml/station/, https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained, and https://earthquake.usgs.gov/data/comcat/#event-terms, respectively. Also provided are conversions from the NDK and StationXML file formats into JSON format.
References:
[1] Dziewonski, A. M., Chou, T. A., & Woodhouse, J. H. (1981). Determination of earthquake source parameters from waveform data for studies of global and regional seismicity. Journal of Geophysical Research: Solid Earth, 86(B4), 2825-2852.
[2] Ekström, G., Nettles, M., & Dziewoński, A. M. (2012). The global CMT project 2004–2010: Centroid-moment tensors for 13,017 earthquakes. Physics of the Earth and Planetary Interiors, 200, 1-9.
[3] U.S. Geological Survey, Earthquake Hazards Program, 2017, Advanced National Seismic System (ANSS) Comprehensive Catalog of Earthquake Events and Products: Various, https://doi.org/10.5066/F7MS3QZH.
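As a minimal sketch of how station metadata can be requested from the fdsnws-station webservice listed above, the snippet below builds a query URL with standard FDSN parameter names; the network code "IU" and station code "ANMO" are illustrative values, not part of this dataset's description.

```python
from urllib.parse import urlencode

# Base endpoint of the IRIS fdsnws-station webservice (see source above).
BASE = "https://service.iris.edu/fdsnws/station/1/query"

def station_url(network, station, level="channel", fmt="xml"):
    """Build a fdsnws-station query URL for the given network/station.

    Parameter names (network, station, level, format) follow the FDSN
    station webservice conventions; level controls metadata depth
    (network/station/channel/response).
    """
    params = {"network": network, "station": station,
              "level": level, "format": fmt}
    return BASE + "?" + urlencode(params)

url = station_url("IU", "ANMO")
print(url)
```

The resulting URL can be fetched with any HTTP client to retrieve StationXML of the kind included in this dataset.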
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
IBM is providing free access to its COVID-19 Knowledge Graph, which integrates COVID-19 data from various sources: CORD-19 (https://www.semanticscholar.org/cord19) for literature, ClinicalTrials.gov (https://clinicaltrials.gov/) and the WHO ICTRP (https://www.who.int/ictrp/search) for trials, and DrugBank (https://www.drugbank.ca/) and GenBank (https://www.ncbi.nlm.nih.gov/genbank) for drug and sequence data. Prepared search reports are openly available on the Reports Page. To access the COVID-19 Knowledge Graph itself, however, it is necessary to request access.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a metadata model specifically for dataset-graph datasets with provenance tracking capabilities. It captures essential publication metadata including creators, versioning, licensing, and distribution information. The schema complies with the DCAT (Data Catalog Vocabulary) standard and supports provenance tracking through PROV-O ontology integration. This schema is described in detail in the HRA KG paper.
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. The objectives of our study were to analyze the semantic structure of input definitions, use this information to build triple structures of RDF graph data, load our lexicon into knowledge graph software, and perform SPARQL queries on the data. Upon completion of this study, SPARQL queries effectively retrieved graph triples that displayed semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. These data were collected in 2024 by passing text through multiple Python programs utilizing spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before the data were processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-speech, tag, dependency relation, dependency head, and lemma. Each word within the lexicon was tokenized. A stop-words list was utilized only to remove punctuation and symbols from the text, excluding hyphenated words (e.g., bowl-shaped), which remained as such. The tokens' lemmas were then aggregated and totaled to find their recurrences within the lexicon. This procedure was repeated for tokenizing noun chunks using the same glossary definitions.
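The lemma-aggregation step described above can be sketched as follows. In the study, the (word, lemma) pairs would come from spaCy token attributes (e.g., token.lemma_); here they are hard-coded so the sketch runs without a downloaded language model, and the example words are invented for illustration.

```python
from collections import Counter

# Illustrative (word, lemma) pairs standing in for spaCy tokenizer output.
# Hyphenated words (e.g., "bowl-shaped") are kept intact, per the procedure.
tokens = [
    ("basins", "basin"), ("basin", "basin"),
    ("bowl-shaped", "bowl-shaped"),
    ("depressions", "depression"), ("depression", "depression"),
]

# Aggregate lemmas and total their recurrences within the lexicon.
lemma_counts = Counter(lemma for _, lemma in tokens)
print(lemma_counts.most_common())
```

The same pattern applies to noun chunks: replace the token lemmas with chunk texts and re-run the count.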
To address the increasing complexity of network management and the limitations of data repositories in handling varied network operational data, this paper proposes a novel repository design that uniformly represents network operational data while allowing the information to be accessed at multiple levels of abstraction. This smart repository simplifies network management functions by enabling network verification directly within the repository. The data is organized in a knowledge graph compatible with any general-purpose graph database, offering a comprehensive and extensible network repository. Performance evaluations confirm the feasibility of the proposed design. The repository's ability to natively support 'what-if' scenario evaluation is demonstrated by verifying Border Gateway Protocol (BGP) route policies and analyzing forwarding behavior with virtual Traceroute.
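As a minimal sketch of the virtual-traceroute idea mentioned above (not the paper's implementation): model each router's forwarding table as a mapping from destination prefix to next hop, then walk next hops from a source until the prefix is delivered, unroutable, or a loop is detected. Router names and the topology are invented for illustration.

```python
# Illustrative per-router forwarding tables (prefix -> next hop).
# None marks the prefix as locally attached (delivery point).
FIB = {
    "r1": {"10.0.0.0/24": "r2"},
    "r2": {"10.0.0.0/24": "r3"},
    "r3": {"10.0.0.0/24": None},
}

def virtual_traceroute(src, prefix):
    """Walk next hops from src toward prefix; stop on delivery or a loop."""
    path, node = [src], src
    while True:
        nxt = FIB.get(node, {}).get(prefix)
        if nxt is None:
            return path            # delivered (or no route at this node)
        if nxt in path:
            return path + [nxt]    # forwarding loop detected
        path.append(nxt)
        node = nxt

print(virtual_traceroute("r1", "10.0.0.0/24"))
```

In a graph-database repository, the same walk becomes a path traversal over next-hop edges, which is what allows 'what-if' evaluation directly on the stored data.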
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a minimal data model for graph-based data structures, providing a standardized container with metadata but without specifying the actual graph content format. It serves as a flexible framework for publishing various types of graph data (knowledge graphs, network data, relationship structures) while ensuring consistent metadata documentation including provenance, versioning, and licensing information. This schema is described in detail in the HRA KG paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository contains the output files from the analysis of the paper "Supporting Online Toxicity Detection with Knowledge Graphs" presented at the International Conference on Web and Social Media 2022 (ICWSM-2022).
The data contains annotations of gender and sexual orientation entities provided by the Gender and Sexual Orientation Ontology (https://bioportal.bioontology.org/ontologies/GSSO).
We analyse demographic group samples from the Civil Comments Identities dataset (https://www.tensorflow.org/datasets/catalog/civil_comments).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a metadata model specifically for 3D reference organ datasets with provenance tracking capabilities. It captures essential publication metadata including creators, versioning, licensing, and distribution information. The schema complies with the DCAT (Data Catalog Vocabulary) standard and supports provenance tracking through PROV-O ontology integration. This schema is described in detail in the HRA KG paper.
The BBC Land Girls TV series has 3 seasons, each consisting of 5 episodes of about 45 minutes. The TRECVID group at NIST worked with the BBC to release the dataset to the research community for work on video understanding tasks. Unfortunately, the hosting arrangement for the dataset was not successful, and the video dataset could not be released. We are releasing the annotations conducted by NIST, without any video data, so that researchers interested in knowledge graph understanding and natural language analysis can take advantage of them.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a metadata model specifically for vocabulary or terminology resources with provenance tracking capabilities. It captures essential publication metadata including creators, versioning, licensing, and distribution information. The schema complies with the DCAT (Data Catalog Vocabulary) standard and supports provenance tracking through PROV-O ontology integration. This schema is described in detail in the HRA KG paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis-ArgKB-20 is a new corpus that comprises about 16k manual annotations of 4740 claims in accordance with a newly proposed model for an argumentation knowledge graph.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the entities related to the EOSC nodes and associated data sources registered in the EOSC Beyond catalogue.
Data were extracted from the OpenAIRE Graph (visible at https://explore.openaire.eu/) in June 2025.
EOSC nodes available: node-cessda
Registered datasources: CESSDA (21.15124/2shDkg)
Service Description
B2FIND serves as EUDAT's metadata indexing service, offering a user-friendly discovery portal designed to assist researchers in locating data collections spanning international and interdisciplinary domains. This service is built upon a comprehensive metadata catalog containing research data collections stored across EUDAT data centers and community repositories. By harmonizing metadata descriptions gathered from diverse sources, B2FIND not only ensures consistency in presentation but also facilitates faceted searches that transcend scientific disciplines. It caters to both communities and data providers seeking to publish and enhance the visibility of their metadata, as well as individual researchers looking for data resources across various domains.
Features
- Harmonization of the metadata descriptions via the EUDAT Core metadata schema
- Repository harvesting through various protocols (e.g., OAI-PMH, CSW, REST APIs)
- Faceted search with 17 facets, including geospatial and temporal search options, supplemented by free text search capabilities
- Metadata aggregation from community repositories, with support for multiple metadata standards
- Integration with the OpenAIRE Knowledge Graph
The rdfstoreimporter extension for CKAN facilitates the synchronization of CKAN datasets with external RDF (Resource Description Framework) stores, such as Virtuoso. This synchronization empowers users to link CKAN's data management capabilities with the structured data environment provided by RDF stores. The extension enhances CKAN's ability to work seamlessly with semantic web technologies, providing a bridge between traditional data catalogs and linked data repositories.
Key Features:
- RDF Store Synchronization: Allows automated synchronization of CKAN datasets with an external RDF store, enabling consistent data representation and availability across different platforms.
- Virtuoso Compatibility: Supports Virtuoso, a popular RDF store, ensuring users can integrate CKAN with a widely used semantic data management system.
- Command-Line Interface (CLI) Execution: Provides a command-line interface for triggering the RDF store synchronization process, offering flexibility and control in managing the synchronization tasks.
Technical Integration:
The rdfstoreimporter extension integrates with CKAN by adding a plugin that extends its core functionality. To enable the extension, users must modify the CKAN configuration file (production.ini) by adding rdfstoreimporter to the ckan.plugins setting. After modifying the configuration, a CKAN restart is required to activate the extension.
Benefits & Impact:
By implementing the rdfstoreimporter extension, CKAN installations can benefit from enhanced data interoperability and semantic enrichment. Synchronizing datasets with RDF stores makes it easier to describe, link, and query data using semantic web standards. This can lead to:
- Improved Data Discoverability: Representing CKAN datasets in RDF format enhances their discoverability by semantic web crawlers and search engines.
- Enhanced Data Integration: Linking CKAN datasets to external RDF knowledge graphs can facilitate easier integration of data from different sources.
- Facilitated Semantic Analysis: Storing CKAN data in RDF stores enables sophisticated semantic analysis and reasoning, leading to new insights and knowledge discovery.
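The configuration change described above can be sketched as follows; the other plugin names shown alongside rdfstoreimporter are illustrative placeholders, not requirements of the extension.

```ini
; production.ini -- enable the extension by appending it to ckan.plugins
[app:main]
ckan.plugins = stats text_view rdfstoreimporter
```

After saving the file, restart CKAN so the plugin list is re-read and the extension is activated.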
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a metadata model specifically for anatomical landmark datasets with provenance tracking capabilities. It captures essential publication metadata including creators, versioning, licensing, and distribution information. The schema complies with the DCAT (Data Catalog Vocabulary) standard and supports provenance tracking through PROV-O ontology integration. This schema is described in detail in the HRA KG paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Graph Digital Objects contain externally created Resource Description Framework (RDF, https://www.w3.org/RDF) graph data that are useful for Human Reference Atlas use cases. This graph curates enrichments from all public dataset graphs for the Human Reference Atlas. More information is presented in a related paper (Bueckle et al. 2025).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a metadata model specifically for HRApop datasets with provenance tracking capabilities. It captures essential publication metadata including creators, versioning, licensing, and distribution information. The schema complies with the DCAT (Data Catalog Vocabulary) standard and supports provenance tracking through PROV-O ontology integration. This schema is described in detail in the HRA KG paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This schema defines a data model for dataset graphs that integrates experimental metadata by linking donors, tissue samples, datasets, and their spatial positioning information. It provides a unified structure for representing the relationships between biological specimens (donors and tissue blocks), experimental datasets, and their corresponding spatial entities and placements within a coordinate system. This schema is described in detail in the HRA KG paper.