37 datasets found

Disease or Syndrome Concepts and Types
johnsnowlabs.com
csv
Updated Jan 20, 2021
John Snow Labs (2021). Disease or Syndrome Concepts and Types [Dataset]. https://www.johnsnowlabs.com/marketplace/disease-or-syndrome-concepts-and-types/
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
N/A
Description
This dataset contains the entire concept structure of UMLS Metathesaurus for the semantic type "Disease or Syndrome". One of the primary purposes of this dataset is to connect different names for all the concepts for a specific Semantic Type. There are 125 semantic types in the Semantic Network. Every Metathesaurus concept is assigned at least one semantic type; very few terms are assigned as many as five semantic types.
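The following minimal sketch shows how grouping by concept can connect the different names of a single concept. It groups name strings under a concept identifier. The column names `cui` and `name`, and the placeholder rows, are assumptions for illustration only, so check the actual CSV header before adapting it.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows; the column names ("cui", "name") and the IDs are assumptions.
SAMPLE_CSV = """cui,name
C0000001,Diabetes mellitus
C0000001,DM
C0000002,Hypertensive disease
C0000002,High blood pressure
"""


def names_by_concept(csv_text: str) -> dict[str, list[str]]:
    """Group every name string under the concept identifier it belongs to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["cui"]].append(row["name"])
    return dict(groups)


synonyms = names_by_concept(SAMPLE_CSV)
print(synonyms["C0000001"])
```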
Aerial Semantic Segmentation Drone Dataset
kaggle.com
Updated Jan 10, 2021
Bulent Siyah (2021). Aerial Semantic Segmentation Drone Dataset [Dataset]. https://www.kaggle.com/datasets/bulentsiyah/semantic-drone-dataset/suggestions
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 10, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bulent Siyah
Description
Dataset Resource: https://www.tugraz.at/index.php?id=22387

Citation: If you use this dataset in your research, please cite the following URL:

http://dronedataset.icg.tugraz.at

License: The Drone Dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:

That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Graz University of Technology) do not accept any responsibility for errors or omissions.

That you include a reference to the Semantic Drone Dataset in any work that makes use of the dataset. For research papers or other media link to the Semantic Drone Dataset webpage.

That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works in as far as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow to recover the dataset or something similar in character.

That you may not use the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.

That all rights not expressly granted to you are reserved by us (Graz University of Technology).

Dataset Overview: The Semantic Drone Dataset focuses on semantic understanding of urban scenes for increasing the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from nadir (bird's eye) view acquired at an altitude of 5 to 30 meters above ground. A high resolution camera was used to acquire images at a size of 6000x4000px (24Mpx). The training set contains 400 publicly available images and the test set is made up of 200 private images.

PERSON DETECTION: For the task of person detection, the dataset contains bounding box annotations for the training and test sets.

SEMANTIC SEGMENTATION: We prepared pixel-accurate annotations for the same training and test sets. The complexity of the dataset is limited to 20 classes, as listed in the following table.

Table 1: Semantic classes of the Drone Dataset

tree, grass, other vegetation, dirt, gravel, rocks, water, paved area, pool, person, dog, car, bicycle, roof, wall, fence, fence-pole, window, door, obstacle
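The class list can be turned into a lookup table for inspecting label masks. This sketch assumes that a mask stores index i for the i-th class in Table 1. The label encoding in the released files may differ, for example by including an extra unlabeled class, so check the mapping against the dataset's own class definitions.

```python
from collections import Counter

# The 20 classes of Table 1, in table order.
DRONE_CLASSES = [
    "tree", "grass", "other vegetation", "dirt", "gravel", "rocks", "water",
    "paved area", "pool", "person", "dog", "car", "bicycle", "roof", "wall",
    "fence", "fence-pole", "window", "door", "obstacle",
]
# Assumption: a mask value i means DRONE_CLASSES[i]; the released label files may differ.
CLASS_TO_INDEX = {name: i for i, name in enumerate(DRONE_CLASSES)}


def class_histogram(mask: list[list[int]]) -> dict[str, int]:
    """Count pixels per class name in a 2-D mask of integer class indices."""
    counts = Counter(value for row in mask for value in row)
    return {DRONE_CLASSES[i]: n for i, n in sorted(counts.items())}


toy_mask = [[0, 0, 13], [9, 13, 13]]  # hypothetical 2x3 mask
print(class_histogram(toy_mask))
```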
Intelligent Semantic Data Service Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Market Report Analytics (2025). Intelligent Semantic Data Service Report [Dataset]. https://www.marketreportanalytics.com/reports/intelligent-semantic-data-service-54001
Available download formats: pdf, ppt, doc
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Variables measured
Market Size
Description
The Intelligent Semantic Data Service (ISDS) market is experiencing robust growth, driven by the increasing need for businesses to derive actionable insights from complex and unstructured data. The market, estimated at $15 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 20% from 2025 to 2033, reaching an estimated $70 billion by 2033. Several key factors fuel this growth. First, the rise of big data and the limitations of traditional data processing techniques are pushing organizations toward sophisticated solutions like ISDS to unlock the full potential of their data assets. Second, advancements in artificial intelligence (AI), natural language processing (NLP), and machine learning (ML) are enhancing the capabilities of ISDS, enabling more accurate and insightful data analysis. Third, cloud-based deployments of ISDS are gaining significant traction, offering scalability, cost-effectiveness, and accessibility to a wider range of users. The enterprise segment currently dominates the market, driven by the need for improved operational efficiency, better decision-making, and enhanced customer experience. However, the personal segment is expected to grow faster due to increasing consumer adoption of AI-powered applications and smart devices. The competitive landscape is highly dynamic, with major technology companies like Google, IBM, Microsoft, Amazon, and Salesforce vying for market share. OpenAI, Alibaba, and Tencent are also making significant strides in the development and deployment of advanced ISDS solutions. North America currently holds the largest market share, fueled by early adoption and high technology investment. However, Asia-Pacific is expected to show the fastest growth, driven by rapid digital transformation in countries such as China and India. Despite the significant opportunities, certain restraints remain. These include the high initial investment costs associated with ISDS implementation, the need for skilled professionals to manage and interpret the generated insights, and concerns related to data privacy and security. The market is further segmented by deployment type (cloud-based and on-premises) and application (enterprise and personal), reflecting the diverse needs and preferences of different user segments. Addressing these challenges will be crucial for continued market expansion and broader adoption of ISDS.
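The stated figures can be checked with the standard compound-growth formula, end = start × (1 + r)^n. In this short sketch, compounding $15 billion at 20% over the eight years from 2025 to 2033 gives about $64.5 billion. The $70 billion endpoint therefore corresponds to a rate of roughly 21% per year. Treating 2025 to 2033 as eight compounding periods is an assumption about how the report counts years.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` at annual growth `rate` for `years` years."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


YEARS = 2033 - 2025  # assumption: eight compounding periods
print(f"2033 value at 20% CAGR: ${project(15, 0.20, YEARS):.1f} billion")
print(f"CAGR implied by $15B -> $70B: {implied_cagr(15, 70, YEARS):.1%}")
```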
tFood: Semantic Table Annotations Benchmark for Food Domain
zenodo.org
zip
Updated Dec 7, 2023
Ernesto Jimènez-Ruiz; Oktie Hassanzadeh; Birgitta König-Ries (2023). tFood: Semantic Table Annotations Benchmark for Food Domain [Dataset]. http://doi.org/10.5281/zenodo.10048187
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10048187
Dataset updated
Dec 7, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ernesto Jimènez-Ruiz; Oktie Hassanzadeh; Birgitta König-Ries
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
tFood is a dataset for tabular data to knowledge graph matching. It is built for the Food domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, in which each table represents a single entity. We provide ground truth data with Wikidata as the target knowledge graph (KG).

The supported tasks for semantic table annotations are:

Topic Detection (TD) links the entire table to an entity or a class from the target KG.

Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.

Column Type Annotation (CTA) links individual table columns to classes from the target KG.

Column Property Annotation (CPA) detects the relations between column pairs from the target knowledge graph.

This version of the dataset will be used in SemTab 2023 Round 1, so the ground truth data for the test set are currently hidden. We will add this ground truth after the challenge concludes.
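The sketch below shows one way to hold the four annotation targets in code. The class and field names are hypothetical, and the exact column layout of the SemTab target and ground-truth files is not described here. Read it as an illustration of what each task links (a cell, a column, a column pair, or a whole table), not as the benchmark's file format. The Wikidata entity IRI prefix is real, but the Q-identifier is a placeholder.

```python
from dataclasses import astuple, dataclass

WIKIDATA = "http://www.wikidata.org/entity/"  # Wikidata entity IRI prefix


@dataclass(frozen=True)
class TopicDetection:  # whole table -> KG entity or class
    table_id: str
    target: str


@dataclass(frozen=True)
class CellEntity:  # one cell -> KG entity
    table_id: str
    row: int
    column: int
    entity: str


@dataclass(frozen=True)
class ColumnType:  # one column -> KG class
    table_id: str
    column: int
    kg_class: str


@dataclass(frozen=True)
class ColumnProperty:  # ordered column pair -> KG property
    table_id: str
    head_column: int
    tail_column: int
    kg_property: str


def to_csv_line(annotation) -> str:
    """Flatten any annotation into one comma-separated line, in field order."""
    return ",".join(str(value) for value in astuple(annotation))


cea = CellEntity("table_001", row=1, column=0, entity=WIKIDATA + "Q123")  # placeholder target
print(to_csv_line(cea))
```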
tBiomed: Semantic Table Annotations Benchmark for Biomedical Domain
zenodo.org
data.niaid.nih.gov
Updated Apr 19, 2024
Nora Abdelmageed; Ernesto Jimènez-Ruiz; Oktie Hassanzadeh; Birgitta König-Ries (2024). tBiomed: Semantic Table Annotations Benchmark for Biomedical Domain [Dataset]. http://doi.org/10.5281/zenodo.10996334
Unique identifier
https://doi.org/10.5281/zenodo.10996334
Dataset updated
Apr 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nora Abdelmageed; Ernesto Jimènez-Ruiz; Oktie Hassanzadeh; Birgitta König-Ries
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
tBiomed is a dataset for tabular data to knowledge graph matching. It is built for the Biomedical domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, in which each table represents a single entity. We provide ground truth data with Wikidata as the target knowledge graph (KG).

tBiomed is generated by KG2Tables using two levels of a recursive hierarchy of related concepts in Wikidata.
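KG2Tables' own implementation is not described here. To illustrate what "two levels of a recursive hierarchy" means, the sketch below collects every concept reachable from a root within two levels, using a hypothetical in-memory parent-to-children map. The toy hierarchy is invented and does not come from Wikidata.

```python
def concepts_within(children: dict[str, list[str]], node: str, depth: int = 2) -> list[str]:
    """Collect descendants of `node` up to `depth` levels down, in pre-order."""
    if depth == 0:
        return []
    found: list[str] = []
    for child in children.get(node, []):
        found.append(child)
        found.extend(concepts_within(children, child, depth - 1))
    return found


# Hypothetical toy hierarchy (parent -> children); not taken from Wikidata.
hierarchy = {
    "disease": ["infectious disease", "genetic disease"],
    "infectious disease": ["viral infection"],
    "viral infection": ["influenza"],
}
print(concepts_within(hierarchy, "disease"))
```

With the default depth of 2, "influenza" (three levels below the root) is left out.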

tBiomed contains 26,778 entity and horizontal tables. This repository originally contained only a validation fold of the data, representing 20% of the entire benchmark, together with its ground truth data (gt). The full size of the dataset is 1 GB.

We included the full version of the dataset. We will update this repository with the ground truth data for the test set in the future.

The supported tasks for semantic table annotations are:

Topic Detection (TD) links the entire table to an entity or a class from the target KG.

Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.

Column Type Annotation (CTA) links individual table columns to classes from the target KG.

Column Property Annotation (CPA) detects the relations between column pairs from the target knowledge graph.

Row Annotation (RA) links an entire row to a KG entity or property.
Semantic Knowledge Graphing Market is Growing at a CAGR of 14.80% from 2024...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Cognitive Market Research, Semantic Knowledge Graphing Market is Growing at a CAGR of 14.80% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/semantic-knowledge-graphing-market-report
Available download formats: pdf, excel, csv, ppt
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global semantic knowledge graphing market size is USD 1512.2 million in 2024 and will expand at a compound annual growth rate (CAGR) of 14.80% from 2024 to 2031.

North America held the largest share, around 40% of global revenue, with a market size of USD 604.88 million in 2024, and will grow at a compound annual growth rate (CAGR) of 13.0% from 2024 to 2031. Europe accounted for over 30% of global revenue, with a market size of USD 453.66 million in 2024. Asia Pacific held around 23% of global revenue, with a market size of USD 347.81 million in 2024, and will grow at a CAGR of 16.8% from 2024 to 2031. Latin America held around 5% of global revenue, with a market size of USD 75.61 million in 2024, and will grow at a CAGR of 14.2% from 2024 to 2031. Middle East and Africa held around 2% of global revenue, with a market size of USD 30.24 million in 2024, and will grow at a CAGR of 14.5% from 2024 to 2031. The natural language processing knowledge graphing segment had the highest growth rate in the semantic knowledge graphing market in 2024.
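This sketch is a quick arithmetic check of the regional breakdown. Dividing each regional 2024 size by the global USD 1512.2 million reproduces the quoted shares (about 40%, 30%, 23%, 5% and 2%). The five regional sizes also add up to the global total.

```python
GLOBAL_2024 = 1512.2  # USD million, global market size in 2024
REGIONS_2024 = {      # USD million, regional market sizes in 2024
    "North America": 604.88,
    "Europe": 453.66,
    "Asia Pacific": 347.81,
    "Latin America": 75.61,
    "Middle East and Africa": 30.24,
}

# Share of global revenue implied by each regional size.
shares = {region: size / GLOBAL_2024 for region, size in REGIONS_2024.items()}
for region, share in shares.items():
    print(f"{region}: {share:.1%}")
print(f"Sum of regions: {sum(REGIONS_2024.values()):.2f} of {GLOBAL_2024}")
```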

Market Dynamics of Semantic Knowledge Graphing Market

Key Drivers of Semantic Knowledge Graphing Market

Growing Volumes of Structured, Semi-structured, and Unstructured Data to Increase the Global Demand

The global demand for semantic knowledge graphing is escalating in response to the exponential growth of structured, semi-structured, and unstructured data. Enterprises are inundated with vast amounts of data from diverse sources such as social media, IoT devices, and enterprise applications. Structured data from databases, semi-structured data like XML and JSON, and unstructured data from documents, emails, and multimedia files present significant challenges in terms of organization, analysis, and deriving actionable insights. Semantic knowledge graphing addresses these challenges by providing a unified framework for representing, integrating, and analyzing disparate data types. By leveraging semantic technologies, businesses can unlock the value hidden within their data, enabling advanced analytics, natural language processing, and knowledge discovery. As organizations increasingly recognize the importance of harnessing data for strategic decision-making, the demand for semantic knowledge graphing solutions continues to surge globally.

Demand for Contextual Insights to Propel the Growth

The burgeoning demand for contextual insights is propelling the growth of semantic knowledge graphing solutions. In today's data-driven landscape, businesses are striving to extract deeper contextual meaning from their vast datasets to gain a competitive edge. Semantic knowledge graphing enables organizations to connect disparate data points, understand relationships, and derive valuable insights within the appropriate context. This contextual understanding is crucial for various applications such as personalized recommendations, predictive analytics, and targeted marketing campaigns. By leveraging semantic technologies, companies can not only enhance decision-making processes but also improve customer experiences and operational efficiency. As industries across sectors increasingly recognize the importance of contextual insights in driving innovation and business success, the adoption of semantic knowledge graphing solutions is poised to witness significant growth. This trend underscores the pivotal role of semantic technologies in unlocking the true potential of data for strategic advantage in today's dynamic marketplace.

Restraint Factors of Semantic Knowledge Graphing Market

Stringent Data Privacy Regulations to Hinder the Market Growth

Stringent data privacy regulations present a significant hurdle to the growth of the Semantic Knowledge Graphing market. Regulations such as GDPR (General Data Protection Regulation) in Europe and CCPA (California Consumer Privacy Act) in the United States impose strict requirements on how organizations collect, store, process, and share personal data. Compliance with these regulations necessitates robust data protection measures, including anonymization, encryption, and access controls, which can complicate the implementation of semantic knowledge graphing systems. Moreover, concerns about data breach...
Semantic Segmentation of Crop Type in Ghana
cmr.earthdata.nasa.gov
Updated Oct 10, 2023
(2023). Semantic Segmentation of Crop Type in Ghana [Dataset]. http://doi.org/10.34911/rdnt.ry138p
Unique identifier
https://doi.org/10.34911/rdnt.ry138p
Dataset updated
Oct 10, 2023
Time period covered
Jan 1, 2020 - Jan 1, 2023
Area covered

Description
Automatic, accurate crop type maps can provide unprecedented information for understanding food systems, especially in developing countries where ground surveys are infrequent. However, little work has applied existing methods to these data-scarce environments, which also pose unique challenges: irregularly shaped fields, frequent cloud cover, small plots, and a severe lack of training data. To address this gap in the literature, we provide the first crop type semantic segmentation dataset of smallholder farms, specifically in Ghana and South Sudan. We are also the first to use high-resolution, high-frequency satellite data in segmenting smallholder farms.

The dataset includes time series of satellite imagery from Sentinel-1, Sentinel-2, and PlanetScope satellites throughout 2016 and 2017. For each tile/chip in the dataset, there are time series of imagery from each of the satellites, as well as a corresponding label that defines the crop type at each pixel. The label has only one value at each pixel location, and assumes that the crop type remains the same across the full time span of the satellite image time series. In many cases where ground truth was not available, pixels have no label and are set to a value of 0.
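Unlabeled pixels are set to 0, so they should be excluded when scoring a segmentation. Below is a minimal sketch of pixel accuracy that skips label 0. The crop codes in the toy arrays are placeholders. The same masking idea applies when computing a training loss.

```python
def masked_pixel_accuracy(pred: list[list[int]], label: list[list[int]], ignore_value: int = 0) -> float:
    """Fraction of labeled pixels where the prediction matches; pixels equal to ignore_value are skipped."""
    correct = total = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            if y == ignore_value:
                continue  # no ground truth available at this pixel
            total += 1
            correct += int(p == y)
    return correct / total if total else float("nan")


# Toy 2x3 chip; crop codes 1 and 2 are placeholders.
label = [[0, 1, 1], [2, 2, 0]]
pred = [[3, 1, 2], [2, 2, 1]]
print(masked_pixel_accuracy(pred, label))
```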
tBiodiv: Semantic Table Annotations Benchmark for Biodiversity Domain
data.niaid.nih.gov
Updated Apr 19, 2024
König-Ries, Birgitta (2024). tBiodiv: Semantic Table Annotations Benchmark for Biodiversity Domain [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10283014
Dataset updated
Apr 19, 2024
Dataset provided by
Hassanzadeh, Oktie
Jimènez-Ruiz, Ernesto
Abdelmageed, Nora
König-Ries, Birgitta
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
tBiodiv is a dataset for tabular data to knowledge graph matching. It is built for the Biodiversity domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, in which each table represents a single entity. We provide ground truth data with Wikidata as the target knowledge graph (KG).

tBiodiv is generated by KG2Tables using two levels of a recursive hierarchy of related concepts in Wikidata.

We have updated this repository with the full version of the dataset; we will update it again with the test ground truth (gt) data in the future.

The supported tasks for semantic table annotations are:

Topic Detection (TD) links the entire table to an entity or a class from the target KG.

Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.

Column Type Annotation (CTA) links individual table columns to classes from the target KG.

Column Property Annotation (CPA) detects the relations between column pairs from the target knowledge graph.

Row Annotation (RA) links an entire row to a KG entity or property.
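Annotation tasks such as CEA are typically scored by comparing submitted annotations with the ground truth. The minimal sketch below uses precision (correct / submitted), recall (correct / ground-truth targets) and their harmonic mean, F1. It assumes exactly one correct entity per cell. The official SemTab evaluator may handle details such as multiple acceptable entities differently. The cell keys and Q-identifiers are placeholders.

```python
Cell = tuple[str, int, int]  # (table_id, row, column)


def cea_scores(predicted: dict[Cell, str], truth: dict[Cell, str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of cell-entity annotations against ground truth."""
    correct = sum(1 for cell, entity in predicted.items() if truth.get(cell) == entity)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Placeholder ground truth for four cells and predictions for three of them.
truth = {("t1", 1, 0): "Q1", ("t1", 2, 0): "Q2", ("t1", 3, 0): "Q3", ("t1", 4, 0): "Q4"}
predicted = {("t1", 1, 0): "Q1", ("t1", 2, 0): "Q9", ("t1", 3, 0): "Q3"}
print(cea_scores(predicted, truth))
```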
Replication data for: Prefix variation in путать: в-. за-, пере- and с-
dataverse.azure.uit.no
dataone.org
bin +1
Updated Sep 29, 2023
Maria Nordrum (2023). Replication data for: Prefix variation in путать: в-. за-, пере- and с- [Dataset]. http://doi.org/10.18710/0JC95M
Available download formats: text/plain; charset=us-ascii (10169), text/plain; charset=us-ascii (411), bin (110637)
Unique identifier
https://doi.org/10.18710/0JC95M
Dataset updated
Sep 29, 2023
Dataset provided by
DataverseNO
Authors
Maria Nordrum
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Russia, Norway
Description
This case study of the four Natural Perfectives of the Russian simplex verb путать ‘tangle’ addresses the following questions: Is it possible to predict the choice of prefix when there is prefix variation in Russian? And if so, how? Since these questions are particularly relevant for second-language learners, the author also discusses how the present study and similar ones can be used to make second-language learning of Russian more effective. The analysis is based on a database of 630 sentences from the Russian National Corpus (RNC) and takes two factors into consideration: type of construction and semantic category of the internal argument. The uploaded data contain 3 files:

"Database, everything": Each sentence is tagged according to prefix, form of the verb (Active vs Passive), type of construction and semantic category of the internal argument. The four types of constructions and four types of semantic categories are explained with examples from the database in the article.

"Database_simplified": This version of the database contains three parameters for each sentence: prefix, type of construction and semantic category of the internal argument. The simplified database was created for statistical analysis in R.

"R_putat": The R script used to produce the cTree presented in the article.
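The article's analysis uses an R script (R_putat) to build the cTree. As a separate first look at the same kind of data, the Python sketch below cross-tabulates prefix against one factor. It is not the author's method. The tuples and their labels are placeholders, not the constructions or semantic categories defined in the article.

```python
from collections import Counter

# Hypothetical tagged sentences: (prefix, construction type, semantic category).
# The labels are placeholders, not the categories defined in the article.
sentences = [
    ("за", "construction A", "category 1"),
    ("за", "construction A", "category 2"),
    ("пере", "construction A", "category 1"),
    ("с", "construction B", "category 1"),
    ("с", "construction B", "category 3"),
]


def crosstab(rows, factor: int) -> Counter:
    """Count (prefix, factor value) pairs; factor 1 = construction, 2 = semantic category."""
    return Counter((row[0], row[factor]) for row in rows)


by_construction = crosstab(sentences, 1)
for (prefix, construction), n in sorted(by_construction.items()):
    print(f"{prefix}- / {construction}: {n}")
```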
Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...
zenodo.org
csv
Updated Sep 15, 2023
Anonymous authors (2023). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.6607065
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.6607065
Dataset updated
Sep 15, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous authors
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.

The data are organized in a table structure. Code4ML includes several main objects: competition information, raw code blocks collected from Kaggle, and manually marked-up snippets. Each table is stored in .csv format.

Each competition has the text description and metadata, reflecting competition and used dataset characteristics as well as evaluation metrics (competitions.csv). The corresponding datasets can be loaded using Kaggle API and data sources.

The code blocks themselves and their metadata are grouped into data frames according to the publication year of the source kernels. The current version of the corpus includes two code block files, each with corresponding metadata: snippets from kernels up to 2020 (code_blocks_upto_20.csv) and those from 2021 (code_blocks_21.csv). The corpus consists of 2,743,615 ML code blocks collected from 107,524 Jupyter notebooks.

Marked-up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12,000 labeled snippets (markup_data_20220415.csv).

As marked up code blocks data contains the numeric id of the code block semantic type, we also provide a mapping from this number to semantic type and subclass (actual_graph_2022-06-01.csv).
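Joining the labeled snippets to the mapping file turns numeric semantic-type ids into readable names. The sketch below uses the standard csv module. The column names and sample rows are assumptions, because the published headers are not listed here.

```python
import csv
import io

# Hypothetical excerpts of markup_data_20220415.csv and actual_graph_2022-06-01.csv;
# the column names and values below are assumptions, not the published data.
MARKUP_CSV = """snippet_id,semantic_type_id,relevance
a1,3,5
a2,7,4
a3,3,2
"""
GRAPH_CSV = """semantic_type_id,semantic_type,subclass
3,data_loading,load_from_csv
7,visualization,plot_histogram
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


# Semantic type id -> (semantic type, subclass)
TYPE_NAMES = {
    row["semantic_type_id"]: (row["semantic_type"], row["subclass"])
    for row in read_rows(GRAPH_CSV)
}


def attach_type_names(markup_rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Add readable semantic type and subclass names to each labeled snippet."""
    labeled = []
    for row in markup_rows:
        semantic_type, subclass = TYPE_NAMES.get(row["semantic_type_id"], ("unknown", "unknown"))
        labeled.append({**row, "semantic_type": semantic_type, "subclass": subclass})
    return labeled


for row in attach_type_names(read_rows(MARKUP_CSV)):
    print(row["snippet_id"], row["semantic_type"], row["subclass"])
```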

The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.
epoch data after pre-processing
figshare.com
hdf
Updated Nov 24, 2022
Yali Pan (2022). epoch data after pre-processing [Dataset]. http://doi.org/10.6084/m9.figshare.21206990.v4
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.21206990.v4
Dataset updated
Nov 24, 2022
Dataset provided by
figshare
Authors
Yali Pan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The epoch data are FieldTrip-format structures. All epochs whose names end with 'WrdOn' are aligned to fixation onset on a given word (time 0) and are one second long ([-0.5 0.5] s). Epochs were named by combining the acquisition date, subject code, and data type. Epochs whose names end with 'BL_Cross' are the baseline period before the presentation of the sentences. The field 'trialinfo' holds information about each trial; its columns are as follows:

1- sentence_id: sentence number for this epoch
2- word_loc: the location of the current word in the sentence
3- loc2targ: location distance between the current word and the target word; loc2targ for pre-target, target, and post-target words is -1, 0, and 1
4- saccade2this_duration: duration of the saccade toward this word
5- fixation_on_MEG: MEG trigger for fixation onset to this word
6- fixation_duration
7- NextOrder: next word location minus the current word location; a negative value indicates a backward saccade to previous words
8- FirstPassFix: whether this fixation is the first on this word or not
9- PreviousOrder: previous word location minus the current word location; a negative value indicates a forward saccade to the next words
10- SentenceCondition: whether the current word is in a sentence with an incongruent or congruent target word (1 = incongruent, 2 = congruent)
11- PupilSize: averaged pupil size during this fixation
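The column list above can be used to name the values in each row of 'trialinfo'. The sketch assumes the field has already been loaded as a list of 11-value rows. Reading the HDF/FieldTrip file itself is not shown. It also assumes that FirstPassFix is coded 1 for a first-pass fixation. The rows are placeholder values.

```python
TRIALINFO_COLUMNS = [
    "sentence_id", "word_loc", "loc2targ", "saccade2this_duration",
    "fixation_on_MEG", "fixation_duration", "NextOrder", "FirstPassFix",
    "PreviousOrder", "SentenceCondition", "PupilSize",
]


def decode_trialinfo(trialinfo: list[list[float]]) -> list[dict[str, float]]:
    """Turn each 11-column trialinfo row into a dict keyed by column name."""
    records = []
    for row in trialinfo:
        if len(row) != len(TRIALINFO_COLUMNS):
            raise ValueError(f"expected {len(TRIALINFO_COLUMNS)} columns, got {len(row)}")
        records.append(dict(zip(TRIALINFO_COLUMNS, row)))
    return records


def first_pass_target_fixations(records: list[dict[str, float]]) -> list[dict[str, float]]:
    """Keep first-pass fixations on the target word (loc2targ == 0)."""
    # Assumption: FirstPassFix is coded 1 for a first-pass fixation and 0 otherwise.
    return [r for r in records if r["loc2targ"] == 0 and r["FirstPassFix"] == 1]


# Placeholder values, not taken from the dataset.
rows = [
    [1, 5, -1, 30, 101, 210, 1, 1, -1, 2, 3.1],
    [1, 6, 0, 28, 102, 250, 1, 1, -1, 2, 3.3],
    [1, 6, 0, 35, 103, 180, 1, 0, 2, 2, 3.2],
]
targets = first_pass_target_fixations(decode_trialinfo(rows))
print([(r["word_loc"], r["fixation_duration"]) for r in targets])
```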
Mapping from the data types to the oligonucleotides.
figshare.com
xls
Updated May 31, 2023
Heng Sun; Jian Weng; Guangchuang Yu; Richard H. Massawe (2023). Mapping from the data types to the oligonucleotides. [Dataset]. http://doi.org/10.1371/journal.pone.0077090.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0077090.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Heng Sun; Jian Weng; Guangchuang Yu; Richard H. Massawe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mapping from the data types to the oligonucleotides.
Grammar transformations of topographic feature type annotations of the U.S....
data.usgs.gov
datasets.ai
+1 more
Updated Jul 11, 2024
Emily Abbott (2024). Grammar transformations of topographic feature type annotations of the U.S. to structured graph data. [Dataset]. http://doi.org/10.5066/P1BDPXKZ
Unique identifier
https://doi.org/10.5066/P1BDPXKZ
Dataset updated
Jul 11, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Emily Abbott
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Time period covered
1994 - 1999
Area covered
United States
Description
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. The objectives of our study were to analyze the semantic structure of the input definitions, use this information to build triple structures of RDF graph data, load our lexicon into knowledge graph software, and perform SPARQL queries on the data. By the end of this study, SPARQL queries had been shown to effectively convey graph triples that displayed semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. The data were collected in 2024 by passing text through multiple Python programs using spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before the data were processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-spee ...
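The triple-building step can be sketched in Python by writing (subject, predicate, object) tuples as N-Triples. N-Triples is a line-based RDF serialization that most RDF stores can load and query with SPARQL. The spaCy parsing step is not shown. The namespace `http://example.org/glossary/`, the predicate names and the sample definition are all hypothetical.

```python
EX = "http://example.org/glossary/"  # hypothetical namespace


def iri(local_name: str) -> str:
    """Wrap a local name in angle brackets as an IRI, replacing spaces with underscores."""
    return f"<{EX}{local_name.replace(' ', '_')}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str, bool]]) -> str:
    """Serialize (subject, predicate, object, object_is_literal) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj, obj_is_literal in triples:
        obj_term = literal(obj) if obj_is_literal else iri(obj)
        lines.append(f"{iri(subject)} {iri(predicate)} {obj_term} .")
    return "\n".join(lines)


# Hypothetical triples for one glossary entry.
sample = [
    ("reservoir", "is_a", "body of water", False),
    ("reservoir", "definition", "An artificial lake used to store water", True),
]
print(to_ntriples(sample))
```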
Data from: Entity Typing Datasets
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 2, 2023
Russa Biswas (2023). Entity Typing Datasets [Dataset]. http://doi.org/10.5281/zenodo.7688590
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7688590
Dataset updated
Mar 2, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Russa Biswas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the datasets used in the Entity Type Prediction task for Knowledge Graph Completion.

The DB630k_Fine-grained_Hierarchical.zip dataset has been used in the papers [1] and [2]. It is an extended version of the DBpedia630k dataset, originally created for text classification, and is available here.

FIGER.zip dataset has also been used in the papers [1] and [2].

The MultilingualETdata.zip dataset has been used in the paper [3].

NamesETdata.zip dataset has been used in the paper [4]. The CaLiGraph test dataset can also be downloaded here.

[1] Biswas R, Sofronova R, Sack H, Alam M. Cat2type: Wikipedia category embeddings for entity typing in knowledge graphs. In Proceedings of the 11th Knowledge Capture Conference 2021 Dec 2 (pp. 81-88).

[2] Biswas R, Portisch J, Paulheim H, Sack H, Alam M. Entity type prediction leveraging graph walks and entity descriptions. In The Semantic Web–ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23–27, 2022, Proceedings 2022 Oct 16 (pp. 392-410). Cham: Springer International Publishing.

[3] Biswas R, Chen Y, Paulheim H, Sack H, Alam M. It’s All in the Name: Entity Typing Using Multilingual Language Models. In The Semantic Web: ESWC 2022 Satellite Events: Hersonissos, Crete, Greece, May 29–June 2, 2022, Proceedings 2022 Jul 20 (pp. 36-41). Cham: Springer International Publishing.

[4] Biswas R, Sofronova R, Alam M, Heist N, Paulheim H, Sack H. Do judge an entity by its name! Entity typing using language models. In The Semantic Web: ESWC 2021 Satellite Events: Virtual Event, June 6–10, 2021, Revised Selected Papers 18 2021 (pp. 65-70). Springer International Publishing.
Knowledge Graph Technology Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Market Report Analytics (2025). Knowledge Graph Technology Report [Dataset]. https://www.marketreportanalytics.com/reports/knowledge-graph-technology-53638
Available download formats: ppt, pdf, doc
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Variables measured
Market Size
Description
The Knowledge Graph Technology market is growing rapidly. Growth is driven by the need for better data interoperability and data analysis, and by the wider adoption of artificial intelligence (AI) and machine learning (ML) across industries. Knowledge graphs support better decision-making, streamlined operations, and innovation, and applications such as semantic search, personalized recommendations, and fraud detection are gaining significant traction. Precise market size figures are unavailable, but a conservative estimate places the 2025 market value at $5 billion, with a Compound Annual Growth Rate (CAGR) of 25% projected through 2033. Two factors support this trajectory. The first is demand for efficient data management in sectors such as healthcare, finance, and retail, where knowledge graphs can improve operational efficiency and strategic decision-making. The second is advances in graph database and semantic web technologies. Challenges remain, including the complexity of implementation, the need for specialized expertise, and the difficulty of integrating data from disparate sources. Despite these challenges, the long-term outlook remains positive, driven by continued technological innovation and growing recognition of the technology's potential across sectors.
Segmentation of the Knowledge Graph Technology market reveals significant opportunities across application areas and technology types. By application, semantic search and recommendation engines currently lead the market, while emerging applications such as risk management and supply chain optimization are expected to grow rapidly in the coming years. By technology type, ontology engineering and graph databases are in high demand. Regionally, North America and Europe currently dominate the market due to early adoption and established technological infrastructure. However, the Asia-Pacific region is projected to grow significantly, spurred by increasing digitalization and investment in AI and ML initiatives. The competitive landscape mixes established technology providers with emerging startups. Continued technological evolution and expansion into new applications will shape the market's trajectory over the forecast period.
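The projection above rests on the standard compound-growth formula, value after n years = V0 * (1 + r)^n. The minimal Python sketch below takes the report's 2025 estimate ($5 billion) and 25% CAGR at face value. It assumes 2025 to 2033 counts as eight compounding periods and shows the 2033 value those two numbers imply. The function names are hypothetical illustrations, not part of the report.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate for `years` years."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Invert the compound-growth formula to recover the constant annual rate."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumption: 2025 -> 2033 is treated as 8 full compounding periods.
    years = 2033 - 2025
    v2033 = project_value(5.0, 0.25, years)  # USD billions
    print(f"Implied 2033 value: ${v2033:.1f} billion")  # $29.8 billion
    print(f"Round-trip CAGR: {implied_cagr(5.0, v2033, years):.2%}")  # 25.00%
```

Under these assumptions, a 25% CAGR implies that the market grows roughly sixfold (1.25^8 ≈ 5.96) over the forecast period.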
Summary of the best performing measures for different applications.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Gaston K. Mazandu; Nicola J. Mulder (2023). Summary of the best performing measures for different applications. [Dataset]. http://doi.org/10.1371/journal.pone.0113859.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0113859.t005
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Gaston K. Mazandu; Nicola J. Mulder
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of the best-performing functional similarity measures, term specificity approaches, and semantic similarity approaches for different biological data, including Enzyme Commission (EC), Pfam domain, Sequence Similarity (Seq. Sim.), Protein-Protein Interaction (PPI), and Co-expression Network (CN) or Gene Expression (microarray) data.
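Ontology-based similarity measures commonly quantify term specificity as information content, IC(t) = -log p(t). Here p(t) is the fraction of annotations that fall under term t or any of its descendants. The Python sketch below computes IC on a tiny ontology and uses it for Resnik's measure, which takes the IC of the most informative common ancestor. It illustrates only one classic approach. The toy terms and annotation counts are invented, and the table itself reports which measures performed best for each data type.

```python
import math

# Toy ontology (hypothetical): term -> set of parent terms; "root" has none.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {"root": 0, "metabolism": 2, "signaling": 4,
                 "lipid_metabolism": 3, "sugar_metabolism": 1}


def ancestors(term: str) -> set[str]:
    """Return the term itself plus all of its ancestors."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def information_content() -> dict[str, float]:
    """IC(t) = log(total / count(t)); annotations also count toward ancestors."""
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: math.log(total / c) for t, c in cumulative.items() if c > 0}


def resnik(t1: str, t2: str, ic: dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


if __name__ == "__main__":
    ic = information_content()
    print(round(resnik("lipid_metabolism", "sugar_metabolism", ic), 4))  # 0.5108
    print(resnik("lipid_metabolism", "signaling", ic))  # 0.0 (only root shared)
```

Terms that share only the root get a similarity of zero, because an ancestor that covers every annotation carries no information.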
NPClassifier: A Deep Neural Network-Based Structural Classification Tool for...
figshare.com
xlsx
Updated Jun 9, 2023
Cite
Hyun Woo Kim; Mingxun Wang; Christopher A. Leber; Louis-Félix Nothias; Raphael Reher; Kyo Bin Kang; Justin J. J. van der Hooft; Pieter C. Dorrestein; William H. Gerwick; Garrison W. Cottrell (2023). NPClassifier: A Deep Neural Network-Based Structural Classification Tool for Natural Products [Dataset]. http://doi.org/10.1021/acs.jnatprod.1c00399.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jnatprod.1c00399.s002
Dataset updated
Jun 9, 2023
Dataset provided by
ACS Publications
Authors
Hyun Woo Kim; Mingxun Wang; Christopher A. Leber; Louis-Félix Nothias; Raphael Reher; Kyo Bin Kang; Justin J. J. van der Hooft; Pieter C. Dorrestein; William H. Gerwick; Garrison W. Cottrell
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Computational approaches such as genome and metabolome mining are becoming essential to natural product (NP) research. Consequently, an automated structure-type classification system is needed to handle the massive amount of NP structural data now being generated. An ideal semantic ontology for NP classification should go beyond the simple presence or absence of chemical substructures. It should also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or biological properties. A holistic, automatic NP classification framework could therefore be of considerable value for comprehensively navigating the relatedness of NPs, especially when analyzing large numbers of them. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.
Data from: Orthographic-semantic consistency effects in lexical decision:...
data.mendeley.com
Updated Apr 14, 2025
Cite
Yasushi Hino (2025). Orthographic-semantic consistency effects in lexical decision: What types of neighbors are responsible for the effects? [Dataset]. http://doi.org/10.17632/m3hryjj7h5.5
Unique identifier
https://doi.org/10.17632/m3hryjj7h5.5
Dataset updated
Apr 14, 2025
Authors
Yasushi Hino
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and analysis code to reproduce the results reported in "Orthographic-Semantic Consistency Effects in Lexical Decision: What types of Neighbors are Responsible for the Effects?" by Yasushi Hino, Debra Jared and Steve Lupker. Data Analyses 1, 2 and 3 use lexical decision latency and accuracy data from the English Lexicon Project (Balota, Yap, Cortese, Hutchison, Kessler, Loftus, Neely, Nelson, Simpson & Treiman, 2007). These analyses examine whether an orthographic-semantic consistency effect is observed in lexical decision data. In the Experiment, by contrast, behavioral data were collected in an online lexical decision experiment. The Experiment examines whether the orthographic-semantic consistency effect appears when consistency is manipulated based on either addition neighbors or substitution neighbors. We also provide analysis code to reproduce the results in Table 16 of the paper, as well as the results in the Tables and Figures reported in the Supplementary Materials.
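The Experiment contrasts two kinds of orthographic neighbors. Under the usual definitions, a substitution neighbor differs from a word by exactly one letter in the same position (CAT and COT). An addition neighbor is formed by inserting one letter (CAT and CART). The minimal Python sketch below applies these definitions to a tiny hypothetical lexicon. The paper's own neighbor definitions and word lists may differ.

```python
def substitution_neighbors(word: str, lexicon: set[str]) -> set[str]:
    """Same-length words that differ from `word` in exactly one position."""
    return {
        w for w in lexicon
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    }


def addition_neighbors(word: str, lexicon: set[str]) -> set[str]:
    """Words formed by inserting exactly one letter anywhere in `word`."""
    result = set()
    for w in lexicon:
        if len(w) != len(word) + 1:
            continue
        # w is an addition neighbor if deleting one of its letters yields `word`.
        if any(w[:i] + w[i + 1:] == word for i in range(len(w))):
            result.add(w)
    return result


if __name__ == "__main__":
    # Hypothetical toy lexicon.
    lexicon = {"cat", "cot", "cut", "bat", "cart", "coat", "chat", "cast", "dog"}
    print(sorted(substitution_neighbors("cat", lexicon)))  # ['bat', 'cot', 'cut']
    print(sorted(addition_neighbors("cat", lexicon)))  # ['cart', 'cast', 'chat', 'coat']
```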
Resources of IncRML: Incremental Knowledge Graph Construction from...
zenodo.org
bin, text/x-python +1
Updated Mar 18, 2024
Cite
Dylan Van Assche; Julian Andres Rojas Melendez; Ben De Meester; Pieter Colpaert (2024). Resources of IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources [Dataset]. http://doi.org/10.5281/zenodo.10171157
Available download formats: xz, text/x-python, bin
Unique identifier
https://doi.org/10.5281/zenodo.10171157
Dataset updated
Mar 18, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dylan Van Assche; Julian Andres Rojas Melendez; Ben De Meester; Pieter Colpaert
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 8, 2023
Description
IncRML resources

This Zenodo dataset contains all the resources of the paper 'IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources', submitted to the Semantic Web Journal's Special Issue on Knowledge Graph Construction. These resources aim to make the paper's experiments fully reproducible through our experiment tool. The tool is written in Python and was previously used in the Knowledge Graph Construction Challenge at the ESWC 2023 Workshop on Knowledge Graph Construction. The dataset also provides the exact RMLMapper Java JAR file (rmlmapper.jar) used to execute the experiments. This JAR file was executed with Java OpenJDK 11.0.20.1 on Ubuntu 22.04.1 LTS (Linux 5.15.0-53-generic). Each experiment was executed 5 times, and the median values are reported together with the standard deviation of the measurements.
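Each experiment was run 5 times and summarized by the median together with the standard deviation. The small Python sketch below computes that summary with the standard-library `statistics` module. The run times are invented. The description does not state whether the authors used the sample or the population standard deviation, so the sample version (`statistics.stdev`) is an assumption.

```python
import statistics


def summarize_runs(durations_s: list[float]) -> tuple[float, float]:
    """Return (median, sample standard deviation) of repeated run durations."""
    if len(durations_s) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    return statistics.median(durations_s), statistics.stdev(durations_s)


if __name__ == "__main__":
    # Hypothetical execution times (seconds) of five runs of one experiment.
    runs = [12.0, 11.0, 13.0, 12.0, 12.0]
    median, stdev = summarize_runs(runs)
    print(f"median={median:.2f}s stdev={stdev:.2f}s")  # median=12.00s stdev=0.71s
```

The median resists a single slow outlier run, which is a common reason to report it instead of the mean for timing measurements.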

Datasets

We provide dataset dumps of both the GTFS-Madrid-Benchmark and real-life use cases from Open Data in Belgium.
The GTFS-Madrid-Benchmark dumps are used to analyze the impact on execution time and resources. The real-life use cases verify the approach on different types of datasets, because the GTFS-Madrid-Benchmark is a single type of dataset that does not advertise changes at all.

Benchmarks

GTFS-Madrid-Benchmark: change types with fixed data size and amount of changes: additions-only, modifications-only, deletions-only (11 versions)

GTFS-Madrid-Benchmark: amount of changes with fixed data size: 0%, 25%, 50%, 75%, and 100% changes (11 versions)

GTFS-Madrid-Benchmark: data size with fixed amount of changes: scales 1, 10, 100 (11 versions)

Real-life use cases

Traffic control center Vlaams Verkeerscentrum (Belgium): traffic board messages data (1 day, 28760 versions)

Meteorological institute KMI (Belgium): weather sensor data (1 day, 144 versions)

Public transport agency NMBS (Belgium): train schedule data (1 week, 7 versions)

Public transport agency De Lijn (Belgium): bus schedule data (1 week, 7 versions)

Bike-sharing company BlueBike (Belgium): bike-sharing availability data (1 day, 1440 versions)

Bike-sharing company JCDecaux (EU): bike-sharing availability data (1 day, 1440 versions)

OpenStreetMap (World): geographical map data (1 day, 1440 versions)

Remarks

The first version of each dataset is always used as a baseline. All subsequent versions are applied as updates to the existing version. The reported results focus only on the updates, since these constitute the actual incremental generation.
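One way to illustrate the baseline-plus-updates setup is to diff consecutive snapshots of keyed records into additions, modifications, and deletions. These are the three change types used in the GTFS-Madrid-Benchmark variants. This is a generic Python sketch, not IncRML's actual change-detection mechanism. It assumes each record carries a unique key, and the sample bike-station data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    added: dict[str, dict] = field(default_factory=dict)
    modified: dict[str, dict] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)


def diff(previous: dict[str, dict], current: dict[str, dict]) -> ChangeSet:
    """Classify the records of `current` relative to `previous` by key."""
    changes = ChangeSet()
    for key, record in current.items():
        if key not in previous:
            changes.added[key] = record
        elif previous[key] != record:
            changes.modified[key] = record
    changes.deleted = set(previous) - set(current)
    return changes


def apply_changes(state: dict[str, dict], changes: ChangeSet) -> dict[str, dict]:
    """Apply a change set to a copy of the existing state."""
    updated = dict(state)
    updated.update(changes.added)
    updated.update(changes.modified)
    for key in changes.deleted:
        del updated[key]
    return updated


if __name__ == "__main__":
    # Hypothetical versions of a bike-sharing availability feed, keyed by station id.
    versions = [
        {"s1": {"bikes": 3}, "s2": {"bikes": 5}},
        {"s1": {"bikes": 2}, "s2": {"bikes": 5}, "s3": {"bikes": 7}},
        {"s1": {"bikes": 2}, "s3": {"bikes": 6}},
    ]
    state = versions[0]  # the first version is the baseline
    for nxt in versions[1:]:
        changes = diff(state, nxt)
        print(sorted(changes.added), sorted(changes.modified), sorted(changes.deleted))
        state = apply_changes(state, changes)
    assert state == versions[-1]
```

Only the change sets after the baseline are processed incrementally. Unchanged records, such as station s2 in the second version, produce no work.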

The GTFS-Change-50_percent-{ALL, CHANGE}.tar.xz datasets are not uploaded separately, because they share the same parameters (50% changes, scale 100) as the GTFS-Madrid-Benchmark scale 100 datasets. Please use GTFS-Scale-100-{ALL, CHANGE}.tar.xz in place of GTFS-Change-50_percent-{ALL, CHANGE}.tar.xz.

All datasets are provided as XZ-compressed TAR archives. Make sure you have enough disk space to decompress them: we advise 2 TB of free space to decompress all benchmarks and use cases. The expected output is provided as a ZIP file inside each TAR archive, and decompressing these requires even more space (4 TB).

Reproducing

Using our experiment tool, you can reproduce the experiments as follows:

Download one of the TAR.XZ archives and unpack it.

Clone the GitHub repository of our experiment tool and install the Python dependencies with 'pip install -r requirements.txt'.

Download the rmlmapper.jar JAR file from this Zenodo dataset and place it inside the experiment tool's root folder.

Execute the tool by running: './exectool --root=/path/to/the/root/of/the/tarxz/archive --runs=5 run'. The argument '--runs=5' performs the experiment 5 times.

Once the runs have finished, generate the statistics by running: './exectool --root=/path/to/the/root/of/the/tarxz/archive stats'.
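The reproduction steps can be scripted. The hedged Python sketch below unpacks a downloaded archive with the standard-library `tarfile` module. It then invokes the experiment tool with exactly the two `exectool` commands given in the steps. It assumes three things: the script runs from the experiment tool's root folder (where `exectool` and `rmlmapper.jar` live), the repository is already cloned, and its dependencies are installed. It also assumes the archive unpacks directly into the destination folder; if the archive contains a top-level folder, pass that folder as the root instead. The script name `reproduce.py` and its arguments are hypothetical.

```python
import subprocess
import sys
import tarfile
from pathlib import Path


def unpack(archive: Path, destination: Path) -> None:
    """Extract a .tar.xz archive; full benchmarks may need terabytes of space."""
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(path=destination)


def exectool_commands(root: Path, runs: int = 5) -> list[list[str]]:
    """Build the 'run' and 'stats' invocations of the experiment tool."""
    return [
        ["./exectool", f"--root={root}", f"--runs={runs}", "run"],
        ["./exectool", f"--root={root}", "stats"],
    ]


def main(archive: str, destination: str) -> None:
    root = Path(destination)
    unpack(Path(archive), root)
    for command in exectool_commands(root):
        subprocess.run(command, check=True)  # stop at the first failing step


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python reproduce.py <archive.tar.xz> <destination-dir>",
              file=sys.stderr)
```

Keeping command construction in a separate function makes the exact arguments easy to inspect before anything long-running is launched.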

Testcases

Test cases verifying the integration of RML and LDES with IncRML are available at https://doi.org/10.5281/zenodo.10171394
Global Healthcare Data Interoperability Market By Type (Solutions,...
zionmarketresearch.com
pdf
Updated May 18, 2025
Cite
Zion Market Research (2025). Global Healthcare Data Interoperability Market By Type (Solutions, Services), By Level (Foundational, Structural, and Semantic), By Deployment (Cloud-based, On-premise), By Application (Diagnosis, Treatment, Others), By Model (Centralized, Hybrid, Decentralized), By End-users (Ambulatory Surgical Centers, Hospitals): Global Industry Perspective, Comprehensive Analysis and Forecast, 2020 - 2026 [Dataset]. https://www.zionmarketresearch.com/report/healthcare-data-interoperability-market
Available download formats: pdf
Dataset updated
May 18, 2025
Dataset authored and provided by
Zion Market Research
License
https://www.zionmarketresearch.com/privacy-policy
Time period covered
2022 - 2030
Area covered
Global
Description
The global healthcare interoperability solutions market was valued at USD 2.5 billion in 2019 and is expected to reach USD 4.9 billion by 2026, at a CAGR of 11.2%.



