This dataset contains the entire concept structure of UMLS Metathesaurus for the semantic type "Disease or Syndrome". One of the primary purposes of this dataset is to connect different names for all the concepts for a specific Semantic Type. There are 125 semantic types in the Semantic Network. Every Metathesaurus concept is assigned at least one semantic type; very few terms are assigned as many as five semantic types.
Dataset Resource: https://www.tugraz.at/index.php?id=22387
Citation: If you use this dataset in your research, please cite the following URL:
http://dronedataset.icg.tugraz.at
License: The Drone Dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree:
That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Graz University of Technology) do not accept any responsibility for errors or omissions.
That you include a reference to the Semantic Drone Dataset in any work that makes use of the dataset. For research papers or other media, link to the Semantic Drone Dataset webpage.
That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works insofar as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow the dataset, or something similar in character, to be recovered.
That you may not use the dataset or any derivative work for commercial purposes such as, for example, licensing or selling the data, or using the data with the purpose of procuring a commercial gain.
That all rights not expressly granted to you are reserved by us (Graz University of Technology).
Dataset Overview: The Semantic Drone Dataset focuses on semantic understanding of urban scenes for increasing the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from a nadir (bird's-eye) view, acquired at an altitude of 5 to 30 meters above ground. A high-resolution camera was used to acquire images at a size of 6000x4000 px (24 Mpx). The training set contains 400 publicly available images and the test set is made up of 200 private images.
PERSON DETECTION: For the task of person detection, the dataset contains bounding box annotations of the training and test sets.
SEMANTIC SEGMENTATION: We prepared pixel-accurate annotations for the same training and test sets. The complexity of the dataset is limited to 20 classes, as listed in the following table.
Table 1: Semantic classes of the Drone Dataset
tree, grass, other vegetation, dirt, gravel, rocks, water, paved area, pool, person, dog, car, bicycle, roof, wall, fence, fence-pole, window, door, obstacle
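As a rough illustration of how the 20 classes in Table 1 might be used for training, the sketch below maps each class name to an integer id. The ids simply follow the order of the table; that ordering, and the helper name `encode`, are assumptions made for this example and not an official encoding shipped with the dataset.

```python
# Class names taken from Table 1. The integer ids are an ASSUMED convention
# (position in the table), not an encoding defined by the dataset authors.
CLASS_NAMES = [
    "tree", "grass", "other vegetation", "dirt", "gravel", "rocks", "water",
    "paved area", "pool", "person", "dog", "car", "bicycle", "roof", "wall",
    "fence", "fence-pole", "window", "door", "obstacle",
]

# Lookup table from class name to integer id.
CLASS_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}


def encode(names):
    """Convert a list of class names into their (assumed) integer ids."""
    return [CLASS_TO_ID[name] for name in names]


if __name__ == "__main__":
    print(len(CLASS_NAMES))            # number of classes
    print(encode(["person", "roof"]))  # ids under the assumed ordering
```

Any other id assignment works equally well, as long as the same mapping is used for both the annotations and the model output.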
https://www.marketreportanalytics.com/privacy-policy
The Intelligent Semantic Data Service (ISDS) market is experiencing robust growth, driven by the increasing need for businesses to derive actionable insights from complex and unstructured data. The market, estimated at $15 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 20% from 2025 to 2033, reaching an estimated $70 billion by 2033. This growth is fueled by several key factors. Firstly, the rise of big data and the limitations of traditional data processing techniques are pushing organizations toward sophisticated solutions like ISDS to unlock the true potential of their data assets. Secondly, advancements in artificial intelligence (AI), natural language processing (NLP), and machine learning (ML) are enhancing the capabilities of ISDS, enabling more accurate and insightful data analysis. Thirdly, cloud-based deployments of ISDS are gaining significant traction, offering scalability, cost-effectiveness, and accessibility to a wider range of users. The enterprise segment currently dominates the market, driven by the need for improved operational efficiency, better decision-making, and enhanced customer experience. However, the personal segment is expected to witness faster growth due to increasing consumer adoption of AI-powered applications and smart devices. The competitive landscape is highly dynamic, with major technology companies like Google, IBM, Microsoft, Amazon, and Salesforce vying for market share. OpenAI, Alibaba, and Tencent are also making significant strides in the development and deployment of advanced ISDS solutions. North America currently holds the largest market share, fueled by early adoption and high technology investment. However, Asia-Pacific is expected to demonstrate the fastest growth, driven by rapid digital transformation in regions like China and India. Despite the significant opportunities, certain restraints remain. 
These include the high initial investment costs associated with ISDS implementation, the need for skilled professionals to manage and interpret the generated insights, and concerns related to data privacy and security. The market is further segmented by deployment type (cloud-based and on-premises) and application (enterprise and personal), reflecting the diverse needs and preferences of different user segments. Addressing these challenges will be crucial for continued market expansion and broader adoption of ISDS.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
tFood is a dataset for tabular data to knowledge graph matching. It is derived for the Food domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, each of which represents a single entity. Ground truth data is provided from Wikidata as the target knowledge graph (KG).
The supported tasks for semantic table annotations are:
This dataset version will be used during SemTab 2023 - Round 1, so the ground truth data for the test set is currently hidden. We will add it after the conclusion of the challenge.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
tBiomed is a dataset for tabular data to knowledge graph matching. It is derived for the Biodiversity domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, each of which represents a single entity. Ground truth data is provided from Wikidata as the target knowledge graph (KG).
tBiomed is generated by KG2Tables using two levels of a recursive hierarchy of related concepts in Wikidata.
tBiomed contains 26,778 entity and horizontal tables, while this repository contains only a validation fold representing 20% of the entire benchmark, together with its ground truth (gt) data. The full size of this dataset is 1 GB.
We included the full version of the dataset. We will update this repository with the ground truth data of the test set in the future.
The supported tasks for semantic table annotations are:
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global semantic knowledge graphing market size is USD 1512.2 million in 2024 and will expand at a compound annual growth rate (CAGR) of 14.80% from 2024 to 2031.
North America held the largest share, around 40% of global revenue, with a market size of USD 604.88 million in 2024, and will grow at a compound annual growth rate (CAGR) of 13.0% from 2024 to 2031.
Europe accounted for a share of over 30% of the global market, with a market size of USD 453.66 million.
Asia Pacific held around 23% of global revenue, with a market size of USD 347.81 million in 2024, and will grow at a compound annual growth rate (CAGR) of 16.8% from 2024 to 2031.
Latin America held around 5% of global revenue, with a market size of USD 75.61 million in 2024, and will grow at a compound annual growth rate (CAGR) of 14.2% from 2024 to 2031.
Middle East and Africa held around 2% of global revenue, with a market size of USD 30.24 million in 2024, and will grow at a compound annual growth rate (CAGR) of 14.5% from 2024 to 2031.
Natural language processing knowledge graphing had the highest growth rate in the semantic knowledge graphing market in 2024.
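The regional figures above follow from multiplying the 2024 global market size by each region's approximate share, and a CAGR turns a base-year value into a later-year value by compounding. The sketch below reproduces that arithmetic; the function names are hypothetical, the shares are the rounded percentages quoted in the text, and the 2031 projection is only the arithmetic consequence of the stated CAGR, not a figure taken from the report.

```python
# Figures quoted in the text above (USD million, 2024) and the global CAGR.
TOTAL_2024 = 1512.2
GLOBAL_CAGR = 0.148

# Approximate regional shares as quoted ("around 40%", "over 30%", ...).
REGION_SHARE = {
    "North America": 0.40,
    "Europe": 0.30,
    "Asia Pacific": 0.23,
    "Latin America": 0.05,
    "Middle East and Africa": 0.02,
}


def regional_size(total, share):
    """Market size of one region: global size times the region's share."""
    return round(total * share, 2)


def project(value, rate, years):
    """Compound a base value at a constant annual growth rate."""
    return value * (1 + rate) ** years


sizes = {region: regional_size(TOTAL_2024, share)
         for region, share in REGION_SHARE.items()}
projected_2031 = project(TOTAL_2024, GLOBAL_CAGR, 2031 - 2024)

if __name__ == "__main__":
    for region, size in sizes.items():
        print(f"{region}: USD {size} million")
    print(f"Global size implied for 2031: USD {projected_2031:.1f} million")
```

With the rounded shares, the computed regional sizes match the values listed above to two decimal places.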
Market Dynamics of Semantic Knowledge Graphing Market
Key Drivers of Semantic Knowledge Graphing Market
Growing Volumes of Structured, Semi-structured, and Unstructured Data to Increase the Global Demand
The global demand for semantic knowledge graphing is escalating in response to the exponential growth of structured, semi-structured, and unstructured data. Enterprises are inundated with vast amounts of data from diverse sources such as social media, IoT devices, and enterprise applications. Structured data from databases, semi-structured data like XML and JSON, and unstructured data from documents, emails, and multimedia files present significant challenges in terms of organization, analysis, and deriving actionable insights. Semantic knowledge graphing addresses these challenges by providing a unified framework for representing, integrating, and analyzing disparate data types. By leveraging semantic technologies, businesses can unlock the value hidden within their data, enabling advanced analytics, natural language processing, and knowledge discovery. As organizations increasingly recognize the importance of harnessing data for strategic decision-making, the demand for semantic knowledge graphing solutions continues to surge globally.
Demand for Contextual Insights to Propel the Growth
The burgeoning demand for contextual insights is propelling the growth of semantic knowledge graphing solutions. In today's data-driven landscape, businesses are striving to extract deeper contextual meaning from their vast datasets to gain a competitive edge. Semantic knowledge graphing enables organizations to connect disparate data points, understand relationships, and derive valuable insights within the appropriate context. This contextual understanding is crucial for various applications such as personalized recommendations, predictive analytics, and targeted marketing campaigns. By leveraging semantic technologies, companies can not only enhance decision-making processes but also improve customer experiences and operational efficiency. As industries across sectors increasingly recognize the importance of contextual insights in driving innovation and business success, the adoption of semantic knowledge graphing solutions is poised to witness significant growth. This trend underscores the pivotal role of semantic technologies in unlocking the true potential of data for strategic advantage in today's dynamic marketplace.
Restraint Factors of Semantic Knowledge Graphing Market
Stringent Data Privacy Regulations to Hinder the Market Growth
Stringent data privacy regulations present a significant hurdle to the growth of the Semantic Knowledge Graphing market. Regulations such as GDPR (General Data Protection Regulation) in Europe and CCPA (California Consumer Privacy Act) in the United States impose strict requirements on how organizations collect, store, process, and share personal data. Compliance with these regulations necessitates robust data protection measures, including anonymization, encryption, and access controls, which can complicate the implementation of semantic knowledge graphing systems. Moreover, concerns about data breach...
Automatic, accurate crop type maps can provide unprecedented information for understanding food systems, especially in developing countries where ground surveys are infrequent. However, little work has applied existing methods to these data-scarce environments, which also pose unique challenges: irregularly shaped fields, frequent cloud coverage, small plots, and a severe lack of training data. To address this gap in the literature, we provide the first crop type semantic segmentation dataset of smallholder farms, specifically in Ghana and South Sudan. We are also the first to utilize high-resolution, high-frequency satellite data in segmenting smallholder farms.
The dataset includes time series of satellite imagery from Sentinel-1, Sentinel-2, and PlanetScope satellites throughout 2016 and 2017. For each tile/chip in the dataset, there are time series of imagery from each of the satellites, as well as a corresponding label that defines the crop type at each pixel. The label has only one value at each pixel location, and assumes that the crop type remains the same across the full time span of the satellite image time series. In many cases where ground truth was not available, pixels have no label and are set to a value of 0.
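Because unlabeled pixels are stored as 0, any per-pixel metric or loss should skip them. The following is a minimal sketch of that masking, assuming labels and predictions are available as 2-D grids of integer crop-type ids; the function name `masked_accuracy` and the toy ids 1, 2, and 3 are hypothetical and not part of the dataset.

```python
UNLABELED = 0  # per the description above, pixels without ground truth are 0


def masked_accuracy(pred, label):
    """Pixel accuracy computed only over pixels that carry a crop label.

    `pred` and `label` are 2-D lists (rows of integer crop-type ids) of the
    same shape. Pixels whose label equals UNLABELED are ignored.
    """
    correct = 0
    total = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            if y == UNLABELED:
                continue  # no ground truth here, so it cannot be scored
            total += 1
            correct += int(p == y)
    return correct / total if total else float("nan")


if __name__ == "__main__":
    # Toy 2x3 tile with made-up crop ids; two pixels are unlabeled.
    label = [[0, 1, 1],
             [2, 0, 2]]
    pred = [[3, 1, 2],
            [2, 1, 2]]
    print(masked_accuracy(pred, label))
```

The same mask would apply when training: predictions at unlabeled pixels contribute nothing to the loss, whatever the model outputs there.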
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
tBiodiv is a dataset for tabular data to knowledge graph matching. It is derived for the Biodiversity domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, each of which represents a single entity. Ground truth data is provided from Wikidata as the target knowledge graph (KG).
tBiodiv is generated by KG2Tables using two levels of a recursive hierarchy of related concepts in Wikidata.
We updated this repository with the full version of the dataset; we will update it again with the test ground truth (gt) data in the future.
The supported tasks for semantic table annotations are:
Topic Detection (TD) links the entire table to an entity or a class from the target KG.
Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.
Column Type Annotation (CTA) links individual table columns to classes from the target KG.
Column Property Annotation (CPA) detects the relations between column pairs from the target knowledge graph.
Row Annotation (RA) annotates the entire row to a KG entity or property.
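To make the five tasks concrete, the sketch below annotates a toy two-column table against Wikidata. The table, the dictionary layout, and the helper `describe_cell` are illustrative assumptions and do not reflect the file format used by the benchmark; the identifiers are ordinary Wikidata ids (Q142 France, Q90 Paris, Q183 Germany, Q64 Berlin, Q6256 country, Q515 city, P36 capital).

```python
# A toy horizontal table: one row per country. Layout is hypothetical.
TABLE = {
    "header": ["Country", "Capital"],
    "rows": [["France", "Paris"], ["Germany", "Berlin"]],
}

# Topic Detection (TD): the whole table -> one KG class or entity.
TD = "Q6256"  # country

# Cell Entity Annotation (CEA): (row, column) -> KG entity.
CEA = {(0, 0): "Q142", (0, 1): "Q90", (1, 0): "Q183", (1, 1): "Q64"}

# Column Type Annotation (CTA): column -> KG class.
CTA = {0: "Q6256", 1: "Q515"}  # country, city

# Column Property Annotation (CPA): (subject column, object column) -> property.
CPA = {(0, 1): "P36"}  # capital

# Row Annotation (RA): row -> KG entity described by that row.
RA = {0: "Q142", 1: "Q183"}


def describe_cell(row, col):
    """Collect the cell text together with its CEA and CTA annotations."""
    return {
        "text": TABLE["rows"][row][col],
        "entity": CEA.get((row, col)),
        "column_type": CTA.get(col),
    }


if __name__ == "__main__":
    print(describe_cell(0, 1))
    print("relation between columns 0 and 1:", CPA[(0, 1)])
```

In this layout, CEA and RA operate on instances, CTA and TD on classes, and CPA on properties, which mirrors the distinction made in the task definitions above.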
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This case study of the four Natural Perfectives of the Russian simplex verb путать ‘tangle’ sheds light on the following questions: Is it possible to predict the choice of prefix when there is prefix variation in Russian? And if yes, how? Since these questions are particularly relevant for second-language learners, the author also discusses how the present study, and similar ones, can be used to make second-language learning of Russian more effective. The analysis is based on a database of 630 sentences from the Russian National Corpus (RNC) and takes two factors into consideration: type of construction and semantic category of the internal argument. The uploaded data contain 3 files:
"Database, everything": Each sentence is tagged according to prefix, form of the verb (Active vs Passive), type of construction, and semantic category of the internal argument. The four types of constructions and four types of semantic categories are explained with examples from the database inside the article.
"Database_simplified": This version of the database contains three parameters for each sentence: prefix, type of construction, and semantic category of the internal argument. The simplified database was created for statistical analyses in R.
"R_putat": The R script that was used to produce the cTree presented in the article.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.
The data is organized in a table structure. Code4ML includes several main objects: competition information, raw code blocks collected from Kaggle, and manually marked-up snippets. Each table is in .csv format.
Each competition has a text description and metadata reflecting the characteristics of the competition and the dataset used, as well as the evaluation metrics (competitions.csv). The corresponding datasets can be loaded using the Kaggle API and data sources.
The code blocks and their metadata are collected into data frames according to the publishing year of the original kernels. The current version of the corpus includes two code block files: snippets from kernels up to 2020 (сode_blocks_upto_20.csv) and those from 2021 (сode_blocks_21.csv), with corresponding metadata. The corpus consists of 2 743 615 ML code blocks collected from 107 524 Jupyter notebooks.
Marked up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for the code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12 000 labeled snippets (markup_data_20220415.csv).
Because the marked-up code block data contains only the numeric id of each code block's semantic type, we also provide a mapping from this number to the semantic type and subclass (actual_graph_2022-06-01.csv).
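Resolving the numeric semantic type id amounts to a lookup join between the markup table and the mapping table. The sketch below shows the idea with the standard `csv` module on small in-memory samples. All column names (`code_block_id`, `graph_vertex_id`, `marks`, `id`, `graph_vertex`, `graph_vertex_subclass`) and all sample values are assumptions made for illustration; check the headers of the actual .csv files before adapting it.

```python
import csv
import io

# Tiny stand-ins for the markup file and the id -> semantic type mapping file.
# Column names and values are HYPOTHETICAL, not copied from the corpus.
MARKUP_CSV = """code_block_id,graph_vertex_id,marks
101,7,5
102,3,4
103,99,2
"""

GRAPH_CSV = """id,graph_vertex,graph_vertex_subclass
3,Visualization,distribution
7,Data_Transform,normalization
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_semantic_types(markup_rows, graph_rows):
    """Add semantic type and subclass names to each marked-up snippet."""
    by_id = {row["id"]: row for row in graph_rows}
    labeled = []
    for row in markup_rows:
        vertex = by_id.get(row["graph_vertex_id"])  # None if the id is unknown
        labeled.append({
            **row,
            "semantic_type": vertex["graph_vertex"] if vertex else None,
            "subclass": vertex["graph_vertex_subclass"] if vertex else None,
        })
    return labeled


labeled_rows = attach_semantic_types(load_rows(MARKUP_CSV), load_rows(GRAPH_CSV))

if __name__ == "__main__":
    for row in labeled_rows:
        print(row["code_block_id"], row["semantic_type"], row["subclass"])
```

Unknown ids are kept with `None` fields, so snippets are never silently dropped during the join.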
The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The epoch data are stored as a FieldTrip-format structure. All epochs (names ending with 'WrdOn') are aligned to fixation onset on a given word (time 0) and are one second long ([-0.5 0.5] s). Epochs are named by combining the acquisition date, subject code, and data type. Epochs ending with 'BL_Cross' cover the baseline period before the presentation of the sentences. The field 'trialinfo' holds information about each trial; its column headers are as follows:
1- sentence_id: sentence number for this epoch
2- word_loc: the location of the current word in a sentence
3- loc2targ: location distance between the current word and the target word; loc2targ for pre-target, target, and post-target words is -1, 0, and 1
4- saccade2this_duration: saccade duration toward this word
5- fixation_on_MEG: MEG trigger for fixation onset on this word
6- fixation_duration
7- NextOrder: next word location minus the current word location; a negative value indicates a saccade backward to previous words
8- FirstPassFix: whether this fixation is the first for this word or not
9- PreviousOrder: previous word location minus the current word location; a negative value indicates a saccade forward to the next words
10- SentenceCondition: whether the current word is in a sentence with an incongruent or congruent target word; 11 -- incongruent, 2 -- congruent
11- PupilSize: averaged pupil size during this fixation
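As an illustration of how the 'trialinfo' columns can be used, the sketch below selects first-pass fixations on the target word and averages their duration. It assumes that 'trialinfo' has been exported from the FieldTrip structure to plain rows in the column order given above, and that FirstPassFix is coded as 1 for a first fixation; both are assumptions, and all values in the sample rows are invented.

```python
# Zero-based column indices matching the 11 'trialinfo' columns listed above.
SENTENCE_ID, WORD_LOC, LOC2TARG, SACCADE_DUR, FIX_ON_MEG, FIX_DUR, \
    NEXT_ORDER, FIRST_PASS, PREV_ORDER, SENT_COND, PUPIL_SIZE = range(11)

# Invented example rows; durations are in whatever unit the data use.
trialinfo = [
    [1, 3, -1, 30, 64, 210, 1, 1, -1, 2, 510.0],  # pre-target word
    [1, 4,  0, 30, 64, 250, 1, 1, -1, 2, 512.0],  # target, first fixation
    [1, 4,  0, 20, 64, 180, 1, 0,  0, 2, 511.0],  # target, refixation
    [2, 5,  0, 30, 64, 350, 1, 1, -1, 2, 498.0],  # target, first fixation
]


def target_first_pass(rows):
    """Keep rows for the target word (loc2targ == 0) on its first fixation.

    ASSUMPTION: FirstPassFix == 1 marks a first fixation.
    """
    return [r for r in rows if r[LOC2TARG] == 0 and r[FIRST_PASS] == 1]


def mean_fixation_duration(rows):
    """Average the fixation_duration column over the given rows."""
    durations = [r[FIX_DUR] for r in rows]
    return sum(durations) / len(durations)


selected = target_first_pass(trialinfo)
mean_duration = mean_fixation_duration(selected)

if __name__ == "__main__":
    print(len(selected), mean_duration)
```

The same row indices can then be used to pick the matching epochs, since each 'trialinfo' row describes one epoch.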
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mapping from the data types to the oligonucleotides.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. Objectives of our study were to analyze the semantic structure of input definitions, use this information to build triple structures of RDF graph data, upload our lexicon to a knowledge graph software, and perform SPARQL queries on the data. Upon completion of this study, SPARQL queries were proven to effectively convey graph triples which displayed semantic significance. These data represent and characterize the lexicon of our input text which are used to form graph triples. These data were collected in 2024 by passing text through multiple Python programs utilizing spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before data was processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-spee ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the datasets used in the Entity Type Prediction task for Knowledge Graph Completion.
[1] Biswas R, Sofronova R, Sack H, Alam M. Cat2type: Wikipedia category embeddings for entity typing in knowledge graphs. In Proceedings of the 11th Knowledge Capture Conference 2021 Dec 2 (pp. 81-88).
[2] Biswas R, Portisch J, Paulheim H, Sack H, Alam M. Entity type prediction leveraging graph walks and entity descriptions. In The Semantic Web–ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23–27, 2022, Proceedings 2022 Oct 16 (pp. 392-410). Cham: Springer International Publishing.
[3] Biswas R, Chen Y, Paulheim H, Sack H, Alam M. It’s All in the Name: Entity Typing Using Multilingual Language Models. In The Semantic Web: ESWC 2022 Satellite Events: Hersonissos, Crete, Greece, May 29–June 2, 2022, Proceedings 2022 Jul 20 (pp. 36-41). Cham: Springer International Publishing.
[4] Biswas R, Sofronova R, Alam M, Heist N, Paulheim H, Sack H. Do judge an entity by its name! entity typing using language models. In The Semantic Web: ESWC 2021 Satellite Events: Virtual Event, June 6–10, 2021, Revised Selected Papers 18 2021 (pp. 65-70). Springer International Publishing.
https://www.marketreportanalytics.com/privacy-policy
The Knowledge Graph Technology market is experiencing robust growth, driven by the increasing need for enhanced data interoperability, improved data analysis capabilities, and the rising adoption of artificial intelligence (AI) and machine learning (ML) across various industries. The market's expansion is fueled by the advantages of knowledge graphs in improving decision-making processes, streamlining operations, and fostering innovation. Specific applications, such as semantic search, personalized recommendations, and fraud detection, are witnessing significant traction. While precise market size figures are unavailable, a conservative estimate places the 2025 market value at $5 billion, with a Compound Annual Growth Rate (CAGR) of 25% projected through 2033. This growth trajectory is supported by the escalating demand for efficient data management solutions in sectors like healthcare, finance, and retail, where knowledge graphs can significantly enhance operational efficiency and strategic decision-making. Technological advancements, particularly in graph database technologies and semantic web technologies, further bolster market expansion. However, the market faces challenges such as the complexity of knowledge graph implementation, the need for specialized expertise, and data integration issues across disparate sources. Despite these challenges, the long-term outlook for knowledge graph technology remains positive, driven by continuous technological innovations and the growing recognition of its transformative potential across diverse sectors. The segmentation of the Knowledge Graph Technology market reveals significant opportunities within various application areas and technology types. Application-wise, semantic search and recommendation engines are currently leading the market, while emerging applications in areas such as risk management and supply chain optimization are poised for rapid growth in the coming years. 
In terms of technology types, ontology engineering and graph databases are experiencing high demand. Regionally, North America and Europe currently dominate the market due to early adoption and established technological infrastructure. However, the Asia-Pacific region is projected to witness significant growth, spurred by increasing digitalization and investments in AI and ML initiatives. Competitive landscape analysis reveals a mix of established technology providers and emerging startups, creating a dynamic and competitive ecosystem. The continuous evolution of technologies and the expansion into new applications will continue to shape the market's growth and trajectory over the forecast period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of the best-performing functional similarity measures, term specificity, and semantic similarity approaches for different biological data, including Enzyme Commission (EC), Pfam domain, Sequence Similarity (Seq. Sim.), Protein-Protein Interaction (PPI), and Co-expression Network (CN) or Gene Expression (microarray) data. Summary of the best-performing measures for different applications.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures, but also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or their biological properties. Thus, a holistic and automatic NP classification framework could have considerable value to comprehensively navigate the relatedness of NPs, and especially so when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and analysis code to reproduce the results reported in "Orthographic-Semantic Consistency Effects in Lexical Decision: What types of Neighbors are Responsible for the Effects?" authored by Yasushi Hino, Debra Jared and Steve Lupker. In Data Analyses 1, 2 and 3, lexical decision latency and accuracy data from the English Lexicon Project (Balota, Yap, Cortese, Hutchison, Kessler, Loftus, Neely, Nelson, Simpson & Treiman, 2007) are analyzed to examine whether an orthographic-semantic consistency effect is observed in lexical decision data. In the Experiment, on the other hand, behavioral data were collected in an online lexical decision experiment to examine whether the orthographic-semantic consistency effect is observed when the consistency is manipulated based on either addition neighbors or substitution neighbors. In addition, we provide analysis code to reproduce the results for Table 16 in the paper as well as the results of the Tables and Figures reported in the Supplementary Materials.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo dataset contains all the resources of the paper 'IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources' submitted to the Semantic Web Journal's Special Issue on Knowledge Graph Construction. This resource aims to make the paper experiments fully reproducible through our experiment tool written in Python which was already used before in the Knowledge Graph Construction Challenge by the ESWC 2023 Workshop on Knowledge Graph Construction. The exact Java JAR file of the RMLMapper (rmlmapper.jar) is also provided in this dataset which was used to execute the experiments. This JAR file was executed with Java OpenJDK 11.0.20.1 on Ubuntu 22.04.1 LTS (Linux 5.15.0-53-generic). Each experiment was executed 5 times and the median values are reported together with the standard deviation of the measurements.
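The reporting scheme described above (5 runs per experiment, median plus standard deviation) can be sketched with Python's standard `statistics` module. The measurements below are invented, the function name `summarize` is hypothetical, and the use of the sample standard deviation is an assumption, since the description does not say which estimator was used.

```python
import statistics

# Invented execution times (seconds) for 5 repetitions of one experiment.
run_times_s = [12.4, 11.9, 12.1, 13.0, 12.2]


def summarize(measurements):
    """Return (median, sample standard deviation) of repeated measurements.

    ASSUMPTION: sample standard deviation (n - 1 denominator) is used.
    """
    return statistics.median(measurements), statistics.stdev(measurements)


median_s, stdev_s = summarize(run_times_s)

if __name__ == "__main__":
    print(f"median = {median_s} s, stdev = {stdev_s:.3f} s")
```

Reporting the median makes the summary robust to a single slow run, while the standard deviation still exposes how much the runs varied.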
We provide both dataset dumps of the GTFS-Madrid-Benchmark and of real-life use cases from Open Data in Belgium.
GTFS-Madrid-Benchmark dumps are used to analyze the impact on execution time and resources, while the real-life use cases aim to verify the approach on different types of datasets since the GTFS-Madrid-Benchmark is a single type of dataset which does not advertise changes at all.
By using our experiment tool, you can easily reproduce the experiments as follows:
Testcases to verify the integration of RML and LDES with IncRML, see https://doi.org/10.5281/zenodo.10171394
https://www.zionmarketresearch.com/privacy-policy
The global healthcare interoperability solutions market was valued at USD 2.5 billion in 2019, and is expected to reach USD 4.9 billion by 2026, at a CAGR of 11.2%.