100+ datasets found
  1. Pairwise Multi-Class Document Classification for Semantic Relations between...

    • data.niaid.nih.gov
    Updated Aug 1, 2020
    + more versions
    Cite
    Terry Ruas (2020). Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles (Dataset, Models & Code) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3713182
    Explore at:
    Dataset updated
    Aug 1, 2020
    Dataset provided by
    Moritz Schubotz
    Bela Gipp
    Malte Ostendorff
    Georg Rehm
    Terry Ruas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to identify the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT as the best performing system with an F1-score of 0.93, which we manually examine to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task and motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries such that one could find documents that are similar in one aspect but dissimilar in another.

    Additional information can be found on GitHub.

    The following data is supplemental to the experiments described in our research paper. The data consists of:

    Datasets (articles, class labels, cross-validation splits)

    Pretrained models (Transformers, GloVe, Doc2vec)

    Model output (prediction) for the best performing models

    Dataset

    The Wikipedia article corpus is available in enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2. The original data were downloaded as an XML dump, and the corresponding articles were extracted as plain text with gensim.scripts.segment_wiki. The archive contains only articles that are available in the training or test data.

    The actual dataset, as used in stratified k-fold cross-validation with k=4, is provided in train_testdata_4folds.tar.gz.

    ├── 1
    │   ├── test.csv
    │   └── train.csv
    ├── 2
    │   ├── test.csv
    │   └── train.csv
    ├── 3
    │   ├── test.csv
    │   └── train.csv
    └── 4
        ├── test.csv
        └── train.csv

    4 directories, 8 files
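    The corpus and fold files above can be read with the Python standard library alone. A minimal sketch, assuming only the jsonl.bz2 line-per-article format and the <fold>/{train,test}.csv layout shown (the CSV column schema is not documented here, so rows are returned as raw lists; the function names are illustrative, not from the dataset's code):

```python
import bz2
import csv
import json

def read_jsonl_bz2(path):
    """Yield one article dict per line from a bz2-compressed JSON-lines
    dump such as enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def load_fold(root, fold):
    """Return raw train/test rows for one fold, given the
    <root>/<fold>/{train,test}.csv layout shown above."""
    splits = {}
    for split in ("train", "test"):
        with open(f"{root}/{fold}/{split}.csv", newline="", encoding="utf-8") as f:
            splits[split] = list(csv.reader(f))
    return splits
```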

    Pretrained models

    PyTorch: vanilla and Siamese BERT + XLNet

    A pretrained model for each fold is available in the corresponding model archives:

    Vanilla

    model_wiki.bert_base_joint_seq512.tar.gz
    model_wiki.xlnet_base_joint_seq512.tar.gz

    Siamese

    model_wiki.bert_base_siamese_seq512_4d.tar.gz
    model_wiki.xlnet_base_siamese_seq512_4d.tar.gz

  2. Fyp Semantic Dataset

    • universe.roboflow.com
    zip
    Updated Mar 4, 2025
    Cite
    fyp (2025). Fyp Semantic Dataset [Dataset]. https://universe.roboflow.com/fyp-efein/fyp-semantic/model/7
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    fyp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Greenery VFcH Masks
    Description

    Fyp Semantic

    ## Overview
    
    Fyp Semantic is a dataset for semantic segmentation tasks - it contains Greenery VFcH annotations for 1,000 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Dataset Corrosion Seg Semantic Dataset

    • universe.roboflow.com
    zip
    Updated Oct 8, 2024
    Cite
    computervision (2024). Dataset Corrosion Seg Semantic Dataset [Dataset]. https://universe.roboflow.com/computervision-laxn2/dataset-corrosion-seg-semantic/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    computervision
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Corrosion Masks
    Description

    Dataset Corrosion Seg Semantic

    ## Overview
    
    Dataset Corrosion Seg Semantic is a dataset for semantic segmentation tasks - it contains Corrosion annotations for 978 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. sick

    • huggingface.co
    Updated Sep 1, 2023
    Cite
    Roberto Zamparelli (2023). sick [Dataset]. https://huggingface.co/datasets/RobZamp/sick
    Explore at:
    Dataset updated
    Sep 1, 2023
    Authors
    Roberto Zamparelli
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Shared and internationally recognized benchmarks are fundamental for the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored for them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic, and semantic phenomena that CDSMs are expected to account for, but do not require dealing with other aspects of existing sentential data sets (idiomatic multiword expressions, named entities, telegraphic language) that are not within the scope of CDSMs. By means of crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as gold score) and entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK data set was used in SemEval-2014 Task 1, and it is freely available for research purposes.

  5. Biotea dataset (vr. July 2012)

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    Leyla Jael Garcia Castro; Olga Giraldo; Casey McLaughlin; Alexander Garcia; Leyla Jael Garcia Castro; Olga Giraldo; Casey McLaughlin; Alexander Garcia (2020). Biotea dataset (vr. July 2012) [Dataset]. http://doi.org/10.5281/zenodo.376814
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo
    Authors
    Leyla Jael Garcia Castro; Olga Giraldo; Casey McLaughlin; Alexander Garcia; Leyla Jael Garcia Castro; Olga Giraldo; Casey McLaughlin; Alexander Garcia
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Background

    Information reported by scientific literature still remains locked up in discrete documents that are not always interconnected or machine-readable. The Semantic Web together with approaches such as the Resource Description Framework (RDF) and the Linked Open Data (LOD) initiative offer a connectivity tissue that can be used to support the generation of self-describing, machine-readable documents.

    Results

    Biotea is an approach to generate RDF from scholarly documents. Our RDF model makes extensive use of existing ontologies and semantic enrichment services. Our dataset comprises 270,834 articles from PubMed Central in RDF/XML, distributed in 404 zipped files. The RDFization process takes care of metadata, e.g., title, authors, and journal, as well as semantic annotations on biological entities along the full text. Biological entities are extracted by using the NCBO Annotator and Whatizit.

    We use the Bibliographic Ontology (BIBO), Dublin Core Metadata Initiative Terms (DCMI-terms), and the Provenance Ontology (PROV-O) to model the bibliographic metadata. Links to related pages such as PubMed HTML articles are provided via rdfs:seeAlso while links to other semantic representation such as Bio2RDF PubMed articles are provided via owl:sameAs.

    The NCBO Annotator is used to extract entities covering ChEBI for chemicals; Pathway, and Functional Genomics Data Society (MGED) for genes and proteins; Master Drug Data Base (MDDB), NDDF, and NDFRT for drugs; SNOMED, SYMP, MedDRA, MeSH, MedlinePlus Health Topics (MedlinePlus), Online Mendelian Inheritance in Man (OMIM), FMA, ICD10, and Ontology for Biomedical Investigations (OBI) for diseases and medical terms; PO for plants; and MeSH, SNOMED, and NCIt for general terms.

    Whatizit is used for GO, UniProt proteins, UniProt Taxonomy, and diseases mapped to the UMLS; UniProt taxa are also mapped to NCBI Taxon vocabulary.

    Conclusions

    Biotea delivers models and tools for metadata enrichment and semantic processing of biomedical documents. Our dataset makes it easier to access the first batch of RDFized articles following the Biotea model. We plan to update our dataset on a regular basis in order to incorporate the latest articles added to the PubMed Central collection; the next delivery is planned for the first half of 2017. Subsequent datasets will support a mapping to the Semanticscience Integrated Ontology (SIO) in order to comply with the guidelines set by Bio2RDF.

    Notes

    Biotea approach in full is available at http://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-S1-S5 (Garcia Castro, L.J., C. McLaughlin, and A. Garcia, Biotea: RDFizing PubMed Central in Support for the Paper as an Interface to the Web of Data. Biomedical semantics, 2013. 4 Suppl 1: p. S5).

    Biotea algorithms are publicly available at https://github.com/biotea

  6. Data from: Augmentation of Semantic Processes for Deep Learning Applications...

    • tandf.figshare.com
    txt
    Updated Jun 2, 2025
    Cite
    Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann (2025). Augmentation of Semantic Processes for Deep Learning Applications [Dataset]. http://doi.org/10.6084/m9.figshare.29212617.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Maximilian Hoffmann; Lukas Malburg; Ralph Bergmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The popularity of Deep Learning (DL) methods used in business process management research and practice is constantly increasing. One important factor that hinders the adoption of DL in certain areas is the availability of sufficiently large training datasets, particularly affecting domains where process models are mainly defined manually with a high knowledge-acquisition effort. In this paper, we examine process model augmentation in combination with semi-supervised transfer learning to enlarge existing datasets and train DL models effectively. The use case of similarity learning between manufacturing process models is discussed. Based on a literature study of existing augmentation techniques, a concept is presented with different categories of augmentation, from knowledge-light approaches to knowledge-intensive ones, e.g., based on automated planning. Specifically, the impacts of augmentation approaches on the syntactic and semantic correctness of the augmented process models are considered. The concept also proposes a semi-supervised transfer learning approach to integrate augmented and non-augmented process model datasets in a two-phased training procedure. The experimental evaluation investigates augmented process model datasets regarding their quality for model training in the context of similarity learning between manufacturing process models. The results indicate large potential, with a reduction of the prediction error by up to 53%.

  7. Urban Semantic Dataset

    • universe.roboflow.com
    zip
    Updated Jan 17, 2025
    Cite
    LabIA (2025). Urban Semantic Dataset [Dataset]. https://universe.roboflow.com/labia-z0pkg/urban-semantic/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 17, 2025
    Dataset authored and provided by
    LabIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Signs Streetlamp Bounding Boxes
    Description

    Urban Semantic

    ## Overview
    
    Urban Semantic is a dataset for object detection tasks - it contains Traffic Signs Streetlamp annotations for 7,107 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. A dataset of grape multimodal object detection and semantic segmentation

    • scidb.cn
    Updated Aug 28, 2023
    Cite
    Wenjun Chen; Yuan Rao; Fengyi Wang; Yu Zhang; Yumeng Yang; Qing Luo; Tong Zhang; Tianyu Wan; Xinyu Liu; Mengyu Zhang; Rui Zhang (2023). A dataset of grape multimodal object detection and semantic segmentation [Dataset]. http://doi.org/10.57760/sciencedb.j00001.00883
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Wenjun Chen; Yuan Rao; Fengyi Wang; Yu Zhang; Yumeng Yang; Qing Luo; Tong Zhang; Tianyu Wan; Xinyu Liu; Mengyu Zhang; Rui Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The accuracy of grape picking point localization is dependent on grape detection and semantic segmentation network performance. However, in practical application scenarios, the accuracy and segmentation precision of grape targets based on visible light images are susceptible to light variations and complex environments, often performing poorly. Moreover, grapes grow in bunches, and the existing multimodal datasets for apples and pears can hardly meet the recognition needs of bunch-shaped grapes. The construction of visible, depth, and near-infrared multimodal object detection and semantic segmentation datasets of grapes is crucial to exploring better recognition rates and stronger generalization capabilities for grape detection and semantic segmentation models. This dataset, totaling about 39.08 GB, contains high-quality multimodal video stream data of green and purple grapes, including six varieties, under different illumination and obscuration conditions. Additionally, the dataset offers 3954 labeled image samples extracted from the aforementioned multimodal video. By means of rotation, deflation, mis-slicing, panning, and Gaussian blur, the dataset can be augmented for the training implementation of mainstream deep learning models. The dataset can provide valuable basic data resources for multimodal fusion, grape semantic segmentation, and object detection, which have important practical application value for promoting research in the field of agricultural machinery and equipment intelligence.

  9. code_x_glue_cc_clone_detection_poj104

    • huggingface.co
    • opendatalab.com
    Cite
    Google, code_x_glue_cc_clone_detection_poj104 [Dataset]. https://huggingface.co/datasets/google/code_x_glue_cc_clone_detection_poj104
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by
    Google (http://google.com/)
    License

    C-UDA License: https://choosealicense.com/licenses/c-uda/

    Description

    Dataset Card for "code_x_glue_cc_clone_detection_poj_104"

      Dataset Summary
    

    CodeXGLUE Clone-detection-POJ-104 dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-POJ-104. Given a code snippet and a collection of candidates as input, the task is to return the top-K codes with the same semantics. Models are evaluated by MAP score. We use the POJ-104 dataset for this task.
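    MAP here is the standard mean average precision over each query's ranked candidates. A minimal reference computation (a sketch of the metric itself, not the official CodeXGLUE evaluator script):

```python
def mean_average_precision(ranked_relevance):
    """ranked_relevance: one list per query of 0/1 relevance flags,
    ordered by the model's ranking. Returns MAP across all queries."""
    average_precisions = []
    for flags in ranked_relevance:
        hits = 0
        precisions = []
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                hits += 1
                precisions.append(hits / rank)  # precision at each hit
        average_precisions.append(sum(precisions) / max(hits, 1))
    return sum(average_precisions) / len(average_precisions)
```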

      Supported Tasks and Leaderboards
    

    document-retrieval: The… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_clone_detection_poj104.

  10. Aerial Semantic Drone Dataset

    • kaggle.com
    Updated May 25, 2021
    + more versions
    Cite
    Lalu Erfandi Maula Yusnu (2021). Aerial Semantic Drone Dataset [Dataset]. https://www.kaggle.com/nunenuh/semantic-drone/discussion
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 25, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lalu Erfandi Maula Yusnu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Aerial Semantic Drone Dataset

    The Semantic Drone Dataset focuses on semantic understanding of urban scenes for increasing the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from nadir (bird's eye) view acquired at an altitude of 5 to 30 meters above the ground. A high-resolution camera was used to acquire images at a size of 6000x4000px (24Mpx). The training set contains 400 publicly available images and the test set is made up of 200 private images.

    This dataset is taken from https://www.kaggle.com/awsaf49/semantic-drone-dataset. We removed and added files and information as needed for our research purposes. We created tiff files with a resolution of 1200x800 pixels and 24 channels, each channel representing a class preprocessed from the png label files. We reduced the resolution and compressed the tif files with the tifffile Python library.
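    A one-hot channel stack like this collapses back to a single class-index map with an argmax over the channel axis. A sketch assuming the channel-first (24, H, W) layout described above; loading the file itself would use tifffile's imread, shown only as a comment to keep the snippet dependency-light:

```python
import numpy as np

def onehot_to_index(mask):
    """Collapse a (24, H, W) one-hot label stack, one channel per class,
    into an (H, W) map of class indices."""
    # mask = tifffile.imread("labels/tiff/example.tif")  # how the stack would be loaded
    return np.argmax(mask, axis=0)
```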

    If you have any problem with tif dataset that we have been modified you can contact nunenuh@gmail.com and gaungalif@gmail.com.

    This dataset is a copy of the original dataset (link below); we provide some improvements to the semantic data and classes. The semantic data is available in png and tiff format at a smaller size as needed.

    Semantic Annotation

    The images are labelled densely using polygons and contain the following 24 classes:

    unlabeled, paved-area, dirt, grass, gravel, water, rocks, pool, vegetation, roof, wall, window, door, fence, fence-pole, person, dog, car, bicycle, tree, bald-tree, ar-marker, obstacle, conflicting

    Directory Structure and Files

    > images
    > labels/png
    > labels/tiff
     - class_to_idx.json
     - classes.csv
     - classes.json
     - idx_to_class.json
    

    Included Data

    • 400 training images in jpg format can be found in "aerial_semantic_drone/images"
    • Dense semantic annotations in png format can be found in "aerial_semantic_drone/labels/png"
    • Dense semantic annotations in tiff format can be found in "aerial_semantic_drone/labels/tiff"
    • Semantic class definition in csv format can be found in "aerial_semantic_drone/classes.csv"
    • Semantic class definition in json can be found in "aerial_semantic_drone/classes.json"
    • Index to class name file can be found in "aerial_semantic_drone/idx_to_class.json"
    • Class name to index file can be found in "aerial_semantic_drone/class_to_idx.json"
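    The lookup files can then be read directly. A minimal sketch, assuming class_to_idx.json maps class names to integer indices (as the filename suggests); the function name is illustrative:

```python
import json

def load_class_maps(root):
    """Read class_to_idx.json from the dataset root and derive the
    inverse index-to-name mapping."""
    with open(f"{root}/class_to_idx.json", encoding="utf-8") as f:
        class_to_idx = json.load(f)
    idx_to_class = {idx: name for name, idx in class_to_idx.items()}
    return class_to_idx, idx_to_class
```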

    Contact

    aerial@icg.tugraz.at

    Citation

    If you use this dataset in your research, please cite the following URL: www.dronedataset.icg.tugraz.at

    License

    The Drone Dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:

    • That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Graz University of Technology) do not accept any responsibility for errors or omissions.
    • That you include a reference to the Semantic Drone Dataset in any work that makes use of the dataset. For research papers or other media, link to the Semantic Drone Dataset webpage.
    • That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works in as far as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow to recover the dataset or something similar in character.
    • That you may not use the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
    • That all rights not expressly granted to you are reserved by us (Graz University of Technology).

  11. Data from: S1S2-Water: A global dataset for semantic segmentation of water...

    • zenodo.org
    • data.niaid.nih.gov
    json, zip
    Updated Nov 22, 2023
    Cite
    Marc Wieland; Marc Wieland; Florian Fichtner; Sandro Martinis; Sandro Groth; Christian Krullikowski; Simon Plank; Mahdi Motagh; Florian Fichtner; Sandro Martinis; Sandro Groth; Christian Krullikowski; Simon Plank; Mahdi Motagh (2023). S1S2-Water: A global dataset for semantic segmentation of water bodies from Sentinel-1 and Sentinel-2 satellite images [Dataset]. http://doi.org/10.5281/zenodo.8314175
    Explore at:
    Available download formats: zip, json
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marc Wieland; Marc Wieland; Florian Fichtner; Sandro Martinis; Sandro Groth; Christian Krullikowski; Simon Plank; Mahdi Motagh; Florian Fichtner; Sandro Martinis; Sandro Groth; Christian Krullikowski; Simon Plank; Mahdi Motagh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The S1S2-Water dataset is a global reference dataset for training, validation, and testing of convolutional neural networks for semantic segmentation of surface water bodies in publicly available Sentinel-1 and Sentinel-2 satellite images. The dataset consists of 65 triplets of Sentinel-1 and Sentinel-2 images with quality-checked binary water masks. Samples are drawn globally on the basis of the Sentinel-2 tile grid (100 x 100 km), taking into account predominant landcover and the availability of water bodies. Each sample is complemented with metadata and a Digital Elevation Model (DEM) raster from the Copernicus DEM.

    This work was supported by the German Federal Ministry of Education and Research (BMBF) through the project "Künstliche Intelligenz zur Analyse von Erdbeobachtungs- und Internetdaten zur Entscheidungsunterstützung im Katastrophenfall" (AIFER) under Grant 13N15525, and by the Helmholtz Artificial Intelligence Cooperation Unit through the project "AI for Near Real Time Satellite-based Flood Response" (AI4FLOOD) under Grant ZT-IPF-5-39.

  12. Cityscapes Image Pairs

    • kaggle.com
    Updated Apr 20, 2018
    + more versions
    Cite
    DanB (2018). Cityscapes Image Pairs [Dataset]. https://www.kaggle.com/datasets/dansbecker/cityscapes-image-pairs/discussion
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 20, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DanB
    Description

    Context

    Cityscapes data (dataset home page) contains labeled videos taken from vehicles driven in Germany. This version is a processed subsample created as part of the Pix2Pix paper. The dataset has still images from the original videos, and the semantic segmentation labels are shown in images alongside the original image. This is one of the best datasets around for semantic segmentation tasks.

    Content

    This dataset has 2,975 training image files and 500 validation image files. Each image file is 256x512 pixels, and each file is a composite with the original photo on the left half of the image and the labeled image (output of semantic segmentation) on the right half.
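    Each composite can be split back into its two halves with plain array slicing. A sketch assuming the images are loaded as (256, 512, 3) arrays (the helper name is illustrative):

```python
import numpy as np

def split_pair(composite):
    """Split a (H, 2W, 3) composite into (photo, label): the original
    photo is the left half, the segmentation rendering is the right half."""
    width = composite.shape[1]
    return composite[:, : width // 2], composite[:, width // 2 :]
```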

    Acknowledgements

    This dataset is the same as what is available here from the Berkeley AI Research group.

    License

    The Cityscapes data available from cityscapes-dataset.com has the following license:

    This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:

    • That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Daimler AG, MPI Informatics, TU Darmstadt) do not accept any responsibility for errors or omissions.
    • That you include a reference to the Cityscapes Dataset in any work that makes use of the dataset. For research papers, cite our preferred publication as listed on our website; for other media cite our preferred publication as listed on our website or link to the Cityscapes website.
    • That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works in as far as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow to recover the dataset or something similar in character.
    • That you may not use the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
    • That all rights not expressly granted to you are reserved by (Daimler AG, MPI Informatics, TU Darmstadt).

    Inspiration

    Can you identify what objects are where in these images taken from a vehicle?

  13. Dataset - Clustering Semantic Predicates in the Open Research Knowledge...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 8, 2022
    Cite
    Arab Oghli, Omar (2022). Dataset - Clustering Semantic Predicates in the Open Research Knowledge Graph [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6513498
    Explore at:
    Dataset updated
    Aug 8, 2022
    Dataset authored and provided by
    Arab Oghli, Omar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset has been created for implementing a content-based recommender system in the context of the Open Research Knowledge Graph (ORKG). The recommender system accepts a research paper's title and abstract as input and recommends existing ORKG predicates semantically relevant to the given paper.

    The paper instances in the dataset are grouped by ORKG comparisons; therefore, the data.json file is more comprehensive than training_set.json and test_set.json.

    data.json

    The main JSON object consists of a list of comparisons. Each comparison object has an ID, a label, a list of papers, and a list of predicates; each paper object has an ID, label, DOI, research field, research problems, and abstract. Each predicate object has an ID and a label. See an example instance below.

    {
      "comparisons": [
        {
          "id": "R108331",
          "label": "Analysis of approaches based on required elements in way of modeling",
          "papers": [
            {
              "id": "R108312",
              "label": "Rapid knowledge work visualization for organizations",
              "doi": "10.1108/13673270710762747",
              "research_field": {
                "id": "R134",
                "label": "Computer and Systems Architecture"
              },
              "research_problems": [
                {
                  "id": "R108294",
                  "label": "Enterprise engineering"
                }
              ],
              "abstract": "Purpose \u2013 The purpose of this contribution is to motivate a new, rapid approach to modeling knowledge work in organizational settings and to introduce a software tool that demonstrates the viability of the envisioned concept.Design/methodology/approach \u2013 Based on existing modeling structures, the KnowFlow toolset that aids knowledge analysts in rapidly conducting interviews and in conducting multi\u2010perspective analysis of organizational knowledge work is introduced.Findings \u2013 This article demonstrates how rapid knowledge work visualization can be conducted largely without human modelers by developing an interview structure that allows for self\u2010service interviews. Two application scenarios illustrate the pressing need for and the potentials of rapid knowledge work visualizations in organizational settings.Research limitations/implications \u2013 The efforts necessary for traditional modeling approaches in the area of knowledge management are often prohibitive. This contribution argues that future research needs ..."
            },
            ....
          ],
          "predicates": [
            {
              "id": "P37126",
              "label": "activities, behaviours, means [for knowledge development and/or for knowledge conveyance and transformation"
            },
            {
              "id": "P36081",
              "label": "approach name"
            },
            ....
          ]
        },
        ....
      ]
    }

    training_set.json and test_set.json

    The main JSON object consists of a list of training/test instances. Each instance has an instance_id with the format (comparison_id X paper_id) and a text. The text is a concatenation of the paper's label (title) and abstract. See an example instance below.

    Note that test instances are not duplicated and do not occur in the training set. Training instances are also not duplicated, but the same training paper may appear in multiple instances, each time concatenated with a different comparison.

    {
      "instances": [
        {
          "instance_id": "R108331xR108301",
          "comparison_id": "R108331",
          "paper_id": "R108301",
          "text": "A notation for Knowledge-Intensive Processes Business process modeling has become essential for managing organizational knowledge artifacts. However, this is not an easy task, especially when it comes to the so-called Knowledge-Intensive Processes (KIPs). A KIP comprises activities based on acquisition, sharing, storage, and (re)use of knowledge, as well as collaboration among participants, so that the amount of value added to the organization depends on process agents' knowledge. The previously developed Knowledge Intensive Process Ontology (KIPO) structures all the concepts (and relationships among them) to make a KIP explicit. Nevertheless, KIPO does not include a graphical notation, which is crucial for KIP stakeholders to reach a common understanding about it. This paper proposes the Knowledge Intensive Process Notation (KIPN), a notation for building knowledge-intensive processes graphical models."
        },
        ...
      ]
    }
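The nested structure above can be traversed with a few lines of Python. This is a minimal sketch based only on the example instances shown here; the function name and the filename in the usage comment are illustrative, not part of the dataset.

```python
import json

def iter_paper_instances(dataset):
    """Yield (comparison_id, paper_id, title, abstract) tuples from the
    comparisons JSON structure shown above."""
    for comparison in dataset["comparisons"]:
        for paper in comparison["papers"]:
            yield comparison["id"], paper["id"], paper["label"], paper.get("abstract", "")

# Usage (filename assumed for illustration):
# with open("comparisons.json", encoding="utf-8") as f:
#     dataset = json.load(f)
# for comp_id, paper_id, title, abstract in iter_paper_instances(dataset):
#     text = f"{title} {abstract}"  # same title+abstract concatenation as training_set.json
```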

    Dataset Statistics:

                            Papers   Predicates   Research Fields   Research Problems
        Min/Comparison           2            2                 1                   0
        Max/Comparison         202          112                 5                  23
        Avg./Comparison      21.54        12.79              1.20                1.09
        Total                 4060         1816                46                 178

    Dataset Splits:

                          Papers   Comparisons
        Training Set        2857           214
        Test Set            1203           180
  14. Datasets and Models for Historical Newspaper Article Segmentation

    • zenodo.org
    • explore.openaire.eu
    json, txt, zip
    Updated Jan 31, 2021
    Raphaël Barman; Maud Ehrmann; Simon Clematide; Sofia Ares Oliveira (2021). Datasets and Models for Historical Newspaper Article Segmentation [Dataset]. http://doi.org/10.5281/zenodo.3706863
    Explore at:
    Available download formats: json, txt, zip
    Dataset updated
    Jan 31, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Raphaël Barman; Maud Ehrmann; Simon Clematide; Sofia Ares Oliveira
    Description

    This record contains the datasets and models used and produced for the work reported in the paper "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers" (link).

    Please cite this paper if you are using the models/datasets or find it relevant to your research:

    @article{barman_combining_2020,
      title   = {{Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers}},
      author  = {Raphaël Barman and Maud Ehrmann and Simon Clematide and Sofia Ares Oliveira and Frédéric Kaplan},
      journal = {Journal of Data Mining \& Digital Humanities},
      volume  = {HistoInformatics},
      doi     = {10.5281/zenodo.4065271},
      year    = {2021},
      url     = {https://jdmdh.episciences.org/7097},
    }


    Please note that this record contains data under different licenses.

    1. DATA

    • Annotations (JSON files): JSON files contain image annotations, with one file per newspaper containing region annotations (label and coordinates) in VIA format. The following licenses apply:
      • luxwort.json: these annotations are under a CC0 1.0 license. Please refer to the rights statement specified for each image in the file.
      • GDL.json, IMP.json and JDG.json: these annotations are under a CC BY-SA 4.0 license.

    • Image files: The archive images.zip contains the Swiss titles image files (GDL, IMP, JDG) used for the experiments described in the paper. Those images are under copyright (property of the journal Le Temps and of ArcInfo) and can be used for academic research or educational purposes only. Redistribution, publication or commercial use are not permitted. These terms of use are similar to the following right statement: http://rightsstatements.org/vocab/InC-EDU/1.0/

    2. MODELS

    Some of the best models are released under a CC BY-SA 4.0 license (they are also available as assets of the current Github release).

    • JDG_flair-FT: this model was trained on JDG using French Flair and FastText embeddings. It is able to predict the four classes presented in the paper (Serial, Weather, Death notice and Stocks).
    • Luxwort_obituary_flair-bpemb: this model was trained on Luxwort using multilingual Flair and Byte-pair embeddings. It is able to predict the Death notice class.
    • Luxwort_obituary_flair-FT_indomain: this model was trained on Luxwort using in-domain Flair and FastText embeddings (trained on Luxwort data). It is also able to predict the Death notice class.

    Those models can be used to predict probabilities on new images using the same code as in the original dhSegment repository. One needs to adjust three parameters of the predict function: 1) embeddings_path (the path to the embeddings list), 2) embeddings_map_path (the path to the compressed embedding map), and 3) embeddings_dim (the size of the embeddings).
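The three parameters named above can be collected as keyword arguments along these lines. This is a hedged sketch: the parameter names come from this record's description, but the file paths and the commented-out call shape are assumptions; the actual predict signature is defined in the dhlab-epfl/dhSegment-text repository.

```python
# Parameter names taken from the record above; paths are hypothetical placeholders.
embedding_kwargs = {
    "embeddings_path": "embeddings/fr_flair_fasttext.list",  # path to the embeddings list (assumed path)
    "embeddings_map_path": "embeddings/embedding_map.npz",   # path to the compressed embedding map (assumed path)
    "embeddings_dim": 300,                                   # size of the embeddings (assumed value)
}

# model.predict(image_filename, **embedding_kwargs)  # call shape is an assumption
```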

    Please refer to the paper for further information or contact us.

    3. CODE:

    https://github.com/dhlab-epfl/dhSegment-text


    4. ACKNOWLEDGEMENTS
    We warmly thank the journal Le Temps (owner of La Gazette de Lausanne and the Journal de Genève) and the group ArcInfo (owner of L'Impartial) for agreeing to share the related datasets for academic purposes. We also thank the National Library of Luxembourg for its support with all steps related to the Luxemburger Wort annotation release.
    This work was realized in the context of the impresso - Media Monitoring of the Past project and supported by the Swiss National Science Foundation under grant CR-SII5_173719.

    5. CONTACT
    Maud Ehrmann (EPFL-DHLAB)
    Simon Clematide (UZH)

  15. Datasets related to algorithms performance.

    • plos.figshare.com
    xlsx
    Updated Feb 14, 2025
    Lin Jun; Zhou Chenliang (2025). Datasets related to algorithms performance. [Dataset]. http://doi.org/10.1371/journal.pone.0315143.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Lin Jun; Zhou Chenliang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The smart grid builds on the physical grid, introducing various advanced communication technologies to form a new type of power grid. It not only meets user demand and enables optimal allocation of resources, but also improves the safety, economy, and reliability of the power supply, and it has become a major trend in the future development of the electric power industry. On the other hand, the complex network architecture of the smart grid and the application of various high-tech components have greatly increased the probability of equipment failure and the difficulty of fault diagnosis; timely discovery and diagnosis of problems in the operation of smart grid equipment has therefore become a key measure for ensuring safe grid operation. At present, existing smart grid equipment fault diagnosis techniques suffer from complex application procedures and generally low fault diagnosis rates, which greatly reduces the efficiency of smart grid maintenance. Based on this, this paper adopts a multimodal semantic model combining deep learning and a knowledge graph: on top of the original YOLOv4 target detection architecture, it introduces a knowledge graph to unify the representation and storage of the input multimodal information, and innovatively combines the YOLOv4 target detection algorithm with the knowledge graph to establish a smart grid equipment fault diagnosis model. Experiments show that, compared with existing fault detection algorithms, the YOLOv4-based algorithm constructed in this paper is more accurate, faster, and easier to operate.

  16. Data from: GEOSatDB: global civil earth observation satellite semantic database

    • scidb.cn
    • zenodo.org
    Updated Oct 7, 2023
    Ming Lin; Meng Jin; Juanzi Li; Yuqi Bai (2023). GEOSatDB: global civil earth observation satellite semantic database [Dataset]. http://doi.org/10.57760/sciencedb.11805
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Ming Lin; Meng Jin; Juanzi Li; Yuqi Bai
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    GEOSatDB is a semantic representation of Earth observation satellites and sensors that can be used to easily discover available Earth observation resources for specific research objectives.

    Background: The widespread availability of coordinated and publicly accessible Earth observation (EO) data empowers decision-makers worldwide to comprehend global challenges and develop more effective policies. Space-based satellite remote sensing, which serves as the primary tool for EO, provides essential information about the Earth and its environment by measuring various geophysical variables. This contributes significantly to our understanding of the fundamental Earth system and the impact of human activities. Over the past few decades, many countries and organizations have markedly improved their regional and global EO capabilities by deploying a variety of advanced remote sensing satellites. The rapid growth of EO satellites and advances in on-board sensors have significantly enhanced remote sensing data quality by expanding spectral bands and increasing spatio-temporal resolutions. However, users face challenges in accessing available EO resources, which are often maintained independently by various nations, organizations, or companies. As a result, a substantial portion of archived EO satellite resources remains underutilized. Enhancing the discoverability of EO satellites and sensors can effectively utilize the vast amount of EO resources that continue to accumulate at a rapid pace, thereby better supporting data for global change research.

    Methodology: This study introduces GEOSatDB, a comprehensive semantic database specifically tailored for civil Earth observation satellites. The foundation of the database is an ontology model conforming to standards set by the International Organization for Standardization (ISO) and the World Wide Web Consortium (W3C). This conformity enables data integration and promotes the reuse of accumulated knowledge. Our approach advocates a novel method for integrating Earth observation satellite information from diverse sources. It notably incorporates a structured prompt strategy utilizing a large language model to derive detailed sensor information from vast volumes of unstructured text.

    Dataset Information: The GEOSatDB portal (https://www.geosatdb.cn) has been developed to provide an interactive interface that facilitates the efficient retrieval of information on Earth observation satellites and sensors. The downloadable files in RDF Turtle format are located in the data directory and contain a total of 132,681 statements:

    - GEOSatDB_ontology.ttl: ontology modeling of concepts, relations, and properties.
    - satellite.ttl: 2,453 Earth observation satellites and their associated entities.
    - sensor.ttl: 1,035 Earth observation sensors and their associated entities.
    - sensor2satellite.ttl: relations between Earth observation satellites and sensors.

    GEOSatDB undergoes quarterly updates, involving the addition of new satellites and sensors, revisions based on expert feedback, and the implementation of additional enhancements.

  17. Radar Segmentation (RadSeg) Dataset

    • researchdatafinder.qut.edu.au
    Updated May 15, 2025
    Zi Huang (2025). Radar Segmentation (RadSeg) Dataset [Dataset]. https://researchdatafinder.qut.edu.au/individual/n62585
    Explore at:
    Dataset updated
    May 15, 2025
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Zi Huang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    RadSeg is a synthetic radar dataset designed for building semantic segmentation models for radar activity recognition. Unlike existing radio classification datasets that only provide signal-wise annotations for short and isolated I/Q sequences, RadSeg provides sample-wise annotations for interleaved radar pulse activities that extend across a long time horizon. This makes RadSeg the first annotated public dataset of its kind for radar activity recognition.

    Further information about the RadSeg dataset is available in our paper:

    Z. Huang, A. Pemasiri, S. Denman, C. Fookes and T. Martin, Multi-Stage Learning for Radar Pulse Activity Segmentation, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 7340-7344, doi: 10.1109/ICASSP48485.2024.10445810

  18. MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

    • physionet.org
    Updated Nov 15, 2024
    Benedikt Boecking; Naoto Usuyama; Shruthi Bannur; Daniel Coelho de Castro; Anton Schwaighofer; Stephanie Hyland; Harshita Sharma; Maria Teodora Wetscherek; Tristan Naumann; Aditya Nori; Javier Alvarez Valle; Hoifung Poon; Ozan Oktay (2024). MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing [Dataset]. http://doi.org/10.13026/9g2z-jg61
    Explore at:
    Dataset updated
    Nov 15, 2024
    Authors
    Benedikt Boecking; Naoto Usuyama; Shruthi Bannur; Daniel Coelho de Castro; Anton Schwaighofer; Stephanie Hyland; Harshita Sharma; Maria Teodora Wetscherek; Tristan Naumann; Aditya Nori; Javier Alvarez Valle; Hoifung Poon; Ozan Oktay
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    We release a new dataset, MS-CXR, with locally-aligned phrase grounding annotations by board-certified radiologists to facilitate the study of complex semantic modelling in biomedical vision–language processing. The MS-CXR dataset provides 1162 image–sentence pairs of bounding boxes and corresponding phrases, collected across eight different cardiopulmonary radiological findings, with an approximately equal number of pairs for each finding. This dataset complements the existing MIMIC-CXR v.2 dataset and comprises: 1. Reviewed and edited bounding boxes and phrases (1026 pairs of bounding box/sentence); and 2. Manual bounding box labels from scratch (136 pairs of bounding box/sentence).

    This large, well-balanced phrase grounding benchmark dataset contains carefully curated image regions annotated with descriptions of eight radiology findings, as verified by radiologists. Unlike existing chest X-ray benchmarks, this challenging phrase grounding task evaluates joint, local image-text reasoning while requiring real-world language understanding, e.g. to parse domain-specific location references, complex negations, and bias in reporting style. This data accompany work showing that principled textual semantic modelling can improve contrastive learning in self-supervised vision–language processing.

  19. Data from: Time-series China urban land use mapping (2016–2022): An approach for achieving spatial-consistency and semantic-transition rationality in temporal domain

    • figshare.com
    zip
    Updated Dec 27, 2024
    Xiong Shuping (2024). Time-series China urban land use mapping (2016–2022): An approach for achieving spatial-consistency and semantic-transition rationality in temporal domain [Dataset]. http://doi.org/10.6084/m9.figshare.27610683.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 27, 2024
    Dataset provided by
    figshare
    Authors
    Xiong Shuping
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you want to use this data, please cite our article: Xiong, S., Zhang, X., Lei, Y., Tan, G., Wang, H., & Du, S. (2024). Time-series China urban land use mapping (2016–2022): An approach for achieving spatial-consistency and semantic-transition rationality in temporal domain. Remote Sensing of Environment, 312, 114344.

    The global urbanization trend is geographically manifested through city expansion and the renewal of internal urban structures and functions. Time-series urban land use (ULU) maps are vital for capturing dynamic land changes in the urbanization process, giving valuable insights into urban development and its environmental consequences. Recent studies have mapped ULU in some cities with a unified model but ignored the regional differences among cities; they also generated ULU maps year by year, ignoring temporal correlations between years, and thus can be weak in large-scale, long time-series ULU monitoring. Accordingly, we introduce a temporal-spatial-semantic collaborative (TSS) mapping framework for generating accurate ULU maps that accounts for regional differences and temporal correlations. First, to support model training, a large-scale ULU sample dataset based on OpenStreetMap (OSM) and Sentinel-2 imagery is automatically constructed, providing a total of 56,412 samples of size 512 × 512, which are divided into six sub-regions in China and used for training different classification models. Then, an urban land use mapping network (ULUNet) is proposed to recognize ULU. This model utilizes a primary and an auxiliary encoder to process noisy OSM samples and can enhance the model's robustness under noisy labels. Finally, taking the temporal correlations of ULU into consideration, the recognized ULU maps are optimized: their boundaries are unified by a time-series co-segmentation, and their categories are corrected by a knowledge-data-driven method. To verify the effectiveness of the proposed method, we consider all urban areas in China (254,566 km2) and produce a time-series China urban land use dataset (CULU) at a 10-m resolution, spanning 2016 to 2022, with an overall accuracy of 82.42%. Through comparison, it can be found that CULU outperforms existing datasets such as EULUC-China and UFZ-31cities in data accuracy, spatial boundary consistency, and land use transition logicality. The results indicate that the proposed method and the generated dataset can play important roles in land use change monitoring, ecological-environment evolution analysis, and sustainable city development.

  20. Data from: Intelligent Energy Systems Ontology: Local flexibility market and power system co-simulation demonstration

    • data.niaid.nih.gov
    • portalcienciaytecnologia.jcyl.es
    • +2more
    Updated Nov 15, 2023
    Pinto, Tiago (2023). Intelligent Energy Systems Ontology: Local flexibility market and power system co-simulation demonstration [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5526902
    Explore at:
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Santos, Gabriel
    Pinto, Tiago
    Morais, Hugo
    Corchado, Juan M.
    Vale, Zita
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Intelligent Energy Systems Ontology (IESO) provides semantic interoperability within a society of multi-agent systems (MAS) developed in the scope of power and energy systems (PES). It leverages the knowledge from existing and publicly available semantic models developed for specific PES subdomains to accomplish a shared vocabulary among the agents of the MAS community, overcoming heterogeneity among the reused ontologies. IESO provides agents with semantic reasoning, constraints validation, and data uniformization. The use of IESO is demonstrated through the simulation of the management of a rural distribution network, considering the validation of the grid’s technical constraints. This dataset publishes files demonstrating: i) a snapshot of the initial semantic knowledge base (KB); ii) queries to the KB to get services inputs; iii) conversions between syntactic and semantic models; iv) constraints validations; v) automatic conversion of units of measure.


Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles (Dataset, Models & Code)


Additional information can be found on GitHub.

The following data is supplemental to the experiments described in our research paper. The data consists of:

Datasets (articles, class labels, cross-validation splits)

Pretrained models (Transformers, GloVe, Doc2vec)

Model output (prediction) for the best performing models

Dataset

The Wikipedia article corpus is available in enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2. The original data were downloaded as an XML dump, and the corresponding articles were extracted as plain text with gensim.scripts.segment_wiki. The archive contains only articles that are available in the training or test data.
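The corpus can be streamed without decompressing it to disk. This is a minimal sketch: the helper name is ours, and the `"title"` key in the usage comment is an assumption based on segment_wiki's usual per-line JSON output, not something stated in this record.

```python
import bz2
import json

def read_articles(path):
    """Stream records from a bz2-compressed JSON-lines file, one dict per line."""
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Usage:
# for article in read_articles("enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2"):
#     print(article["title"])  # field name assumed from segment_wiki's output format
```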

The actual dataset is provided as used in the stratified k-fold with k=4 in train_testdata_4folds.tar.gz.

├── 1
│   ├── test.csv
│   └── train.csv
├── 2
│   ├── test.csv
│   └── train.csv
├── 3
│   ├── test.csv
│   └── train.csv
└── 4
    ├── test.csv
    └── train.csv

4 directories, 8 files
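The four-fold layout above can be walked programmatically. A minimal sketch, assuming only the directory tree shown (the CSV column names are not documented here, so only file paths are handled):

```python
import os

def fold_files(root):
    """Yield (fold_name, train_path, test_path) for each fold directory
    in the 4-fold layout shown above."""
    for fold in sorted(os.listdir(root)):
        train = os.path.join(root, fold, "train.csv")
        test = os.path.join(root, fold, "test.csv")
        if os.path.isfile(train) and os.path.isfile(test):
            yield fold, train, test

# Usage (archive extracted to "train_testdata_4folds"):
# for fold, train_path, test_path in fold_files("train_testdata_4folds"):
#     ...  # train on train_path, evaluate on test_path
```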

Pretrained models

PyTorch: vanilla and Siamese BERT + XLNet

Pretrained model for each fold is available in the corresponding model archives:

Vanilla

model_wiki.bert_base_joint_seq512.tar.gz
model_wiki.xlnet_base_joint_seq512.tar.gz

Siamese

model_wiki.bert_base_siamese_seq512_4d.tar.gz
model_wiki.xlnet_base_siamese_seq512_4d.tar.gz
