19 datasets found
  1. Electric Wires Dataset

    • paperswithcode.com
    Updated Apr 17, 2025
    + more versions
    Cite
    (2025). Electric Wires Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/electric-wires-dataset
    Explore at:
    Dataset updated
    Apr 17, 2025
    Description

    Description:

    👉 Download the dataset here

    The Electric Wires Dataset is a high-quality, automatically generated resource designed specifically for the semantic segmentation of cable-like objects, with a special emphasis on electric wires. This versatile dataset is built to be domain-independent, making it suitable for a wide range of industrial applications. Whether in construction, industrial manufacturing, power distribution, or communication infrastructure, this dataset is tailored to meet the needs of sectors where accurately recognizing wires and similar objects is crucial.

    Dataset Generation Process:

    The Electric Wires Dataset is created using a unique procedure that ensures both precision and consistency across all images. The process starts by placing the target object, electric wires, against a monochromatic background. This method allows for easy removal of the background using the chroma-key technique. As a result, clear and accurate training masks are generated for the target object.

    Once the masks are generated, they can be combined with various backgrounds to produce a domain-independent dataset. This approach significantly reduces the reality gap, ensuring that the dataset remains applicable across different real-world scenarios. The process also includes extensive augmentation of the foreground images, enhancing the dataset's robustness and adaptability.
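
    A minimal sketch of this chroma-key-and-composite step, assuming a green backdrop and hypothetical file names (the dataset's own generation tooling is not published here):

    ```python
    import cv2
    import numpy as np

    frame = cv2.imread("wire_on_green.png")   # wire shot against a monochromatic backdrop
    scene = cv2.imread("factory_scene.png")   # arbitrary real-world background

    # Chroma-key: pixels close to the backdrop colour are background, the rest is wire.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backdrop = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))  # green range, tune per setup
    mask = cv2.bitwise_not(backdrop)          # binary training mask for the wire

    # Composite the extracted wire onto a new background for domain independence.
    scene = cv2.resize(scene, (frame.shape[1], frame.shape[0]))
    composite = np.where(mask[..., None] > 0, frame, scene)

    cv2.imwrite("mask.png", mask)
    cv2.imwrite("composite.png", composite)
    ```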

    Download Dataset

    Key Features:

    High-Quality Annotations: The dataset provides precise segmentation masks for electric wires, enabling accurate training of semantic segmentation models.

    Domain-Independence: By incorporating various backgrounds, the dataset is designed to be used across multiple domains without the need for extensive domain-specific adjustments.

    Chroma-Key Technique: Utilizes the chroma-key technique to ensure clean and accurate separation of the target objects from the background.

    Augmentation: Includes a wide range of augmented images, increasing the dataset's diversity and improving model generalization.

    Versatile Applications: Ideal for training models used in construction, industrial manufacturing, power distribution, and communication infrastructure, where wire recognition is essential.

    Applications:

    This dataset is particularly beneficial for developing AI models in the following areas:

    Industrial Automation: Improving the accuracy of robotic systems in recognizing and handling wires during assembly and manufacturing processes.

    Safety Monitoring: Enhancing surveillance systems to detect and monitor electric wires in various environments, reducing risks associated with electrical hazards.

    Infrastructure Maintenance: Assisting in the inspection and maintenance of power distribution networks and communication lines by accurately identifying wires in complex environments.

    Augmented Reality: Facilitating the development of AR systems that require precise recognition of wires for overlaying relevant information in industrial settings.

    Conclusion:

    The Electric Wires Dataset is a highly versatile and essential tool for training semantic segmentation models, particularly those focused on recognizing cable-like objects. With high-quality annotations and extensive validation, this dataset serves as a reliable resource for industries that need precise wire detection and segmentation. Moreover, its adaptability makes it valuable across various applications, ensuring accurate results in different contexts.

    This dataset is sourced from Kaggle

  2. Industrial Machine Tool Element Surface Defect Dataset

    • radar.kit.edu
    • radar-service.eu
    tar
    Updated Jun 21, 2023
    Cite
    Tobias Schlagenhauf; Magnus Landwehr; Jürgen Fleischer (2023). Industrial Machine Tool Element Surface Defect Dataset [Dataset]. http://doi.org/10.35097/1278
    Explore at:
    tar (121882112 bytes)
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Fleischer, Jürgen
    Karlsruhe Institute of Technology
    Schlagenhauf, Tobias
    Landwehr, Magnus
    Authors
    Tobias Schlagenhauf; Magnus Landwehr; Jürgen Fleischer
    Description

    The dataset contains 1104 three-channel images with 394 image annotations for the surface damage type “pitting”. The annotations, made with the annotation tool labelme, are available in JSON format and hence convertible to VOC and COCO format. All images come from two BSD types. The dataset available for download is divided into three folders: data with all images as JPEG, label with all annotations, and saved_model with a baseline model. The authors also provide a Python script to divide the data and labels into three different split types: train_test_split, which splits images into the same train and test data split the authors used for the baseline model; wear_dev_split, which creates all 27 wear developments; and type_split, which splits the data into the occurring BSD types. One of the two BSD types is represented with 69 images in 55 different image sizes; all images of this BSD type come either in a clean or soiled condition. The other BSD type is shown on 325 images with two image sizes; since all images of this type were taken continuously over time, the degree of soiling evolves. As mentioned above, the dataset also contains 27 pitting development sequences of 69 images each.

    Instructions for the dataset split

    The authors provide three different dataset splits. To get a data split, run the Python script split_dataset.py.

    Script inputs:
    - split_type (mandatory)
    - output directory (mandatory)

    Split types:
    - train_test_split: splits the dataset into train and test data (80%/20%)
    - wear_dev_split: splits the dataset into 27 wear developments
    - type_split: splits the dataset into the different BSD types

    Example: C:\Users\Desktop>python split_dataset.py --split_type=train_test_split --output_dir=BSD_split_folder

    Result: ./BSD_split_folder/train/ and ./BSD_split_folder/test/
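
    Since the labels are labelme JSON files, a single annotation can be inspected along these lines (hypothetical file name; key names follow labelme's JSON format):

    ```python
    import json
    from pathlib import Path

    ann = json.loads(Path("label/some_image.json").read_text())

    print(ann["imagePath"])         # image the annotation belongs to
    for shape in ann["shapes"]:     # one entry per annotated pitting region
        print(shape["label"], shape["shape_type"], len(shape["points"]), "points")
    ```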

  3. SSHOC - National Gallery - Grounds Database CIDOC CRM Mapped Dataset

    • data.niaid.nih.gov
    • dataverse.nl
    • +1more
    Updated Jul 16, 2024
    + more versions
    Cite
    Joseph Padfield (2024). SSHOC - National Gallery - Grounds Database CIDOC CRM Mapped Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6478779
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Orla Delaney
    Joseph Padfield
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    In 2018 the IPERION-CH Grounds Database was presented to examine how the data produced through the scientific examination of historic painting preparation or grounds samples from multiple institutions could be combined in a flexible digital form, exploring the presentation of interrelated high-resolution images, text, complex metadata and procedural documentation. The original main user interface is live, though password-protected at this time. Work within the SSHOC project aimed to reformat the data to create a more FAIR dataset: in addition to mapping it to a standard ontology to increase interoperability, it has also been made available in the form of open linkable data combined with a SPARQL end-point. A draft version of this live data presentation can be found here.

    This is a draft dataset and further work is planned to debug and improve its semantic structure. This deposit contains the CIDOC-CRM mapped data formatted in XML and an example model diagram representing some of the key relationships covered in the dataset.
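
    A sketch of querying such an end-point, with a hypothetical endpoint URL and CIDOC-CRM-style class and property names (the deposit does not list its endpoint address here):

    ```python
    from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

    sparql = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
    sparql.setQuery("""
        PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
        SELECT ?obj ?id WHERE {
            ?obj a crm:E22_Man-Made_Object ;
                 crm:P1_is_identified_by ?id .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["obj"]["value"], row["id"]["value"])
    ```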

  4. Semantic Knowledge Representation API

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Jun 19, 2025
    + more versions
    Cite
    National Library of Medicine (2025). Semantic Knowledge Representation API [Dataset]. https://catalog.data.gov/dataset/semantic-knowledge-representation-api
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The SKR Project was initiated at NLM to develop programs that provide usable semantic representation of biomedical free text by building on resources currently available at the Library, especially the UMLS knowledge sources and the natural language processing tools provided by the SPECIALIST system. The project is concerned with reliable and effective management of the information encoded in natural-language texts. This Java-based API to the Semantic Knowledge Representation (SKR) Scheduler facility was created to give users the ability to submit jobs programmatically to the Scheduler Batch and Interactive facilities instead of using the Web-based interface.

  5. Data from: RELLIS-3D Dataset: Data, Benchmarks and Analysis

    • academictorrents.com
    bittorrent
    Updated Dec 25, 2024
    Cite
    Jiang, Peng and Osteen, Philip and Wigness, Maggie and Saripalli, Srikanth (2024). RELLIS-3D Dataset: Data, Benchmarks and Analysis [Dataset]. https://academictorrents.com/details/4cfa80e6d91e8c6c79bcc2f405dbd9255b5cf4e8
    Explore at:
    bittorrent (635808536706 bytes)
    Dataset updated
    Dec 25, 2024
    Dataset authored and provided by
    Jiang, Peng and Osteen, Philip and Wigness, Maggie and Saripalli, Srikanth
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    Semantic scene understanding is crucial for robust and safe autonomous navigation, particularly so in off-road environments. Recent deep learning advances for 3D semantic segmentation rely heavily on large sets of training data; however, existing autonomy datasets either represent urban environments or lack multimodal off-road data. We fill this gap with RELLIS-3D, a multimodal dataset collected in an off-road environment, which contains annotations for 13,556 LiDAR scans and 6,235 images. The data was collected on the Rellis Campus of Texas A&M University, and presents challenges to existing algorithms related to class imbalance and environmental topography. Additionally, we evaluate the current state-of-the-art deep learning semantic segmentation models on this dataset. Experimental results show that RELLIS-3D presents challenges for algorithms designed for segmentation in urban environments. This novel dataset provides the resources needed by researchers to continue to develop mo

  6. CORD-19 Dataset v2020

    • kaggle.com
    Updated Oct 18, 2020
    Cite
    SMLRA-KJSCE (2020). CORD-19 Dataset v2020 [Dataset]. https://www.kaggle.com/datasets/smlrakjsce/cord19-dataset-v2020/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SMLRA-KJSCE
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Open-Ended track where your team can build anything using the dataset provided by us

    Dataset Description In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.

    Call to Action We are issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This allows the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.

    Many of the questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions. We are maintaining a summary of the community's contributions.

    Acknowledgements We wouldn't be here without the help of others. The dataset is a subset of the dataset available at AI2's Semantic Scholar: https://pages.semanticscholar.org/coronavirus-research. This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, IBM, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.

    Dataset The dataset is in tar.gz format and can be downloaded from https://drive.google.com/file/d/15SV8_Nc1HECN9uaplDSQx7H1yKFR4F_Z/view?usp=sharing

    Submissions Notebook and Output results are expected as appropriate submissions.

  7. Cityscapes Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 19, 2020
    Cite
    Marius Cordts; Mohamed Omran; Sebastian Ramos; Timo Rehfeld; Markus Enzweiler; Rodrigo Benenson; Uwe Franke; Stefan Roth; Bernt Schiele (2020). Cityscapes Dataset [Dataset]. https://paperswithcode.com/dataset/cityscapes
    Explore at:
    Dataset updated
    May 19, 2020
    Authors
    Marius Cordts; Mohamed Omran; Sebastian Ramos; Timo Rehfeld; Markus Enzweiler; Rodrigo Benenson; Uwe Franke; Stefan Roth; Bernt Schiele
    Description

    Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5000 fine annotated images and 20000 coarse annotated ones. Data was captured in 50 cities during several months, daytimes, and good weather conditions. It was originally recorded as video so the frames were manually selected to have the following features: large number of dynamic objects, varying scene layout, and varying background.

  8. Data from: FCG-MFD: Benchmark Function Call Graph-Based Dataset for Malware...

    • figshare.com
    zip
    Updated Jan 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hassan jalil hadi (2025). FCG-MFD: Benchmark Function Call Graph-Based Dataset for Malware Family Detection [Dataset]. http://doi.org/10.6084/m9.figshare.26886148.v1
    Explore at:
    zip
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    figshare
    Authors
    Hassan jalil hadi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cybercrimes related to malware families are on the rise. This growth persists despite the prevalence of various antivirus software and approaches for malware detection and classification. Security experts have implemented Machine Learning (ML) techniques to identify these cybercrimes. However, these approaches demand updated malware datasets for continuous improvement amid the evolving sophistication of malware strains. Thus, we present FCG-MFD, a benchmark dataset with extensive Function Call Graphs (FCG) for malware family detection, built to help security systems remain resistant against emerging malware families. The dataset has two sub-datasets (FCG & Metadata) with 100,000 samples from VirusSamples, Virusshare, VirusSign, theZoo, Vx-underground, and MalwareBazaar, curated using FCGs and metadata to optimize the efficacy of ML algorithms. We suggest a new malware analysis technique using FCGs and graph embedding networks, offering a solution to the complexity of feature engineering in ML-based malware analysis. Our approach to extracting semantic features via Natural Language Processing (NLP) treats functions as sentences and instructions as words. We leverage a node2vec-based graph embedding network to generate malware embedding vectors. These vectors enable automated and efficient malware analysis by combining structural and semantic features. We use the two datasets (FCG & Metadata) to assess FCG-MFD performance; F1-scores of 99.14% and 99.28% are competitive with state-of-the-art (SOTA) methods.
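
    A toy sketch of the node2vec-style embedding step on a miniature call graph (not the authors' pipeline), using the community node2vec package:

    ```python
    import networkx as nx
    import numpy as np
    from node2vec import Node2Vec  # pip install node2vec

    # Miniature function-call graph; the paper's FCGs come from malware samples.
    fcg = nx.DiGraph([("main", "decrypt"), ("main", "connect"),
                      ("connect", "send"), ("decrypt", "send")])

    # Random walks + skip-gram yield one embedding vector per function node.
    n2v = Node2Vec(fcg, dimensions=32, walk_length=10, num_walks=50, workers=1)
    model = n2v.fit(window=5, min_count=1)

    # One simple way to pool node vectors into a whole-graph malware embedding.
    graph_vec = np.mean([model.wv[node] for node in fcg.nodes()], axis=0)
    print(graph_vec.shape)  # (32,)
    ```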

  9. Data from: What Makes Sentences Semantically Related? A Textual Relatedness...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf, zip
    Updated Jul 12, 2024
    Cite
    Mohamed Abdalla; Mohamed Abdalla; Krishnapriya Vishnubhotla; Saif M. Mohammad; Saif M. Mohammad; Krishnapriya Vishnubhotla (2024). What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study [Dataset]. http://doi.org/10.5281/zenodo.7599667
    Explore at:
    pdf, bin, zip, csv
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohamed Abdalla; Mohamed Abdalla; Krishnapriya Vishnubhotla; Saif M. Mohammad; Saif M. Mohammad; Krishnapriya Vishnubhotla
    Description

    What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study

    This repository contains data and code for the paper What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study.

    We hope that this work will spur further research on understanding sentence--sentence relatedness, methods of sentence representation, measures of semantic relatedness, and their applications.

    Citing our work
    Please use the following BibTex entry to cite us if you use our dataset or any of the associated analyses:

    @inproceedings{abdalla2023makes,
    title={What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study},
    author={Abdalla, Mohamed and Vishnubhotla, Krishnapriya and Mohammad, Saif M.},
    year={2023},
    address = {Dubrovnik, Croatia},
    publisher = "Association for Computational Linguistics",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume"
    }

    Dataset Description

    The dataset consists of 5500 English sentence pairs that are scored and ranked on a relatedness scale ranging from 0 (least related) to 1 (most related).

    Why Semantic Relatedness?
    Closeness of meaning can be of two kinds: semantic relatedness and semantic similarity. Two sentences are considered semantically similar when they have a paraphrasal or entailment relation, whereas relatedness accounts for all of the commonalities that can exist between two sentences. Semantic relatedness is central to textual coherence and narrative structure. Automatically determining semantic relatedness has many applications such as question answering, plagiarism detection, text generation (say in personal assistants and chat bots), and summarization.

    Prior NLP work has focused on semantic similarity (a small subset of semantic relatedness), largely because of a dearth of datasets. In this paper, we present the first manually annotated dataset of sentence--sentence semantic relatedness. It includes fine-grained scores of relatedness from 0 (least related) to 1 (most related) for 5,500 English sentence pairs. The sentences are taken from diverse sources and thus also have diverse sentence structures, varying amounts of lexical overlap, and varying formality.

    Comparative Annotations and Best-Worst Scaling
    Most existing sentence-sentence similarity datasets were annotated, one item at a time, using coarse rating labels such as integer values between 1 and 5 representing coarse degrees of closeness. It is well documented that such approaches suffer from inter- and intra-annotator inconsistency, scale region bias, and issues arising due to the fixed granularity.

    The relatedness scores for our dataset were, instead, obtained using a comparative annotation schema. In comparative annotations, two (or more) items are presented together and the annotator has to determine which is greater with respect to the metric of interest.

    Specifically, we use Best-Worst Scaling, a comparative annotation method, which has been shown to produce reliable scores with fewer annotations in other NLP tasks. We use scripts from https://saifmohammad.com/WebPages/BestWorst.html to obtain relatedness scores from our annotations.
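
    The toy sketch below, with invented annotations, only illustrates the standard best-minus-worst counting that underlies such scores; the authors' own scripts (linked above) are authoritative:

    ```python
    from collections import Counter

    # Each annotation: four sentence pairs, plus the one judged most related
    # ("best") and the one judged least related ("worst").
    annotations = [
        {"items": ["p1", "p2", "p3", "p4"], "best": "p1", "worst": "p4"},
        {"items": ["p1", "p2", "p3", "p4"], "best": "p2", "worst": "p4"},
        {"items": ["p1", "p3", "p4", "p5"], "best": "p1", "worst": "p5"},
    ]

    best, worst, seen = Counter(), Counter(), Counter()
    for a in annotations:
        best[a["best"]] += 1
        worst[a["worst"]] += 1
        seen.update(a["items"])

    # Best-minus-worst score per item, in [-1, 1]; rescaling maps it to [0, 1].
    for item in sorted(seen):
        print(item, round((best[item] - worst[item]) / seen[item], 2))
    ```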


    Loading the Dataset
    - The sentence pairs, and associated scores, are in the file sem_text_rel_ranked.csv in the root directory. The CSV file can be read using:

    ```python
    import pandas as pd

    # Renamed to df: the original snippet used "str", which shadows the built-in.
    df = pd.read_csv('sem_text_rel_ranked.csv')

    row = df.loc[0]
    sent1, sent2 = row['Text'].split("\n")  # the two sentences are newline-separated
    score = row['Score']
    ```

    - Relevant columns:

    - Text: Sentence pair, separated by the newline character.
    - Score: The semantic relatedness score between 0 and 1.

    - Additionally:
    - The SourceID column indicates the source dataset from which the sentence pair was drawn (see Table 2 of our paper).
    - The SubsetID column indicates the sampling strategy used for the source dataset.
    - The PairID column is a unique identifier for each pair that also indicates its Source and Subset.


    Raw Annotations from Amazon Mechanical Turk

    - The `mturk_data/` subdirectory provides the raw MTurk annotations obtained with our comparative annotation setup.
    - Each row of `mturk_data/bws_annotations.csv` consists of four sentence pairs along with human annotations for the most related (column `BestItem`) and the least related (column `WorstItem`) pair.
    - File `mturk_data/id2sents.csv` pairs each sentence pair with the corresponding SourceID, SubsetID, and PairID that indicates the source dataset (see Table 2 of our paper).
    - See file `mturk_data/task_intructions.txt` for the instructions provided to annotators for our task.


    Datasheet for STR-2022
    The datasheet for our dataset is in the document `STR2022-datastatement.pdf` in the root folder of this repository.

    Ethics Statement
    Any dataset of semantic relatedness entails several ethical considerations. We talk about this in Section 8 of our paper.

    Creators
    - Mohamed Abdalla (University of Toronto)
    - Krishnapriya Vishnubhotla (University of Toronto)
    - Saif M. Mohammad (National Research Council Canada)

    Contact: msa@cs.toronto.edu, vkpriya@cs.toronto.edu, saif.mohammad@nrc-cnrc.gc.ca

  10. COVID-19 Open Research Dataset (CORD-19)

    • zenodo.org
    • live.european-language-grid.eu
    application/gzip, bin +3
    Updated Jul 22, 2024
    + more versions
    Cite
    Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang; Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang (2024). COVID-19 Open Research Dataset (CORD-19) [Dataset]. http://doi.org/10.5281/zenodo.3765923
    Explore at:
    pdf, application/gzip, txt, csv, bin
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang; Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang
    Description

    Important: This dataset is updated regularly and the latest version for download can be found here.

    In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.

    This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

    By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.

    Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

    Dataset content:

    • Commercial use subset
    • Non-commercial use subset
    • PMC custom license subset
    • bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)
    • Metadata file
    • Readme

    Each paper is represented as a single JSON object (see schema file for details).
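
    As a quick illustration, a single paper's JSON can be read along these lines (hypothetical file path; field names follow the CORD-19 schema, but the shipped schema file is authoritative):

    ```python
    import json
    from pathlib import Path

    paper = json.loads(Path("comm_use_subset/example_paper.json").read_text())

    print(paper["paper_id"], "-", paper["metadata"]["title"])
    for para in paper["body_text"][:3]:   # full-text paragraphs with section labels
        print(para["section"], ":", para["text"][:80])
    ```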

    Description:

    The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:

    • PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
    • Additional COVID-19 research articles from a corpus maintained by the WHO
    • bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

    We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

    We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.

    This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.

    Citation:

    When including CORD-19 data in a publication or redistribution, please cite our arXiv pre-print.

    The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.

  11. RDF Databases Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). RDF Databases Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-rdf-databases-software-market
    Explore at:
    pptx, csv, pdf
    Dataset updated
    Sep 22, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    RDF Databases Software Market Outlook

    The RDF Databases Software market is experiencing a notable growth trajectory, with the global market size poised to grow from USD 1.5 billion in 2023 to an estimated USD 3.2 billion by 2032, reflecting a compound annual growth rate (CAGR) of 8.5%. This significant growth is driven by increasing data volumes, the necessity of efficient data management solutions, and the rising adoption of semantic web technologies. The growing demand for effective data integration and metadata management across various industries is fueling the expansion of the RDF Databases Software market globally.
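
    The headline figures can be cross-checked with two lines of arithmetic (a sanity check added here, not part of the report):

    ```python
    # USD 1.5bn (2023) -> USD 3.2bn (2032): implied compound annual growth rate.
    implied_cagr = (3.2 / 1.5) ** (1 / (2032 - 2023)) - 1
    print(f"{implied_cagr:.1%}")  # ~8.8%, broadly consistent with the stated 8.5% CAGR
    ```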

    One of the primary growth factors for the RDF Databases Software market is the exploding volume of data generated by enterprises and the need for sophisticated data management systems. RDF (Resource Description Framework) databases are pivotal in enabling efficient data integration and facilitating advanced analytics by providing a structured format for data storage. The increasing investments in big data and analytics are propelling the adoption of RDF databases as they offer superior capabilities in handling complex and heterogeneous data sources. Furthermore, the rise in digital transformation initiatives across industries necessitates robust database solutions, thereby driving market growth.

    Another significant growth factor is the widespread adoption of semantic web technologies. RDF databases are integral to the semantic web as they provide a standardized way to describe and interlink data. This capability is crucial for enhancing data interoperability and enabling more intelligent and context-aware applications. Industries such as healthcare, finance, and retail are increasingly leveraging RDF databases to improve data integration, enhance decision-making processes, and deliver personalized customer experiences. The inherent flexibility and scalability of RDF databases make them an attractive choice for organizations aiming to harness the full potential of their data assets.

    The growing emphasis on regulatory compliance and data governance is also contributing to the market expansion. With stringent data protection regulations such as GDPR and CCPA, organizations are compelled to implement robust data management practices. RDF databases, with their ability to maintain comprehensive metadata and provide detailed data lineage, are becoming essential tools for ensuring compliance and enhancing data governance frameworks. This trend is particularly prominent in industries like BFSI and healthcare, where data integrity and security are paramount.

    Regionally, North America holds a significant share of the RDF Databases Software market, driven by the early adoption of advanced technologies and the presence of major market players. The region's well-established IT infrastructure and the high demand for data-driven decision-making solutions are key factors promoting market growth. Other regions, such as Europe and Asia Pacific, are also witnessing substantial growth owing to increasing digitalization efforts and the surging need for efficient data management systems. The Asia Pacific region, in particular, is expected to exhibit the highest CAGR during the forecast period, fueled by rapid technological advancements and the expansion of cloud-based services.

    Component Analysis

    The RDF Databases Software market is segmented into two primary components: Software and Services. The software segment encompasses various RDF database management systems and tools that facilitate efficient data storage, retrieval, and querying. This segment is crucial for organizations aiming to leverage semantic web technologies and improve data interoperability. The software component is witnessing robust growth due to the rising demand for scalable and flexible database solutions that can handle complex and diverse data sets. Additionally, advancements in software capabilities, such as enhanced query performance and improved scalability, are driving the adoption of RDF database software across industries.

    Within the software segment, Open Source and Commercial software categories further delineate the market. Open Source RDF databases are gaining traction due to their cost-effectiveness and the growing preference for community-supported solutions. On the other hand, commercial RDF database software offers advanced features, dedicated support, and enterprise-grade security, making them suitable for large organizations with stringent data management requirements. The continuous development and innovation in RDF database software are expected to drive this segment's

  12. DataSheet3_TextNetTopics Pro, a topic model-based text classification for...

    • frontiersin.figshare.com
    xlsx
    Updated Oct 5, 2023
    + more versions
    Cite
    Daniel Voskergian; Burcu Bakir-Gungor; Malik Yousef (2023). DataSheet3_TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information.xlsx [Dataset]. http://doi.org/10.3389/fgene.2023.1243874.s003
    Explore at:
    xlsx
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Frontiers
    Authors
    Daniel Voskergian; Burcu Bakir-Gungor; Malik Yousef
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to a predefined category. Article titles are concise descriptions of the articles’ content with valuable information that can be useful in document classification and categorization. However, shortness, data sparseness, limited word occurrences, and the inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms on these short texts, making their classification a challenging task. This study firstly explores the performance of our earlier study, TextNetTopics on the short text. Secondly, here we propose an advanced version called TextNetTopics Pro, which is a novel short-text classification framework that utilizes a promising combination of lexical features organized in topics of words and topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate our proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is related to the Biomedical field, and the other one is related to Computer Science publications. Additionally, we comparatively evaluate the predictive performance of the models generated with and without using the abstracts. Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling the imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers.
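
    As a rough illustration of the general idea (not the authors' TextNetTopics Pro code), lexical bag-of-words features can be concatenated with document-topic distributions from a topic model before training a classifier:

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    # Tiny invented "article title" corpus; 0 = biomedical, 1 = computer science.
    titles = [
        "drug induced liver injury prediction",
        "gene expression profiling in liver disease",
        "topic models for short text classification",
        "neural networks for document categorization",
    ]
    labels = np.array([0, 0, 1, 1])

    # Lexical features plus document-topic distributions, concatenated.
    counts = CountVectorizer().fit_transform(titles)
    topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
    features = np.hstack([counts.toarray(), topics])

    clf = LogisticRegression().fit(features, labels)
    print(clf.predict(features))  # sanity check on the training titles
    ```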

  13. Data from: Measuring semantic memory using associative and dissociative...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jan 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Martin Marko; Drahomír Michalko; Adam Kubinec; Igor Riečanský (2024). Measuring semantic memory using associative and dissociative retrieval tasks [Dataset]. http://doi.org/10.5061/dryad.vdncjsz1f
    Explore at:
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Martin Marko; Drahomír Michalko; Adam Kubinec; Igor Riečanský
    Time period covered
    Jan 1, 2023
    Description

    Recent theoretical advances highlighted the need for novel means of assessing semantic cognition. Here, we introduce the Associative-Dissociative Retrieval Task (ADT), positing a novel way to test inhibitory control over semantic memory retrieval by contrasting the efficacy of associative (automatic) and dissociative (controlled) retrieval on a standard set of verbal stimuli. All ADT measures achieved excellent reliability, homogeneity, and short-term temporal stability. Moreover, in-depth stimulus-level analyses showed that associating is easier for words evoking few but strong associates, yet such propensity hampers the inhibition. Finally, we provided critical support for the construct validity of the ADT measures, demonstrating reliable correlations with domain-specific measures of semantic memory functioning (semantic fluency and associative combination) but negligible correlations with domain-general capacities (processing speed and working memory). Together, we show that ADT provid...

    All datasets were collected via behavioural testing in a laboratory using a computer. Data referenced in the electronic supplementary material were collected via online forms. Data processing is described in the manuscript and supplementary material, and detailed in the supplied R scripts. Details for each dataset and script are provided in the README file.

    Supplied data are saved in .csv and .txt format. All data can be accessed via freely available software, including R (for scripts to process and analyze the data) or JASP. In case of downloading the individual data files, we recommend placing them on a C: disk; otherwise, adjust the corresponding lines (with paths to files) in the respective sections of the R script. Individual behavioural tasks used in the current study can also be inspected in a free stand-alone version of PsychoPy (note that running the tasks on PsychoPy versions newer than v3.2.4 may result in errors; we therefore recommend running them specifically on version 3.2.4).

    # Measuring semantic memory using associative and dissociative retrieval tasks

    We provide two sets of data files: 1) Raw data files containing unprocessed data of individual participants on given cognitive tasks; 2) Processed data files directly prepared for statistical analyses conducted in the study. Likewise, we provide all codes used to process and analyze the data.

    The data come from behavioural testing conducted on a computer during individual testing sessions in the laboratory.

    Description of the data and file structure

    All raw data files (except the .zip file "ADT_StimWords") are structured as long-formatted data frames where columns represent individual variables and rows individual responses of each participant. Processed data files are mostly structured in wide data format.

    Raw data files:
    ADT.raw.txt
    • NOTE: This file is available only at the alternative repository at the Open Science Framework; url:
    • Contains unprocessed retrieval latency a...

  14. CARLA Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun (2021). CARLA Dataset [Dataset]. https://paperswithcode.com/dataset/carla
    Explore at:
    Dataset updated
    Feb 2, 2021
    Authors
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun
    Description

    CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
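
    A minimal sketch of driving that sensor suite through the CARLA Python API, assuming a simulator running on localhost:2000 (blueprint names follow CARLA's convention; check your CARLA version):

    ```python
    import carla  # CARLA Python client, matching the simulator version

    client = carla.Client("localhost", 2000)
    world = client.get_world()
    blueprints = world.get_blueprint_library()

    # Spawn a vehicle at one of the map's predefined spawn points.
    vehicle = world.spawn_actor(blueprints.filter("vehicle.*")[0],
                                world.get_map().get_spawn_points()[0])

    # RGB camera with a customizable position, attached to the vehicle.
    camera = world.spawn_actor(blueprints.find("sensor.camera.rgb"),
                               carla.Transform(carla.Location(x=1.5, z=2.4)),
                               attach_to=vehicle)
    camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))
    ```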

  15. Data from: FLAIR (French Land cover from Aerospace ImageRy) Dataset

    • paperswithcode.com
    Updated Nov 22, 2022
    Cite
    Anatol Garioud; Stéphane Peillet; Eva Bookjans; Sébastien Giordano; Boris Wattrelos (2022). FLAIR (French Land cover from Aerospace ImageRy) Dataset [Dataset]. https://paperswithcode.com/dataset/flair-french-land-cover-from-aerospace
    Explore at:
    Dataset updated
    Nov 22, 2022
    Authors
    Anatol Garioud; Stéphane Peillet; Eva Bookjans; Sébastien Giordano; Boris Wattrelos
    Area covered
    France, French
    Description

    The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land-cover on French territory and provides referential geographical datasets, including high-resolution aerial images and topographic maps. The monitoring of land-cover plays a crucial role in land management and planning initiatives, which can have significant socio-economic and environmental impact. Together with remote sensing technologies, artificial intelligence (AI) promises to become a powerful tool in determining land-cover and its evolution. IGN is currently exploring the potential of AI in the production of high-resolution land cover maps. Notably, deep learning methods are employed to obtain a semantic segmentation of aerial images. However, territories as large as France imply heterogeneous contexts: variations in landscapes and image acquisition make it challenging to provide uniform, reliable and accurate results across all of France.

    The FLAIR-one dataset presented is part of the dataset currently used at IGN to establish the French national reference land cover map "Occupation du sol à grande échelle" (OCS-GE). It covers 810 km² and has 13 semantic classes.

  16. Data_Sheet_1_Distributional Measures of Semantic Abstraction.zip

    • frontiersin.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Sabine Schulte im Walde; Diego Frassinelli (2023). Data_Sheet_1_Distributional Measures of Semantic Abstraction.zip [Dataset]. http://doi.org/10.3389/frai.2021.796756.s001
    Explore at:
    zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Sabine Schulte im Walde; Diego Frassinelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article provides an in-depth study of distributional measures for distinguishing between degrees of semantic abstraction. Abstraction is considered a “central construct in cognitive science” and a “process of information reduction that allows for efficient storage and retrieval of central knowledge”. Relying on the distributional hypothesis, computational studies have successfully exploited measures of contextual co-occurrence and neighbourhood density to distinguish between conceptual semantic categorisations. So far, these studies have modeled semantic abstraction across lexical-semantic tasks such as ambiguity; diachronic meaning changes; abstractness vs. concreteness; and hypernymy. Yet, the distributional approaches target different conceptual types of semantic relatedness, and as to our knowledge not much attention has been paid to apply, compare or analyse the computational abstraction measures across conceptual tasks. The current article suggests a novel perspective that exploits variants of distributional measures to investigate semantic abstraction in English in terms of the abstract–concrete dichotomy (e.g., glory–banana) and in terms of the generality–specificity distinction (e.g., animal–fish), in order to compare the strengths and weaknesses of the measures regarding categorisations of abstraction, and to determine and investigate conceptual differences. In a series of experiments we identify reliable distributional measures for both instantiations of lexical-semantic abstraction and reach a precision higher than 0.7, but the measures clearly differ for the abstract–concrete vs. abstract–specific distinctions and for nouns vs. verbs. Overall, we identify two groups of measures, (i) frequency and word entropy when distinguishing between more and less abstract words in terms of the generality–specificity distinction, and (ii) neighbourhood density variants (especially target–context diversity) when distinguishing between more and less abstract words in terms of the abstract–concrete dichotomy. We conclude that more general words are used more often and are less surprising than more specific words, and that abstract words establish themselves empirically in semantically more diverse contexts than concrete words. Finally, our experiments once more point out that distributional models of conceptual categorisations need to take word classes and ambiguity into account: results for nouns vs. verbs differ in many respects, and ambiguity hinders fine-tuning empirical observations.
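
    A toy sketch of two of the measures discussed, word frequency and the entropy of a target word's context distribution (not the authors' implementation):

    ```python
    import math
    from collections import Counter

    # Invented (target word, context word) co-occurrence pairs.
    pairs = [
        ("animal", "zoo"), ("animal", "farm"), ("animal", "wild"),
        ("fish", "water"), ("fish", "water"),
    ]

    contexts = {}
    for target, ctx in pairs:
        contexts.setdefault(target, Counter())[ctx] += 1

    for target, counts in contexts.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        print(f"{target}: frequency={total}, context entropy={entropy:.2f}")
    ```

    On this toy data the more general word ("animal") is both more frequent and appears in more diverse contexts than the more specific one ("fish"), mirroring the paper's conclusion.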

  17. Performance comparison of our proposed model on CHASE_DB1 dataset with other...

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din (2023). Performance comparison of our proposed model on CHASE_DB1 dataset with other existing models. [Dataset]. http://doi.org/10.1371/journal.pone.0261698.t005
    Explore at:
    xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison of our proposed model on CHASE_DB1 dataset with other existing models.

  18. Publishing without Publishers: A Decentralized Server Network for Scientific...

    • figshare.com
    pdf
    Updated Jun 7, 2023
    Cite
    Tobias Kuhn (2023). Publishing without Publishers: A Decentralized Server Network for Scientific Data [Dataset]. http://doi.org/10.6084/m9.figshare.1287478.v1
    Explore at:
    pdf
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tobias Kuhn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose a server network based on nanopublications and trusty URIs for publishing, retrieving, and reusing semantic data. There exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. To solve this problem, we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. We present a protocol and a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data with formal semantics. We show how this approach allows researchers to produce, publish, retrieve, address, verify, and recombine datasets and their individual nanopublications. Due to the use of trusty URIs, which include cryptographic hash values of the content they represent, all content in the network is verifiable and immutable. Our evaluation of the current small network shows that this system is efficient and reliable, and we discuss how it could grow to handle the large amounts of structured data that modern science is producing and consuming. We believe that this network can serve as a solid basis for semantic publishing and could contribute to improve the availability and reproducibility of scientific results.
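
    The trusty-URI idea can be sketched in a few lines (simplified; the actual specification prescribes exact content normalization and encoding rules for RDF content):

    ```python
    import base64
    import hashlib

    # Hash the (normalized) content and embed the hash in the URI itself.
    content = b"<nanopub> ... assertion, provenance, publication info ... </nanopub>"
    digest = hashlib.sha256(content).digest()
    artifact_code = "RA" + base64.urlsafe_b64encode(digest).decode().rstrip("=")

    uri = "http://example.org/np/" + artifact_code  # hypothetical base URI
    print(uri)

    # Anyone holding the content can recompute the hash and compare it with the
    # URI, which is what makes the published data verifiable and immutable.
    ```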

  19. Summary of datasets used in the experiments.

    • figshare.com
    xls
    Updated Jun 8, 2023
    + more versions
    Cite
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din (2023). Summary of datasets used in the experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0261698.t004
    Explore at:
    xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of datasets used in the experiments.

