19 datasets found
  1. Electric Wires Dataset

    • paperswithcode.com
    Updated Apr 17, 2025
    + more versions
    Cite
    (2025). Electric Wires Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/electric-wires-dataset
    Explore at:
    Dataset updated
    Apr 17, 2025
    Description

    Description:

    👉 Download the dataset here

    The Electric Wires Dataset is a high-quality, automatically generated resource designed specifically for the semantic segmentation of cable-like objects, with a special emphasis on electric wires. This versatile dataset is built to be domain-independent, making it suitable for a wide range of industrial applications. Whether in construction, industrial manufacturing, power distribution, or communication infrastructure, this dataset is tailored to meet the needs of sectors where accurately recognizing wires and similar objects is crucial.

    Dataset Generation Process:

    The Electric Wires Dataset is created using a unique procedure that ensures both precision and consistency across all images. The process starts by placing the target object, electric wires, against a monochromatic background. This method allows for easy removal of the background using the chroma-key technique. As a result, clear and accurate training masks are generated for the target object.

    Once the masks are generated, they can be combined with various backgrounds to produce a domain-independent dataset. This approach significantly reduces the reality gap, ensuring that the dataset remains applicable across different real-world scenarios. The process also includes extensive augmentation of the foreground images, enhancing the dataset's robustness and adaptability.
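
    A minimal sketch of this chroma-key-and-composite step, assuming a green backdrop and hypothetical file names (the dataset's own generation tooling is not published here):

    ```python
    import cv2
    import numpy as np

    frame = cv2.imread("wire_on_green.png")   # wire shot against a monochromatic backdrop
    scene = cv2.imread("factory_scene.png")   # arbitrary real-world background

    # Chroma-key: pixels close to the backdrop colour are background, the rest is wire.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backdrop = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))  # green range, tune per setup
    mask = cv2.bitwise_not(backdrop)          # binary training mask for the wire

    # Composite the extracted wire onto a new background for domain independence.
    scene = cv2.resize(scene, (frame.shape[1], frame.shape[0]))
    composite = np.where(mask[..., None] > 0, frame, scene)

    cv2.imwrite("mask.png", mask)
    cv2.imwrite("composite.png", composite)
    ```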

    Download Dataset

    Key Features:

    High-Quality Annotations: The dataset provides precise segmentation masks for electric wires, enabling accurate training of semantic segmentation models.

    Domain-Independence: By incorporating various backgrounds, the dataset is designed to be used across multiple domains without the need for extensive domain-specific adjustments.

    Chroma-Key Technique: Utilizes the chroma-key technique to ensure clean and accurate separation of the target objects from the background.

    Augmentation: Includes a wide range of augmented images, increasing the dataset's diversity and improving model generalization.

    Versatile Applications: Ideal for training models used in construction, industrial manufacturing, power distribution, and communication infrastructure, where wire recognition is essential.

    Applications:

    This dataset is particularly beneficial for developing AI models in the following areas:

    Industrial Automation: Improving the accuracy of robotic systems in recognizing and handling wires during assembly and manufacturing processes.

    Safety Monitoring: Enhancing surveillance systems to detect and monitor electric wires in various environments, reducing risks associated with electrical hazards.

    Infrastructure Maintenance: Assisting in the inspection and maintenance of power distribution networks and communication lines by accurately identifying wires in complex environments.

    Augmented Reality: Facilitating the development of AR systems that require precise recognition of wires for overlaying relevant information in industrial settings.

    Conclusion:

    The Electric Wires Dataset is a highly versatile and essential tool for training semantic segmentation models, particularly those focused on recognizing cable-like objects. With high-quality annotations and extensive validation, this dataset serves as a reliable resource for industries that need precise wire detection and segmentation. Moreover, its adaptability makes it valuable across various applications, ensuring accurate results in different contexts.

    This dataset is sourced from Kaggle

  2. Industrial Machine Tool Element Surface Defect Dataset

    • radar.kit.edu
    • radar-service.eu
    tar
    Updated Jun 21, 2023
    Cite
    Tobias Schlagenhauf; Magnus Landwehr; Jürgen Fleischer (2023). Industrial Machine Tool Element Surface Defect Dataset [Dataset]. http://doi.org/10.35097/1278
    Explore at:
    tar (121882112 bytes)
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Fleischer, Jürgen
    Karlsruhe Institute of Technology
    Schlagenhauf, Tobias
    Landwehr, Magnus
    Authors
    Tobias Schlagenhauf; Magnus Landwehr; Jürgen Fleischer
    Description

    The dataset contains 1104 three-channel images with 394 image annotations for the surface damage type “pitting”. The annotations, made with the annotation tool labelme, are available in JSON format and hence convertible to VOC and COCO format. All images come from two BSD types. The dataset available for download is divided into three folders: data with all images as JPEG, label with all annotations, and saved_model with a baseline model. The authors also provide a Python script to divide the data and labels into three different split types: train_test_split, which splits images into the same train and test data split the authors used for the baseline model; wear_dev_split, which creates all 27 wear developments; and type_split, which splits the data into the occurring BSD types. One of the two BSD types is represented with 69 images in 55 different image sizes; all images of this BSD type come either in a clean or soiled condition. The other BSD type is shown on 325 images with two image sizes; since all images of this type were taken continuously over time, the degree of soiling evolves. As mentioned above, the dataset also contains 27 pitting development sequences of 69 images each.

    Instructions for the dataset split

    The authors provide three different dataset splits. To get a data split, run the Python script split_dataset.py.

    Script inputs:
    - split_type (mandatory)
    - output directory (mandatory)

    Split types:
    - train_test_split: splits the dataset into train and test data (80%/20%)
    - wear_dev_split: splits the dataset into 27 wear developments
    - type_split: splits the dataset into the different BSD types

    Example: C:\Users\Desktop>python split_dataset.py --split_type=train_test_split --output_dir=BSD_split_folder

    Result: ./BSD_split_folder/train/ and ./BSD_split_folder/test/
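
    Since the labels are labelme JSON files, a single annotation can be inspected along these lines (hypothetical file name; key names follow labelme's JSON format):

    ```python
    import json
    from pathlib import Path

    ann = json.loads(Path("label/some_image.json").read_text())

    print(ann["imagePath"])         # image the annotation belongs to
    for shape in ann["shapes"]:     # one entry per annotated pitting region
        print(shape["label"], shape["shape_type"], len(shape["points"]), "points")
    ```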

  3. SSHOC - National Gallery - Grounds Database CIDOC CRM Mapped Dataset

    • data.niaid.nih.gov
    • dataverse.nl
    • +1more
    Updated Jul 16, 2024
    + more versions
    Cite
    Joseph Padfield (2024). SSHOC - National Gallery - Grounds Database CIDOC CRM Mapped Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6478779
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Orla Delaney
    Joseph Padfield
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    In 2018 the IPERION-CH Grounds Database was presented to examine how the data produced through the scientific examination of historic painting preparation or grounds samples from multiple institutions could be combined in a flexible digital form, exploring the presentation of interrelated high-resolution images, text, complex metadata and procedural documentation. The original main user interface is live, though password-protected at this time. Work within the SSHOC project aimed to reformat the data to create a more FAIR dataset: in addition to mapping it to a standard ontology to increase interoperability, it has also been made available in the form of open linkable data combined with a SPARQL end-point. A draft version of this live data presentation can be found here.

    This is a draft dataset and further work is planned to debug and improve its semantic structure. This deposit contains the CIDOC-CRM mapped data formatted in XML and an example model diagram representing some of the key relationships covered in the dataset.
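
    A sketch of querying such an end-point, with a hypothetical endpoint URL and CIDOC-CRM-style class and property names (the deposit does not list its endpoint address here):

    ```python
    from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

    sparql = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
    sparql.setQuery("""
        PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
        SELECT ?obj ?id WHERE {
            ?obj a crm:E22_Man-Made_Object ;
                 crm:P1_is_identified_by ?id .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["obj"]["value"], row["id"]["value"])
    ```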

  4. Semantic Knowledge Representation API

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Jun 19, 2025
    + more versions
    Cite
    National Library of Medicine (2025). Semantic Knowledge Representation API [Dataset]. https://catalog.data.gov/dataset/semantic-knowledge-representation-api
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The SKR Project was initiated at NLM to develop programs that provide usable semantic representation of biomedical free text by building on resources currently available at the Library, especially the UMLS knowledge sources and the natural language processing tools provided by the SPECIALIST system. The project is concerned with reliable and effective management of the information encoded in natural-language texts. This Java-based API to the Semantic Knowledge Representation (SKR) Scheduler facility was created to give users the ability to submit jobs programmatically to the Scheduler Batch and Interactive facilities instead of using the Web-based interface.

  5. Data from: RELLIS-3D Dataset: Data, Benchmarks and Analysis

    • academictorrents.com
    bittorrent
    Updated Dec 25, 2024
    Cite
    Jiang, Peng and Osteen, Philip and Wigness, Maggie and Saripalli, Srikanth (2024). RELLIS-3D Dataset: Data, Benchmarks and Analysis [Dataset]. https://academictorrents.com/details/4cfa80e6d91e8c6c79bcc2f405dbd9255b5cf4e8
    Explore at:
    bittorrent (635808536706 bytes)
    Dataset updated
    Dec 25, 2024
    Dataset authored and provided by
    Jiang, Peng and Osteen, Philip and Wigness, Maggie and Saripalli, Srikanth
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    Semantic scene understanding is crucial for robust and safe autonomous navigation, particularly so in off-road environments. Recent deep learning advances for 3D semantic segmentation rely heavily on large sets of training data; however, existing autonomy datasets either represent urban environments or lack multimodal off-road data. We fill this gap with RELLIS-3D, a multimodal dataset collected in an off-road environment, which contains annotations for 13,556 LiDAR scans and 6,235 images. The data was collected on the Rellis Campus of Texas A&M University, and presents challenges to existing algorithms related to class imbalance and environmental topography. Additionally, we evaluate the current state-of-the-art deep learning semantic segmentation models on this dataset. Experimental results show that RELLIS-3D presents challenges for algorithms designed for segmentation in urban environments. This novel dataset provides the resources needed by researchers to continue to develop mo

  6. CORD-19 Dataset v2020

    • kaggle.com
    Updated Oct 18, 2020
    Cite
    SMLRA-KJSCE (2020). CORD-19 Dataset v2020 [Dataset]. https://www.kaggle.com/datasets/smlrakjsce/cord19-dataset-v2020/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SMLRA-KJSCE
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Open-Ended track where your team can build anything using the dataset provided by us

    Dataset Description In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.

    Call to Action We are issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This allows the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.

    Many of the questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions. We are maintaining a summary of the community's contributions.

    Acknowledgements We wouldn't be here without the help of others. The dataset is a subset of the dataset available at AI2's Semantic Scholar: https://pages.semanticscholar.org/coronavirus-research. This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, IBM, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.

    Dataset The dataset is in tar.gz format and can be downloaded from https://drive.google.com/file/d/15SV8_Nc1HECN9uaplDSQx7H1yKFR4F_Z/view?usp=sharing

    Submissions Notebook and Output results are expected as appropriate submissions.

  7. Cityscapes Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 19, 2020
    Cite
    Marius Cordts; Mohamed Omran; Sebastian Ramos; Timo Rehfeld; Markus Enzweiler; Rodrigo Benenson; Uwe Franke; Stefan Roth; Bernt Schiele (2020). Cityscapes Dataset [Dataset]. https://paperswithcode.com/dataset/cityscapes
    Explore at:
    Dataset updated
    May 19, 2020
    Authors
    Marius Cordts; Mohamed Omran; Sebastian Ramos; Timo Rehfeld; Markus Enzweiler; Rodrigo Benenson; Uwe Franke; Stefan Roth; Bernt Schiele
    Description

    Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5000 fine annotated images and 20000 coarse annotated ones. Data was captured in 50 cities during several months, daytimes, and good weather conditions. It was originally recorded as video so the frames were manually selected to have the following features: large number of dynamic objects, varying scene layout, and varying background.

  8. Data from: FCG-MFD: Benchmark Function Call Graph-Based Dataset for Malware...

    • figshare.com
    zip
    Updated Jan 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hassan jalil hadi (2025). FCG-MFD: Benchmark Function Call Graph-Based Dataset for Malware Family Detection [Dataset]. http://doi.org/10.6084/m9.figshare.26886148.v1
    Explore at:
    zip
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    figshare
    Authors
    Hassan jalil hadi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cybercrimes related to malware families are on the rise. This growth persists despite the prevalence of various antivirus software and approaches for malware detection and classification. Security experts have implemented Machine Learning (ML) techniques to identify these cybercrimes. However, these approaches demand updated malware datasets for continuous improvement amid the evolving sophistication of malware strains. Thus, we present FCG-MFD, a benchmark dataset with extensive Function Call Graphs (FCG) for malware family detection, built to help security systems remain resistant against emerging malware families. The dataset has two sub-datasets (FCG & Metadata) with 100,000 samples from VirusSamples, Virusshare, VirusSign, theZoo, Vx-underground, and MalwareBazaar, curated using FCGs and metadata to optimize the efficacy of ML algorithms. We suggest a new malware analysis technique using FCGs and graph embedding networks, offering a solution to the complexity of feature engineering in ML-based malware analysis. Our approach to extracting semantic features via Natural Language Processing (NLP) treats functions as sentences and instructions as words. We leverage a node2vec-based graph embedding network to generate malware embedding vectors. These vectors enable automated and efficient malware analysis by combining structural and semantic features. We use the two datasets (FCG & Metadata) to assess FCG-MFD performance; F1-scores of 99.14% and 99.28% are competitive with state-of-the-art (SOTA) methods.
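
    A toy sketch of the node2vec-style embedding step on a miniature call graph (not the authors' pipeline), using the community node2vec package:

    ```python
    import networkx as nx
    import numpy as np
    from node2vec import Node2Vec  # pip install node2vec

    # Miniature function-call graph; the paper's FCGs come from malware samples.
    fcg = nx.DiGraph([("main", "decrypt"), ("main", "connect"),
                      ("connect", "send"), ("decrypt", "send")])

    # Random walks + skip-gram yield one embedding vector per function node.
    n2v = Node2Vec(fcg, dimensions=32, walk_length=10, num_walks=50, workers=1)
    model = n2v.fit(window=5, min_count=1)

    # One simple way to pool node vectors into a whole-graph malware embedding.
    graph_vec = np.mean([model.wv[node] for node in fcg.nodes()], axis=0)
    print(graph_vec.shape)  # (32,)
    ```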

  9. Data from: What Makes Sentences Semantically Related? A Textual Relatedness...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf, zip
    Updated Jul 12, 2024
    Cite
    Mohamed Abdalla; Mohamed Abdalla; Krishnapriya Vishnubhotla; Saif M. Mohammad; Saif M. Mohammad; Krishnapriya Vishnubhotla (2024). What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study [Dataset]. http://doi.org/10.5281/zenodo.7599667
    Explore at:
    pdf, bin, zip, csv
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohamed Abdalla; Mohamed Abdalla; Krishnapriya Vishnubhotla; Saif M. Mohammad; Saif M. Mohammad; Krishnapriya Vishnubhotla
    Description

    What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study

    This repository contains data and code for the paper What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study.

    We hope that this work will spur further research on understanding sentence--sentence relatedness, methods of sentence representation, measures of semantic relatedness, and their applications.

    Citing our work
    Please use the following BibTex entry to cite us if you use our dataset or any of the associated analyses:

    @inproceedings{abdalla2023makes,
    title={What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study},
    author={Abdalla, Mohamed and Vishnubhotla, Krishnapriya and Mohammad, Saif M.},
    year={2023},
    address = {Dubrovnik, Croatia},
    publisher = "Association for Computational Linguistics",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume"
    }

    Dataset Description

    The dataset consists of 5500 English sentence pairs that are scored and ranked on a relatedness scale ranging from 0 (least related) to 1 (most related).

    Why Semantic Relatedness?
    Closeness of meaning can be of two kinds: semantic relatedness and semantic similarity. Two sentences are considered semantically similar when they have a paraphrasal or entailment relation, whereas relatedness accounts for all of the commonalities that can exist between two sentences. Semantic relatedness is central to textual coherence and narrative structure. Automatically determining semantic relatedness has many applications such as question answering, plagiarism detection, text generation (say in personal assistants and chat bots), and summarization.

    Prior NLP work has focused on semantic similarity (a small subset of semantic relatedness), largely because of a dearth of datasets. In this paper, we present the first manually annotated dataset of sentence--sentence semantic relatedness. It includes fine-grained scores of relatedness from 0 (least related) to 1 (most related) for 5,500 English sentence pairs. The sentences are taken from diverse sources and thus also have diverse sentence structures, varying amounts of lexical overlap, and varying formality.

    Comparative Annotations and Best-Worst Scaling
    Most existing sentence-sentence similarity datasets were annotated, one item at a time, using coarse rating labels such as integer values between 1 and 5 representing coarse degrees of closeness. It is well documented that such approaches suffer from inter- and intra-annotator inconsistency, scale region bias, and issues arising due to the fixed granularity.

    The relatedness scores for our dataset were, instead, obtained using a comparative annotation schema. In comparative annotations, two (or more) items are presented together and the annotator has to determine which is greater with respect to the metric of interest.

    Specifically, we use Best-Worst Scaling, a comparative annotation method, which has been shown to produce reliable scores with fewer annotations in other NLP tasks. We use scripts from https://saifmohammad.com/WebPages/BestWorst.html to obtain relatedness scores from our annotations.
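
    The toy sketch below, with invented annotations, only illustrates the standard best-minus-worst counting that underlies such scores; the authors' own scripts (linked above) are authoritative:

    ```python
    from collections import Counter

    # Each annotation: four sentence pairs, plus the one judged most related
    # ("best") and the one judged least related ("worst").
    annotations = [
        {"items": ["p1", "p2", "p3", "p4"], "best": "p1", "worst": "p4"},
        {"items": ["p1", "p2", "p3", "p4"], "best": "p2", "worst": "p4"},
        {"items": ["p1", "p3", "p4", "p5"], "best": "p1", "worst": "p5"},
    ]

    best, worst, seen = Counter(), Counter(), Counter()
    for a in annotations:
        best[a["best"]] += 1
        worst[a["worst"]] += 1
        seen.update(a["items"])

    # Best-minus-worst score per item, in [-1, 1]; rescaling maps it to [0, 1].
    for item in sorted(seen):
        print(item, round((best[item] - worst[item]) / seen[item], 2))
    ```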


    Loading the Dataset
    - The sentence pairs, and associated scores, are in the file sem_text_rel_ranked.csv in the root directory. The CSV file can be read using:

    ```python
    import pandas as pd

    # Renamed to df: the original snippet used "str", which shadows the built-in.
    df = pd.read_csv('sem_text_rel_ranked.csv')

    row = df.loc[0]
    sent1, sent2 = row['Text'].split("\n")  # the two sentences are newline-separated
    score = row['Score']
    ```

    - Relevant columns:

    - Text: Sentence pair, separated by the newline character.
    - Score: The semantic relatedness score between 0 and 1.

    - Additionally:
    - The SourceID column indicates the source dataset from which the sentence pair was drawn (see Table 2 of our paper).
    - The SubsetID column indicates the sampling strategy used for the source dataset.
    - The PairID column is a unique identifier for each pair that also indicates its Source and Subset.


    Raw Annotations from Amazon Mechanical Turk

    - The `mturk_data/` subdirectory provides the raw MTurk annotations obtained with our comparative annotation setup.
    - Each row of `mturk_data/bws_annotations.csv` consists of four sentence pairs along with human annotations for the most related (column `BestItem`) and the least related (column `WorstItem`) pair.
    - File `mturk_data/id2sents.csv` pairs each sentence pair with the corresponding SourceID, SubsetID, and PairID that indicates the source dataset (see Table 2 of our paper).
    - See file `mturk_data/task_intructions.txt` for the instructions provided to annotators for our task.


    Datasheet for STR-2022
    The datasheet for our dataset is in the document `STR2022-datastatement.pdf` in the root folder of this repository.

    Ethics Statement
    Any dataset of semantic relatedness entails several ethical considerations. We talk about this in Section 8 of our paper.

    Creators
    - Mohamed Abdalla (University of Toronto)
    - Krishnapriya Vishnubhotla (University of Toronto)
    - Saif M. Mohammad (National Research Council Canada)

    Contact: msa@cs.toronto.edu, vkpriya@cs.toronto.edu, saif.mohammad@nrc-cnrc.gc.ca

  10. COVID-19 Open Research Dataset (CORD-19)

    • zenodo.org
    • live.european-language-grid.eu
    application/gzip, bin +3
    Updated Jul 22, 2024
    + more versions
    Cite
    Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang; Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang (2024). COVID-19 Open Research Dataset (CORD-19) [Dataset]. http://doi.org/10.5281/zenodo.3765923
    Explore at:
    pdf, application/gzip, txt, csv, bin
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang; Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang
    Description

    Important: This dataset is updated regularly and the latest version for download can be found here.

    In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.

    This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

    By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.

    Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

    Dataset content:

    • Commercial use subset
    • Non-commercial use subset
    • PMC custom license subset
    • bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)
    • Metadata file
    • Readme

    Each paper is represented as a single JSON object (see schema file for details).
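
    As a quick illustration, a single paper's JSON can be read along these lines (hypothetical file path; field names follow the CORD-19 schema, but the shipped schema file is authoritative):

    ```python
    import json
    from pathlib import Path

    paper = json.loads(Path("comm_use_subset/example_paper.json").read_text())

    print(paper["paper_id"], "-", paper["metadata"]["title"])
    for para in paper["body_text"][:3]:   # full-text paragraphs with section labels
        print(para["section"], ":", para["text"][:80])
    ```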

    Description:

    The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:

    • PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
    • Additional COVID-19 research articles from a corpus maintained by the WHO
    • bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

    We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

    We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.

    This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.

    Citation:

    When including CORD-19 data in a publication or redistribution, please cite our arXiv pre-print.

    The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.

  11. RDF Databases Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). RDF Databases Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-rdf-databases-software-market
    Explore at:
    pptx, csv, pdf
    Dataset updated
    Sep 22, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    RDF Databases Software Market Outlook

    The RDF Databases Software market is experiencing a notable growth trajectory, with the global market size poised to grow from USD 1.5 billion in 2023 to an estimated USD 3.2 billion by 2032, reflecting a compound annual growth rate (CAGR) of 8.5%. This significant growth is driven by increasing data volumes, the necessity of efficient data management solutions, and the rising adoption of semantic web technologies. The growing demand for effective data integration and metadata management across various industries is fueling the expansion of the RDF Databases Software market globally.
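
    The headline figures can be cross-checked with two lines of arithmetic (a sanity check added here, not part of the report):

    ```python
    # USD 1.5bn (2023) -> USD 3.2bn (2032): implied compound annual growth rate.
    implied_cagr = (3.2 / 1.5) ** (1 / (2032 - 2023)) - 1
    print(f"{implied_cagr:.1%}")  # ~8.8%, broadly consistent with the stated 8.5% CAGR
    ```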

    One of the primary growth factors for the RDF Databases Software market is the exploding volume of data generated by enterprises and the need for sophisticated data management systems. RDF (Resource Description Framework) databases are pivotal in enabling efficient data integration and facilitating advanced analytics by providing a structured format for data storage. The increasing investments in big data and analytics are propelling the adoption of RDF databases as they offer superior capabilities in handling complex and heterogeneous data sources. Furthermore, the rise in digital transformation initiatives across industries necessitates robust database solutions, thereby driving market growth.

    Another significant growth factor is the widespread adoption of semantic web technologies. RDF databases are integral to the semantic web as they provide a standardized way to describe and interlink data. This capability is crucial for enhancing data interoperability and enabling more intelligent and context-aware applications. Industries such as healthcare, finance, and retail are increasingly leveraging RDF databases to improve data integration, enhance decision-making processes, and deliver personalized customer experiences. The inherent flexibility and scalability of RDF databases make them an attractive choice for organizations aiming to harness the full potential of their data assets.

    The growing emphasis on regulatory compliance and data governance is also contributing to the market expansion. With stringent data protection regulations such as GDPR and CCPA, organizations are compelled to implement robust data management practices. RDF databases, with their ability to maintain comprehensive metadata and provide detailed data lineage, are becoming essential tools for ensuring compliance and enhancing data governance frameworks. This trend is particularly prominent in industries like BFSI and healthcare, where data integrity and security are paramount.

    Regionally, North America holds a significant share of the RDF Databases Software market, driven by the early adoption of advanced technologies and the presence of major market players. The region's well-established IT infrastructure and the high demand for data-driven decision-making solutions are key factors promoting market growth. Other regions, such as Europe and Asia Pacific, are also witnessing substantial growth owing to increasing digitalization efforts and the surging need for efficient data management systems. The Asia Pacific region, in particular, is expected to exhibit the highest CAGR during the forecast period, fueled by rapid technological advancements and the expansion of cloud-based services.

    Component Analysis

    The RDF Databases Software market is segmented into two primary components: Software and Services. The software segment encompasses various RDF database management systems and tools that facilitate efficient data storage, retrieval, and querying. This segment is crucial for organizations aiming to leverage semantic web technologies and improve data interoperability. The software component is witnessing robust growth due to the rising demand for scalable and flexible database solutions that can handle complex and diverse data sets. Additionally, advancements in software capabilities, such as enhanced query performance and improved scalability, are driving the adoption of RDF database software across industries.

    Within the software segment, Open Source and Commercial software categories further delineate the market. Open Source RDF databases are gaining traction due to their cost-effectiveness and the growing preference for community-supported solutions. On the other hand, commercial RDF database software offers advanced features, dedicated support, and enterprise-grade security, making them suitable for large organizations with stringent data management requirements. The continuous development and innovation in RDF database software are expected to drive this segment's

  12. DataSheet3_TextNetTopics Pro, a topic model-based text classification for...

    • frontiersin.figshare.com
    xlsx
    Updated Oct 5, 2023
    + more versions
    Cite
    Daniel Voskergian; Burcu Bakir-Gungor; Malik Yousef (2023). DataSheet3_TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information.xlsx [Dataset]. http://doi.org/10.3389/fgene.2023.1243874.s003
    Explore at:
    xlsx
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Frontiers
    Authors
    Daniel Voskergian; Burcu Bakir-Gungor; Malik Yousef
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to a predefined category. Article titles are concise descriptions of the articles’ content with valuable information that can be useful in document classification and categorization. However, shortness, data sparseness, limited word occurrences, and the inadequate contextual information of scientific document titles hinder the direct application of conventional text mining and machine learning algorithms on these short texts, making their classification a challenging task. This study firstly explores the performance of our earlier study, TextNetTopics on the short text. Secondly, here we propose an advanced version called TextNetTopics Pro, which is a novel short-text classification framework that utilizes a promising combination of lexical features organized in topics of words and topic distribution extracted by a topic model to alleviate the data-sparseness problem when classifying short texts. We evaluate our proposed approach using nine state-of-the-art short-text topic models on two publicly available datasets of scientific article titles as short-text documents. The first dataset is related to the Biomedical field, and the other one is related to Computer Science publications. Additionally, we comparatively evaluate the predictive performance of the models generated with and without using the abstracts. Finally, we demonstrate the robustness and effectiveness of the proposed approach in handling the imbalanced data, particularly in the classification of Drug-Induced Liver Injury articles as part of the CAMDA challenge. Taking advantage of the semantic information detected by topic models proved to be a reliable way to improve the overall performance of ML classifiers.
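
    As a rough illustration of the general idea (not the authors' TextNetTopics Pro code), lexical bag-of-words features can be concatenated with document-topic distributions from a topic model before training a classifier:

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression

    # Tiny invented "article title" corpus; 0 = biomedical, 1 = computer science.
    titles = [
        "drug induced liver injury prediction",
        "gene expression profiling in liver disease",
        "topic models for short text classification",
        "neural networks for document categorization",
    ]
    labels = np.array([0, 0, 1, 1])

    # Lexical features plus document-topic distributions, concatenated.
    counts = CountVectorizer().fit_transform(titles)
    topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
    features = np.hstack([counts.toarray(), topics])

    clf = LogisticRegression().fit(features, labels)
    print(clf.predict(features))  # sanity check on the training titles
    ```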

  13. Data from: Measuring semantic memory using associative and dissociative...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jan 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Martin Marko; Drahomír Michalko; Adam Kubinec; Igor Riečanský (2024). Measuring semantic memory using associative and dissociative retrieval tasks [Dataset]. http://doi.org/10.5061/dryad.vdncjsz1f
    Explore at:
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Martin Marko; Drahomír Michalko; Adam Kubinec; Igor Riečanský
    Time period covered
    Jan 1, 2023
    Description

    Recent theoretical advances highlighted the need for novel means of assessing semantic cognition. Here, we introduce the Associative-Dissociative Retrieval Task (ADT), positing a novel way to test inhibitory control over semantic memory retrieval by contrasting the efficacy of associative (automatic) and dissociative (controlled) retrieval on a standard set of verbal stimuli. All ADT measures achieved excellent reliability, homogeneity, and short-term temporal stability. Moreover, in-depth stimulus-level analyses showed that associating is easier for words evoking few but strong associates, yet such propensity hampers the inhibition. Finally, we provided critical support for the construct validity of the ADT measures, demonstrating reliable correlations with domain-specific measures of semantic memory functioning (semantic fluency and associative combination) but negligible correlations with domain-general capacities (processing speed and working memory). Together, we show that ADT provid...

    All datasets were collected via behavioural testing in a laboratory using a computer. Data referenced in the electronic supplementary material were collected via online forms. Data processing is described in the manuscript and supplementary material, and detailed in the supplied R scripts. Details for each dataset and script are provided in the README file.

    Supplied data are saved in .csv and .txt format. All data can be accessed via freely available software, including R (for scripts to process and analyze the data) or JASP. In case of downloading the individual data files, we recommend placing them on a C: disk; otherwise, adjust the corresponding lines (with paths to files) in the respective sections of the R script. Individual behavioural tasks used in the current study can also be inspected in a free stand-alone version of PsychoPy (note that running the tasks on PsychoPy versions newer than v3.2.4 may result in errors; we therefore recommend running them specifically on version 3.2.4).

    # Measuring semantic memory using associative and dissociative retrieval tasks

    We provide two sets of data files: 1) Raw data files containing unprocessed data of individual participants on given cognitive tasks; 2) Processed data files directly prepared for statistical analyses conducted in the study. Likewise, we provide all codes used to process and analyze the data.

    The data come from behavioural testing conducted on a computer during individual testing sessions in the laboratory.

    Description of the data and file structure

    All raw data files (except the .zip file "ADT_StimWords") are structured as long-formatted data frames where columns represent individual variables and rows individual responses of each participant. Processed data files are mostly structured in wide data format.

    Raw data files:
    ADT.raw.txt
    • NOTE: This file is available only at the alternative repository at the Open Science Framework; url:
    • Contains unprocessed retrieval latency a...

  14. CARLA Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun (2021). CARLA Dataset [Dataset]. https://paperswithcode.com/dataset/carla
    Explore at:
    Dataset updated
    Feb 2, 2021
    Authors
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun
    Description

    CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
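
    A minimal sketch of driving that sensor suite through the CARLA Python API, assuming a simulator running on localhost:2000 (blueprint names follow CARLA's convention; check your CARLA version):

    ```python
    import carla  # CARLA Python client, matching the simulator version

    client = carla.Client("localhost", 2000)
    world = client.get_world()
    blueprints = world.get_blueprint_library()

    # Spawn a vehicle at one of the map's predefined spawn points.
    vehicle = world.spawn_actor(blueprints.filter("vehicle.*")[0],
                                world.get_map().get_spawn_points()[0])

    # RGB camera with a customizable position, attached to the vehicle.
    camera = world.spawn_actor(blueprints.find("sensor.camera.rgb"),
                               carla.Transform(carla.Location(x=1.5, z=2.4)),
                               attach_to=vehicle)
    camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))
    ```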

  15. Data from: FLAIR (French Land cover from Aerospace ImageRy) Dataset

    • paperswithcode.com
    Updated Nov 22, 2022
    Cite
    Anatol Garioud; Stéphane Peillet; Eva Bookjans; Sébastien Giordano; Boris Wattrelos (2022). FLAIR (French Land cover from Aerospace ImageRy) Dataset [Dataset]. https://paperswithcode.com/dataset/flair-french-land-cover-from-aerospace
    Explore at:
    Dataset updated
    Nov 22, 2022
    Authors
    Anatol Garioud; Stéphane Peillet; Eva Bookjans; Sébastien Giordano; Boris Wattrelos
    Area covered
    France, French
    Description

    The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land-cover on French territory and provides referential geographical datasets, including high-resolution aerial images and topographic maps. The monitoring of land-cover plays a crucial role in land management and planning initiatives, which can have significant socio-economic and environmental impact. Together with remote sensing technologies, artificial intelligence (AI) promises to become a powerful tool in determining land-cover and its evolution. IGN is currently exploring the potential of AI in the production of high-resolution land cover maps. Notably, deep learning methods are employed to obtain a semantic segmentation of aerial images. However, territories as large as France imply heterogeneous contexts: variations in landscapes and image acquisition make it challenging to provide uniform, reliable and accurate results across all of France.

    The FLAIR-one dataset presented is part of the dataset currently used at IGN to establish the French national reference land cover map "Occupation du sol à grande échelle" (OCS-GE). It covers 810 km² and has 13 semantic classes.

  16. Data_Sheet_1_Distributional Measures of Semantic Abstraction.zip

    • frontiersin.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Sabine Schulte im Walde; Diego Frassinelli (2023). Data_Sheet_1_Distributional Measures of Semantic Abstraction.zip [Dataset]. http://doi.org/10.3389/frai.2021.796756.s001
    Explore at:
    zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Sabine Schulte im Walde; Diego Frassinelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article provides an in-depth study of distributional measures for distinguishing between degrees of semantic abstraction. Abstraction is considered a “central construct in cognitive science” and a “process of information reduction that allows for efficient storage and retrieval of central knowledge”. Relying on the distributional hypothesis, computational studies have successfully exploited measures of contextual co-occurrence and neighbourhood density to distinguish between conceptual semantic categorisations. So far, these studies have modeled semantic abstraction across lexical-semantic tasks such as ambiguity; diachronic meaning changes; abstractness vs. concreteness; and hypernymy. Yet, the distributional approaches target different conceptual types of semantic relatedness, and as to our knowledge not much attention has been paid to apply, compare or analyse the computational abstraction measures across conceptual tasks. The current article suggests a novel perspective that exploits variants of distributional measures to investigate semantic abstraction in English in terms of the abstract–concrete dichotomy (e.g., glory–banana) and in terms of the generality–specificity distinction (e.g., animal–fish), in order to compare the strengths and weaknesses of the measures regarding categorisations of abstraction, and to determine and investigate conceptual differences. In a series of experiments we identify reliable distributional measures for both instantiations of lexical-semantic abstraction and reach a precision higher than 0.7, but the measures clearly differ for the abstract–concrete vs. abstract–specific distinctions and for nouns vs. verbs. Overall, we identify two groups of measures, (i) frequency and word entropy when distinguishing between more and less abstract words in terms of the generality–specificity distinction, and (ii) neighbourhood density variants (especially target–context diversity) when distinguishing between more and less abstract words in terms of the abstract–concrete dichotomy. We conclude that more general words are used more often and are less surprising than more specific words, and that abstract words establish themselves empirically in semantically more diverse contexts than concrete words. Finally, our experiments once more point out that distributional models of conceptual categorisations need to take word classes and ambiguity into account: results for nouns vs. verbs differ in many respects, and ambiguity hinders fine-tuning empirical observations.
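
    A toy sketch of two of the measures discussed, word frequency and the entropy of a target word's context distribution (not the authors' implementation):

    ```python
    import math
    from collections import Counter

    # Invented (target word, context word) co-occurrence pairs.
    pairs = [
        ("animal", "zoo"), ("animal", "farm"), ("animal", "wild"),
        ("fish", "water"), ("fish", "water"),
    ]

    contexts = {}
    for target, ctx in pairs:
        contexts.setdefault(target, Counter())[ctx] += 1

    for target, counts in contexts.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        print(f"{target}: frequency={total}, context entropy={entropy:.2f}")
    ```

    On this toy data the more general word ("animal") is both more frequent and appears in more diverse contexts than the more specific one ("fish"), mirroring the paper's conclusion.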

  17. Performance comparison of our proposed model on CHASE_DB1 dataset with other...

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din (2023). Performance comparison of our proposed model on CHASE_DB1 dataset with other existing models. [Dataset]. http://doi.org/10.1371/journal.pone.0261698.t005
    Explore at:
    xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison of our proposed model on CHASE_DB1 dataset with other existing models.

  18. Publishing without Publishers: A Decentralized Server Network for Scientific...

    • figshare.com
    pdf
    Updated Jun 7, 2023
    Cite
    Tobias Kuhn (2023). Publishing without Publishers: A Decentralized Server Network for Scientific Data [Dataset]. http://doi.org/10.6084/m9.figshare.1287478.v1
    Explore at:
    pdf
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tobias Kuhn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose a server network based on nanopublications and trusty URIs for publishing, retrieving, and reusing semantic data. There exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. To solve this problem, we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. We present a protocol and a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data with formal semantics. We show how this approach allows researchers to produce, publish, retrieve, address, verify, and recombine datasets and their individual nanopublications. Due to the use of trusty URIs, which include cryptographic hash values of the content they represent, all content in the network is verifiable and immutable. Our evaluation of the current small network shows that this system is efficient and reliable, and we discuss how it could grow to handle the large amounts of structured data that modern science is producing and consuming. We believe that this network can serve as a solid basis for semantic publishing and could contribute to improve the availability and reproducibility of scientific results.
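
    The trusty-URI idea can be sketched in a few lines (simplified; the actual specification prescribes exact content normalization and encoding rules for RDF content):

    ```python
    import base64
    import hashlib

    # Hash the (normalized) content and embed the hash in the URI itself.
    content = b"<nanopub> ... assertion, provenance, publication info ... </nanopub>"
    digest = hashlib.sha256(content).digest()
    artifact_code = "RA" + base64.urlsafe_b64encode(digest).decode().rstrip("=")

    uri = "http://example.org/np/" + artifact_code  # hypothetical base URI
    print(uri)

    # Anyone holding the content can recompute the hash and compare it with the
    # URI, which is what makes the published data verifiable and immutable.
    ```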

  19. Summary of datasets used in the experiments.

    • figshare.com
    xls
    Updated Jun 8, 2023
    + more versions
    Cite
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din (2023). Summary of datasets used in the experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0261698.t004
    Explore at:
    xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mohsin Raza; Khuram Naveed; Awais Akram; Nema Salem; Amir Afaq; Hussain Ahmad Madni; Mohammad A. U. Khan; Mui-zzud- din
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of datasets used in the experiments.

