24 datasets found
  1. v-lol-trains

    • huggingface.co
    Updated Jul 13, 2023
    Cite
    Artificial Intelligence & Machine Learning Lab at TU Darmstadt (2023). v-lol-trains [Dataset]. https://huggingface.co/datasets/AIML-TUDA/v-lol-trains
    Explore at:
    Dataset updated
    Jul 13, 2023
    Dataset authored and provided by
    Artificial Intelligence & Machine Learning Lab at TU Darmstadt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

      Dataset Summary
    

    This diagnostic dataset (website, paper) is specifically designed to evaluate the visual logical learning capabilities of machine learning models. It offers a seamless integration of visual and logical challenges, providing 2D images of complex visual trains, where the classification is derived from rule-based logic. The fundamental idea of V-LoL remains to integrate the explicit logical learning tasks of classic symbolic AI… See the full description on the dataset page: https://huggingface.co/datasets/AIML-TUDA/v-lol-trains.
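
    As a usage note (not part of the dataset card), the data can be pulled with the Hugging Face datasets library. This is a minimal sketch; the split names and feature columns are not specified here, so inspect the returned object before relying on any particular structure.

    from datasets import load_dataset

    # Minimal sketch: load the dataset from the Hub by its repository id.
    ds = load_dataset("AIML-TUDA/v-lol-trains")
    print(ds)  # shows the available splits and their features

    # Peek at one example from the first split (split names are not assumed here).
    first_split = list(ds.keys())[0]
    print(ds[first_split][0])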

  2. OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated...

    • data.niaid.nih.gov
    Updated May 13, 2022
    Cite
    Benjamin Tan (2022). OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6399454
    Explore at:
    Dataset updated
    May 13, 2022
    Dataset provided by
    Benjamin Tan
    Siddharth Garg
    Ramesh Karri
    Animesh Basak Chowdhury
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Logic synthesis is a challenging and widely researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool, and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter Graphs (AIGs) produced from 1500 synthesis runs, plus labels such as optimized node counts and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The code related to dataset creation and the benchmark models is available at https://github.com/NYU-MLDA/OpenABC.git.

  3. Detailed results of four ML models on the overall test dataset.

    • plos.figshare.com
    xls
    Updated Aug 27, 2024
    Cite
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd (2024). Detailed results of four ML models on the overall test dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0305035.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Detailed results of four ML models on the overall test dataset.

  4. primitives.pl from Hypothesizing an algorithm from one example: the role of...

    • rs.figshare.com
    txt
    Updated Jun 5, 2023
    Cite
    S. H. Muggleton FREng (2023). primitives.pl from Hypothesizing an algorithm from one example: the role of specificity [Dataset]. http://doi.org/10.6084/m9.figshare.22661481.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    The Royal Society
    Authors
    S. H. Muggleton FREng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistical machine learning usually achieves high-accuracy models by employing tens of thousands of examples. By contrast, both children and adult humans typically learn new concepts from either one or a small number of instances. The high data efficiency of human learning is not easily explained in terms of standard formal frameworks for machine learning, including Gold’s learning-in-the-limit framework and Valiant’s probably approximately correct (PAC) model. This paper explores ways in which this apparent disparity between human and machine learning can be reconciled by considering algorithms involving a preference for specificity combined with program minimality. It is shown how this can be efficiently enacted using hierarchical search based on identification of certificates and push-down automata to support hypothesizing compactly expressed maximal efficiency algorithms. Early results of a new system called DeepLog indicate that such approaches can support efficient top-down construction of relatively complex logic programs from a single example. This article is part of a discussion meeting issue ‘Cognitive artificial intelligence’.

  5. MLRegTest: A benchmark for the machine learning of regular languages

    • search.dataone.org
    • datadryad.org
    Updated Jul 14, 2024
    Cite
    Sam van der Poel; Dakotah Lambert; Kalina Kostyszyn; Tiantian Gao; Rahul Verma; Derek Andersen; Joanne Chau; Emily Peterson; Cody St. Clair; Paul Fodor; Chihiro Shibata; Jeffrey Heinz (2024). MLRegTest: A benchmark for the machine learning of regular languages [Dataset]. http://doi.org/10.5061/dryad.dncjsxm4h
    Explore at:
    Dataset updated
    Jul 14, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Sam van der Poel; Dakotah Lambert; Kalina Kostyszyn; Tiantian Gao; Rahul Verma; Derek Andersen; Joanne Chau; Emily Peterson; Cody St. Clair; Paul Fodor; Chihiro Shibata; Jeffrey Heinz
    Time period covered
    Jan 1, 2023
    Description

    MLRegTest is a benchmark for machine learning systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies. The languages were generated by creating finite-state acceptors, and the datasets were generated by sampling from these finite-state acceptors. The scripts and software used for these processes are open source and available; for details, see https://github.com/heinz-jeffrey/subregular-learning. Details are described in the arXiv preprint "MLRegTest: A Benchmark for the Machine Learning of Regular Languages".

    https://doi.org/10.5061/dryad.dncjsxm4h

    MLRegTest provides training and testing data for 1800 regular languages.

    This repository contains three gzipped tar archives.

    > data.tar.gz (21GB)
    > languages.tar.gz (4.5MB)
    > models.tar.gz (76GB)

    When uncompressed, these yield three directories, described in detail below.

    > data (43GB)
    > languages (38MB)
    > models (87GB)

    Languages

    Languages are named according to the scheme Sigma.Tau.class.k.t.i.plebby, where Sigma is a two-digit alphabet size, Tau a two-digit number of salient symbols (the 'tier'), class the named subregular class, k the width of factors used (if applicable), t the threshold counted to (if applicable), and i a unique identifier. The table below unabbreviates the class names, and shows how many languages of each class there are.

    | class | name ...
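
    As an illustration of the naming scheme (not code shipped with the benchmark), the fields can be split out mechanically; the example file name below is hypothetical.

    # Minimal sketch: parse a language name of the form Sigma.Tau.class.k.t.i.plebby,
    # following the description above. The example name at the bottom is hypothetical.
    def parse_language_name(name: str) -> dict:
        stem = name.removesuffix(".plebby")
        sigma, tau, cls, k, t, i = stem.split(".")
        return {
            "alphabet_size": int(sigma),  # two-digit alphabet size (Sigma)
            "tier_size": int(tau),        # two-digit number of salient symbols (Tau)
            "class": cls,                 # named subregular class
            "k": k,                       # width of factors, if applicable
            "t": t,                       # threshold counted to, if applicable
            "id": i,                      # unique identifier
        }

    print(parse_language_name("16.04.SL.4.1.0.plebby"))  # hypothetical example name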

  6. Driving Behavior Dataset

    • data.mendeley.com
    Updated Dec 9, 2021
    + more versions
    Cite
    Asim Sinan Yuksel (2021). Driving Behavior Dataset [Dataset]. http://doi.org/10.17632/jj3tw8kj6h.3
    Explore at:
    Dataset updated
    Dec 9, 2021
    Authors
    Asim Sinan Yuksel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for modeling risky driver behaviors based on accelerometer (X, Y, Z axes in meters per second squared (m/s²)) and gyroscope (X, Y, Z axes in degrees per second (°/s)) data.

    • Sampling rate: average 2 samples (rows) per second
    • Cars: Ford Fiesta 1.4, Ford Fiesta 1.25, Hyundai i20
    • Drivers: 3 different drivers, aged 27, 28 and 37
    • Driver behaviors: Sudden Acceleration (Class Label: 1), Sudden Right Turn (Class Label: 2), Sudden Left Turn (Class Label: 3), Sudden Break (Class Label: 4)
    • Best window size: 14 seconds
    • Sensor: MPU6050
    • Device: Raspberry Pi 3 Model B

    Please see the Summary Table for a summary of the collected data.
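
    For readers who want to reproduce the windowing described above, here is a minimal sketch. It assumes the recordings have been combined into a single CSV with hypothetical column names (AccX, AccY, AccZ, Class), which are not taken from the dataset documentation.

    import pandas as pd

    # Minimal sketch: cut the ~2 Hz recordings into 14-second windows (the best
    # window size reported above) and compute simple per-window features.
    # File name and column names are assumptions, not the dataset's documented schema.
    df = pd.read_csv("driving_behavior.csv")

    SAMPLES_PER_SECOND = 2              # average sampling rate stated above
    WINDOW = 14 * SAMPLES_PER_SECOND    # 14-second window -> 28 rows

    rows = []
    for start in range(0, len(df) - WINDOW + 1, WINDOW):
        w = df.iloc[start:start + WINDOW]
        rows.append({
            "acc_x_mean": w["AccX"].mean(),
            "acc_x_std": w["AccX"].std(),
            "label": w["Class"].mode().iat[0],  # majority class label in the window
        })
    features = pd.DataFrame(rows)
    print(features.head())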

  7. Data_Sheet_1_There Is Hope After All: Quantifying Opinion and...

    • frontiersin.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Mingxi Cheng; Shahin Nazarian; Paul Bogdan (2023). Data_Sheet_1_There Is Hope After All: Quantifying Opinion and Trustworthiness in Neural Networks.PDF [Dataset]. http://doi.org/10.3389/frai.2020.00054.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Mingxi Cheng; Shahin Nazarian; Paul Bogdan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial Intelligence (AI) plays a fundamental role in the modern world, especially when used as an autonomous decision maker. One common concern nowadays is “how trustworthy the AIs are.” Human operators follow a strict educational curriculum and performance assessment that could be exploited to quantify how much we entrust them. To quantify the trust of AI decision makers, we must go beyond task accuracy especially when facing limited, incomplete, misleading, controversial or noisy datasets. Toward addressing these challenges, we describe DeepTrust, a Subjective Logic (SL) inspired framework that constructs a probabilistic logic description of an AI algorithm and takes into account the trustworthiness of both dataset and inner algorithmic workings. DeepTrust identifies proper multi-layered neural network (NN) topologies that have high projected trust probabilities, even when trained with untrusted data. We show that uncertain opinion of data is not always malicious while evaluating NN's opinion and trustworthiness, whereas the disbelief opinion hurts trust the most. Also trust probability does not necessarily correlate with accuracy. DeepTrust also provides a projected trust probability of NN's prediction, which is useful when the NN generates an over-confident output under problematic datasets. These findings open new analytical avenues for designing and improving the NN topology by optimizing opinion and trustworthiness, along with accuracy, in a multi-objective optimization formulation, subject to space and time constraints.

  8. Data from: Simple Dataset for Proof Method Recommendation in Isabelle/HOL

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 13, 2020
    Cite
    Nagashima, Yutaka (2020). Simple Dataset for Proof Method Recommendation in Isabelle/HOL [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3819025
    Explore at:
    Dataset updated
    May 13, 2020
    Dataset authored and provided by
    Nagashima, Yutaka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, a growing number of researchers have applied machine learning to assist users of interactive theorem provers.

    However, the expressive nature of underlying logics and esoteric structures of proof documents impede machine learning practitioners, who often do not have much expertise in formal logic, let alone Isabelle/HOL, from applying their tools and expertise to theorem proving.

    In this data description, we present a simple dataset that contains data on over 400k proof method applications in the Archive of Formal Proofs along with over 100 extracted features for each in a format that can be processed easily without any knowledge about formal logic.

    Our simple data format allows machine learning practitioners to try machine learning tools to predict proof methods in Isabelle/HOL, even if they are unfamiliar with theorem proving.
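
    As a rough sketch of the intended workflow (not the authors' pipeline), one could load the feature table into a standard classifier. The file name and column layout below are assumptions, since the exact schema is documented on the dataset page.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Minimal sketch: assume one row per proof method application, with the extracted
    # numeric features in all columns except the last, which names the proof method.
    # The file name and column layout are assumptions, not the published schema.
    df = pd.read_csv("isabelle_proof_methods.csv")
    X, y = df.iloc[:, :-1], df.iloc[:, -1]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")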

  9. Dataset distribution between train and test sets.

    • plos.figshare.com
    xls
    Updated Aug 27, 2024
    + more versions
    Cite
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd (2024). Dataset distribution between train and test sets. [Dataset]. http://doi.org/10.1371/journal.pone.0305035.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Among many types of cancers, to date, lung cancer remains one of the deadliest cancers around the world. Many researchers, scientists, doctors, and people from other fields continuously contribute to this subject regarding early prediction and diagnosis. One of the significant problems in prediction is the black-box nature of machine learning models. Though the detection rate is comparatively satisfactory, people have yet to learn how a model came to that decision, causing trust issues among patients and healthcare workers. This work uses multiple machine learning models on a numerical dataset of lung cancer-relevant parameters and compares performance and accuracy. After comparison, each model has been explained using different methods. The main contribution of this research is to give logical explanations of why the model reached a particular decision to achieve trust. This research has also been compared with a previous study that worked with a similar dataset and took expert opinions regarding their proposed model. We also showed that our research achieved better results than their proposed model and specialist opinion using hyperparameter tuning, having an improved accuracy of almost 100% in all four models.

  10. Data from: Dataset of Paper "Enhancing photovoltaic cell classification...

    • zenodo.org
    • portalcienciaytecnologia.jcyl.es
    csv, zip
    Updated Mar 6, 2025
    Cite
    Héctor Felipe Mateo Romero; Mario Eduardo Carbonó dela Rosa; Luis Hernández Callejo; Miguel Gonzalez; Valentín Cardeñoso-Payo; Víctor Alonso Gómez; Sara Gallardo-Saavedra; José Ignacio Morales (2025). Dataset of Paper "Enhancing photovoltaic cell classification through mamdani fuzzy logic: a comparative study with machine learning approaches employing electroluminescence images" [Dataset]. http://doi.org/10.5281/zenodo.14979964
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Héctor Felipe Mateo Romero; Mario Eduardo Carbonó dela Rosa; Luis Hernández Callejo; Miguel Gonzalez; Valentín Cardeñoso-Payo; Víctor Alonso Gómez; Sara Gallardo-Saavedra; José Ignacio Morales
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of

    H.F. Mateo-Romero, M.E. Carbonó de la Rosa, L. Hernández-Callejo, M.A. González-Rebollo, V. Cardeñoso-Payo, V. Alonso-Gómez, S. Gallardo-Saavedra, J.I. Morales Aragonés, “Enhancing photovoltaic cell classification through mamdani fuzzy logic: a comparative study with machine learning approaches employing electroluminescence images”, Progress in Artificial Intelligence (2024) pp. 1-11.
    https://doi.org/10.1007/s13748-024-00353-w

  11. The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial...

    • zenodo.org
    • autovi.utc.fr
    • +1more
    bin, txt, zip
    Updated Jun 5, 2024
    + more versions
    Cite
    Philippe Carvalho; Meriem Lafou; Alexandre Durupt; Antoine Leblanc; Yves Grandvalet (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. http://doi.org/10.5281/zenodo.10459003
    Explore at:
    Available download formats: zip, txt, bin
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Philippe Carvalho; Meriem Lafou; Alexandre Durupt; Antoine Leblanc; Yves Grandvalet
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    See the official website: https://autovi.utc.fr

    Modern industrial production lines must be set up with robust defect inspection modules that are able to withstand high product variability. This means that in a context of industrial production, new defects that are not yet known may appear, and must therefore be identified.

    On industrial production lines, the typology of potential defects is vast (texture, part failure, logical defects, etc.). Inspection systems must therefore be able to detect non-listed defects, i.e. defects that had not yet been observed when the inspection system was developed. To address this problem, research and development of unsupervised AI algorithms on real-world data is required.

    Renault Group and the Université de technologie de Compiègne (Roberval and Heudiasyc Laboratories) have jointly developed the Automotive Visual Inspection Dataset (AutoVI), the purpose of which is to be used as a scientific benchmark to compare and develop advanced unsupervised anomaly detection algorithms under real production conditions. The images were acquired on Renault Group's automotive production lines, in a genuine industrial production line environment, with variations in brightness and lighting on constantly moving components. This dataset is representative of actual data acquisition conditions on automotive production lines.

    The dataset contains 3950 images, split into 1530 training images and 2420 testing images.

    The evaluation code can be found at https://github.com/phcarval/autovi_evaluation_code.

    Disclaimer
    All defects shown were intentionally created on Renault Group's production lines for the purpose of producing this dataset. The images were examined and labeled by Renault Group experts, and all defects were corrected after shooting.

    License
    Copyright © 2023-2024 Renault Group

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.

    For using the data in a way that falls under the commercial use clause of the license, please contact us.

    Attribution
    Please use the following for citing the dataset in scientific work:

    Carvalho, P., Lafou, M., Durupt, A., Leblanc, A., & Grandvalet, Y. (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://doi.org/10.5281/zenodo.10459003

    Contact
    If you have any questions or remarks about this dataset, please contact us at philippe.carvalho@utc.fr, meriem.lafou@renault.com, alexandre.durupt@utc.fr, antoine.leblanc@renault.com, yves.grandvalet@utc.fr.

    Changelog

    • v1.0.0
      • Cropped engine_wiring, pipe_clip and pipe_staple images
      • Reduced tank_screw, underbody_pipes and underbody_screw image sizes
    • v0.1.1
      • Added ground truth segmentation maps
      • Fixed categorization of some images
      • Added new defect categories
      • Removed tube_fastening and kitting_cart
      • Removed duplicates in pipe_clip
  12. Data from: TextBite: A Historical Czech Document Dataset for Logical Page...

    • zenodo.org
    zip
    Updated Mar 31, 2025
    Cite
    Martin Kostelník; Michal Hradiš; Karel Beneš (2025). TextBite: A Historical Czech Document Dataset for Logical Page Segmentation [Dataset]. http://doi.org/10.5281/zenodo.15057331
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Kostelník; Michal Hradiš; Karel Beneš
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TextBite is a dataset of historical Czech documents spanning the 18th to 20th centuries, featuring diverse layouts from newspapers, dictionaries, and handwritten records. It is mainly aimed at logical segmentation, but can be used for other tasks as well. Additionally, part of the dataset contains handwritten documents, primarily records from schools and public organizations, introducing extra segmentation challenges due to their more loosely structured layouts.

    In total, the dataset contains 8,449 annotated pages, from which 7,346 pages are printed and 1,103 are handwritten. The pages contain a total of 78,863 segments. The test subset contains 964 pages, of which 185 are handwritten. The annotations are provided in an extended COCO format. Each segment is represented by a set of axis aligned bounding boxes, which are connected by directed relationships, representing reading order. To include these relationships in the COCO format, a new top-level key relations is added. Each relation entry specifies a source and a target bounding box.

    In addition to the layout annotations, we provide a textual representation of the pages produced by Optical Character Recognition (OCR) tool PERO-OCR. These come in the form of XML files in the PAGE-XML format, which includes an enclosing polygon for each individual textline along with the transcriptions and their confidences. Lastly, we provide the OCR results in the ALTO format, which includes polygons for individual words in the page image.
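
    The reading-order relations can be consumed with a few lines of standard JSON handling. This is a minimal sketch; the file name and the field names inside each relation entry ("source", "target") are assumptions based on the description above rather than a published specification.

    import json

    # Minimal sketch: read an annotation file in the extended COCO format described
    # above and follow the directed relations that encode reading order.
    # File name and relation field names are assumptions.
    with open("textbite_annotations.json", encoding="utf-8") as f:
        coco = json.load(f)

    boxes = {ann["id"]: ann for ann in coco["annotations"]}  # standard COCO key

    for rel in coco.get("relations", []):        # new top-level key described above
        src, dst = rel["source"], rel["target"]  # assumed field names
        if src in boxes and dst in boxes:
            print(f"bounding box {src} is read before bounding box {dst}")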

  13. Mksc Dataset

    • universe.roboflow.com
    zip
    Updated Jun 22, 2023
    Cite
    mmax (2023). Mksc Dataset [Dataset]. https://universe.roboflow.com/mmax/mksc/model/9
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 22, 2023
    Dataset authored and provided by
    mmax
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Pixels, Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Game Development and Enhancement: Developers can incorporate the MKSC model into their game development process for identifying different game elements like characters or objects (coins, trees, peaches, etc.). This can facilitate automatic level design, character recognition and movement logic.

    2. Interactive Content Creation: Streamers, digital content creators, or video game reviewers can use this model to analyze gameplay, identifying key characters and events in real-time or during video editing. This can open doors to more interactive and engaging content for audiences, possibly even automated highlights or recaps based on character occurrences.

    3. Gaming Tutorials and Guides: The MKSC model can be used to develop comprehensive gaming guides and step-by-step tutorials. By recognizing game elements, it can show players where to find specific items or characters, or provide an analysis of gameplay to help players improve.

    4. Machine Learning Research: Researchers can use the MKSC model as a baseline or reference for their research in video game AI or broader computer vision/ML studies. It provides a good use-case for pixel class recognition in complex, dynamic environments like video games.

    5. Video Game AI Training: AI bots can be trained using the MKSC model. It can help build a neural network that understands video game landscapes, enabling the bots to interact more diversely and intelligently in a video game setup, and enhancing player vs. AI experiences.

  14. The UNSW-NB15 dataset with binarized features

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Feb 9, 2021
    Cite
    Yaman Umuroglu (2021). The UNSW-NB15 dataset with binarized features [Dataset]. http://doi.org/10.5281/zenodo.4519767
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 9, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yaman Umuroglu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Binarized version of the UNSW-NB15 dataset, where the original features (a mix of strings, categorical values, floating point values etc) are converted to a bit string of 593 bits. Each value in each feature is either 0 or 1, stored as a uint8 value. The uint8 values are represented as numpy arrays, provided separately for training and test data (same train/test split as the original dataset is used). The final binary value in each sample is the expected output.
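
    A minimal loading sketch, assuming the numpy arrays have been saved as .npy files (the file names here are placeholders): each row is treated as the binarized feature string with the expected output in the final position, as described above.

    import numpy as np

    # Minimal sketch: load the uint8 training and test arrays and split off the
    # final column, which holds the expected binary output for each sample.
    # The .npy file names are placeholders, not the archive's actual file names.
    train = np.load("unsw_nb15_binarized_train.npy")
    test = np.load("unsw_nb15_binarized_test.npy")

    X_train, y_train = train[:, :-1], train[:, -1]
    X_test, y_test = test[:, :-1], test[:, -1]

    print(X_train.shape, X_train.dtype)  # expect uint8 values of 0 or 1
    print("positive rate in training labels:", y_train.mean())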

    Among others, this dataset has been used for quantized neural network research:

    Umuroglu, Y., Akhauri, Y., Fraser, N. J., & Blott, M. (2020, August). LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications. In 2020 30th International Conference on Field-Programmable Logic and Applications (FPL) (pp. 291-297). IEEE.

    The method for binarization is identical to the one described in 10.5281/zenodo.3258657 :

    "T. Murovič, A. Trost, Massively Parallel Combinational Binary Neural Networks for Edge Processing, Elektrotehniški vestnik, vol. 86, no. 1-2, pp. 47-53, 2019"

    The original UNSW-NB15 dataset is by:

    Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.

  15. Sodoku_Puzzle_Generator

    • huggingface.co
    Updated Jul 17, 2025
    Cite
    Martial Terran (2025). Sodoku_Puzzle_Generator [Dataset]. https://huggingface.co/datasets/MartialTerran/Sodoku_Puzzle_Generator
    Explore at:
    Dataset updated
    Jul 17, 2025
    Authors
    Martial Terran
    Description

    Developing an MLP-Based AI/ML Model for Sudoku Puzzle Solving

    Introduction to AI/ML Sudoku Solvers

    Sudoku, a widely recognized logic-based combinatorial number-placement puzzle, presents a compelling challenge for Artificial Intelligence and Machine Learning models. The objective of Sudoku is to populate a 9x9 grid, which is further subdivided into nine 3x3 subgrids, with digits ranging from 1 to 9. The fundamental constraint is that each digit must appear exactly once within each row, each… See the full description on the dataset page: https://huggingface.co/datasets/MartialTerran/Sodoku_Puzzle_Generator.
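
    Since the description above states the standard Sudoku constraints, a small validity check captures them directly. This sketch is generic Sudoku logic, not code taken from the dataset.

    # Minimal sketch: verify that a completed 9x9 grid satisfies the constraints
    # described above (each digit 1-9 exactly once per row, column and 3x3 subgrid).
    def is_valid_solution(grid: list[list[int]]) -> bool:
        digits = set(range(1, 10))
        rows_ok = all(set(row) == digits for row in grid)
        cols_ok = all({grid[r][c] for r in range(9)} == digits for c in range(9))
        boxes_ok = all(
            {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == digits
            for br in range(0, 9, 3)
            for bc in range(0, 9, 3)
        )
        return rows_ok and cols_ok and boxes_ok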

  16. Replication Data for: Unravelling the Dark Web: explainable inference of the...

    • b2find.eudat.eu
    Updated Apr 3, 2024
    + more versions
    Cite
    (2024). Replication Data for: Unravelling the Dark Web: explainable inference of the diversity of microbial interactions - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/6297d15f-6ab1-5d68-b24e-3cb44b7d3f9b
    Explore at:
    Dataset updated
    Apr 3, 2024
    Description

    The functional diversity of microbial communities emerges from a combination of the great number of species and the many interaction types, such as competition, mutualism, predation or parasitism, in microbial ecological networks. Understanding the relationship between microbial networks and the services and functions delivered by the microbial communities is a key challenge for Microbial Ecology, particularly as so many of these interactions are difficult to observe and characterize. We believe that this 'Dark Web' of interactions could be unravelled using an explainable machine learning approach, called Abductive/Inductive Logic Programming (A/ILP) in the R package InfIntE, which uses mechanistic rules (interaction hypotheses) to infer directly the network structure and interaction types. Here we attempt to unravel the dark web of the plant microbiome embodied in metabarcoding data sampled from the grapevine foliar microbiome. Using synthetic, simulated data, we first show that it is possible to satisfactorily reconstruct microbial networks using explainable machine learning. Then we confirm that the dark web of the grapevine microbiome is diverse, being composed of a range of interaction types consistent with the literature. This first attempt to use explainable machine learning to infer microbial interaction networks advances our understanding of the ecological processes that occur in microbial communities and allows us to infer specific types of interaction within the grapevine microbiome that could be validated through experimentation. This work will have potentially valuable applications, such as the discovery of antagonistic interactions that might be used to identify potential biological control agents within the microbiome.

  17. Data from: Accelerometer-Based Multivariate Time-Series Dataset for Calf...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 13, 2024
    Cite
    Dissanayake, Oshana (2024). Accelerometer-Based Multivariate Time-Series Dataset for Calf Behavior Classification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13259481
    Explore at:
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    Cunningham, Padraig
    Riaboff, Lucile
    McPherson, Sarah E.
    Kennedy, Emer
    Dissanayake, Oshana
    Allyndrée, Joseph
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AcTBeCalf Dataset Description

    The AcTBeCalf dataset is a comprehensive dataset designed to support the classification of pre-weaned calf behaviors from accelerometer data. It contains detailed accelerometer readings aligned with annotated behaviors, providing a valuable resource for research in multivariate time-series classification and animal behavior analysis. The dataset includes accelerometer data collected from 30 pre-weaned Holstein Friesian and Jersey calves, housed in group pens at the Teagasc Moorepark Research Farm, Ireland. Each calf was equipped with a 3D accelerometer sensor (AX3, Axivity Ltd, Newcastle, UK) sampling at 25 Hz and attached to a neck collar from one week of birth over 13 weeks.

    This dataset encompasses 27.4 hours of accelerometer data aligned with calf behaviors, including both prominent behaviors like lying, standing, and running, as well as less frequent behaviors such as grooming, social interaction, and abnormal behaviors.

    The dataset consists of a single CSV file with the following columns:

    dateTime: Timestamp of the accelerometer reading, sampled at 25 Hz.

    calfid: Identification number of the calf (1-30).

    accX: Accelerometer reading for the X axis (top-bottom direction)*.

    accY: Accelerometer reading for the Y axis (backward-forward direction)*.

    accZ: Accelerometer reading for the Z axis (left-right direction)*.

    behavior: Annotated behavior based on an ethogram of 23 behaviors.

    segId: Segment identification number associated with each accelerometer reading/row, representing all readings of the same behavior segment.

    * The directions are mentioned in relation to the position of the accelerometer sensor on the calf.
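
    A minimal loading sketch using the documented columns (only the file name is assumed):

    import pandas as pd

    # Minimal sketch: load the single CSV described above and summarise the 25 Hz
    # readings per behaviour segment. Only the file name is an assumption.
    df = pd.read_csv("actbecalf.csv", parse_dates=["dateTime"])

    segments = (
        df.groupby("segId")
          .agg(calfid=("calfid", "first"),
               behavior=("behavior", "first"),
               n_samples=("accX", "size"))
          .assign(duration_s=lambda s: s["n_samples"] / 25.0)  # 25 Hz sampling rate
    )
    print(segments["behavior"].value_counts())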

    Code Files Description

    The dataset is accompanied by several code files to facilitate the preprocessing and analysis of the accelerometer data and to support the development and evaluation of machine learning models. The main code files included in the dataset repository are:

    accelerometer_time_correction.ipynb: This script corrects the accelerometer time drift, ensuring the alignment of the accelerometer data with the reference time.

    shake_pattern_detector.py: This script includes an algorithm to detect shake patterns in the accelerometer signal for aligning the accelerometer time series with reference times.

    aligning_accelerometer_data_with_annotations.ipynb: This notebook aligns the accelerometer time series with the annotated behaviors based on timestamps.

    manual_inspection_ts_validation.ipynb: This notebook provides a manual inspection process for ensuring the accurate alignment of the accelerometer data with the annotated behaviors.

    additional_ts_generation.ipynb: This notebook generates additional time-series data from the original X, Y, and Z accelerometer readings, including Magnitude, ODBA (Overall Dynamic Body Acceleration), VeDBA (Vectorial Dynamic Body Acceleration), pitch, and roll.

    genSplit.py: This script provides the logic used for the generalized subject separation for machine learning model training, validation and testing.

    active_inactive_classification.ipynb: This notebook details the process of classifying behaviors into active and inactive categories using a RandomForest model, achieving a balanced accuracy of 92%.

    four_behv_classification.ipynb: This notebook employs the mini-ROCKET feature derivation mechanism and a RidgeClassifierCV to classify behaviors into four categories: drinking milk, lying, running, and other, achieving a balanced accuracy of 84%.

    Kindly cite one of the following papers when using this data:

    Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models. arXiv preprint arXiv:2404.18159.

    Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars. arXiv preprint arXiv:2406.17352

  18. Comparison of results due to parameter tuning for five k-fold...

    • plos.figshare.com
    xls
    Updated Aug 27, 2024
    Cite
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd (2024). Comparison of results due to parameter tuning for five k-fold cross-validation. [Dataset]. http://doi.org/10.1371/journal.pone.0305035.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Refat Khan Pathan; Israt Jahan Shorna; Md. Sayem Hossain; Mayeen Uddin Khandaker; Huda I. Almohammed; Zuhal Y. Hamd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of results due to parameter tuning for five k-fold cross-validation.

  19. Table 1_A hybrid fuzzy logic–Random Forest model to predict psychiatric...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 17, 2025
    + more versions
    Cite
    Alexandre Hudon (2025). Table 1_A hybrid fuzzy logic–Random Forest model to predict psychiatric treatment order outcomes: an interpretable tool for legal decision support.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1606250.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Frontiers
    Authors
    Alexandre Hudon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Decisions surrounding involuntary psychiatric treatment orders often involve complex clinical, legal, and ethical considerations, especially when patients lack decisional capacity and refuse treatment. In Quebec, these orders are issued by the Superior Court based on a combination of medical, legal, and behavioral evidence. However, no transparent, evidence-informed predictive tools currently exist to estimate the likelihood of full treatment order acceptance. This study aims to develop and evaluate a hybrid fuzzy logic–machine learning model to predict such outcomes and identify important influencing factors.

    Methods: A retrospective dataset of 176 Superior Court judgments rendered in Quebec in 2024 was curated from SOQUIJ, encompassing demographic, clinical, and legal variables. A Mamdani-type fuzzy inference system was constructed to simulate expert decision logic and output a continuous likelihood score. This score, along with structured features, was used to train a Random Forest classifier. Model performance was evaluated using accuracy, precision, recall and F1 score. A 10-fold stratified cross-validation was employed for internal validation. Feature importance was also computed to assess the influence of each variable on the prediction outcome.

    Results: The hybrid model achieved an accuracy of 98.1%, precision of 93.3%, recall of 100%, and an F1 score of 96.6%. The most influential predictors were the duration of time granted by the court, the duration requested by the clinical team, and the age of the defendant. Fuzzy logic features such as severity, compliance, and a composite Burden_Score also contributed significantly to prediction accuracy. Only one misclassified case was observed in the test set, and the system provided interpretable decision logic consistent with expert reasoning.

    Conclusion: This exploratory study offers a novel approach for decision support in forensic psychiatric contexts. Future work should aim to validate the model across other jurisdictions, incorporate more advanced natural language processing for semantic feature extraction, and explore dynamic rule optimization techniques. These enhancements would further improve generalizability, fairness, and practical utility in real-world clinical and legal settings.
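
    To make the hybrid architecture concrete, here is an illustrative sketch of a Mamdani-type inference step producing a continuous score that could be appended to the structured features of a Random Forest. It uses the scikit-fuzzy package as one possible implementation; the variables, membership functions and rules are placeholders, not the study's actual rule base.

    import numpy as np
    import skfuzzy as fuzz
    from skfuzzy import control as ctrl

    # Placeholder linguistic variables on illustrative 0-10 and 0-100 scales.
    severity = ctrl.Antecedent(np.arange(0, 11, 1), "severity")
    compliance = ctrl.Antecedent(np.arange(0, 11, 1), "compliance")
    likelihood = ctrl.Consequent(np.arange(0, 101, 1), "likelihood")

    severity["low"] = fuzz.trimf(severity.universe, [0, 0, 5])
    severity["high"] = fuzz.trimf(severity.universe, [5, 10, 10])
    compliance["low"] = fuzz.trimf(compliance.universe, [0, 0, 5])
    compliance["high"] = fuzz.trimf(compliance.universe, [5, 10, 10])
    likelihood["low"] = fuzz.trimf(likelihood.universe, [0, 0, 50])
    likelihood["high"] = fuzz.trimf(likelihood.universe, [50, 100, 100])

    # Placeholder rules for illustration only.
    rules = [
        ctrl.Rule(severity["high"] & compliance["low"], likelihood["high"]),
        ctrl.Rule(severity["low"] & compliance["high"], likelihood["low"]),
    ]
    sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))

    sim.input["severity"] = 7.0     # hypothetical case values
    sim.input["compliance"] = 3.0
    sim.compute()
    fuzzy_score = sim.output["likelihood"]  # continuous likelihood score
    # In the hybrid setup described above, a score like this would be added to the
    # structured features used to train the Random Forest classifier.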

  20. The logical rules learned by the first three decision trees to classify a...

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Jonas Bischofberger; Arnold Baca; Erich Schikuta (2024). The logical rules learned by the first three decision trees to classify a play as a shot, for each data set. [Dataset]. http://doi.org/10.1371/journal.pone.0298107.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jonas Bischofberger; Arnold Baca; Erich Schikuta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    • D_start,goal: Distance from play origin to goal.
    • D_end,goal-line: Distance from play end position to goal-line.
    • A_open: Opening angle of the goal from play origin.
    • Y_end*: End position of the play, projected onto goal-line.
