Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
This diagnostic dataset (website, paper) is specifically designed to evaluate the visual logical learning capabilities of machine learning models. It offers a seamless integration of visual and logical challenges, providing 2D images of complex visual trains, where the classification is derived from rule-based logic. The fundamental idea of V-LoL remains to integrate the explicit logical learning tasks of classic symbolic AI… See the full description on the dataset page: https://huggingface.co/datasets/AIML-TUDA/v-lol-trains.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Logic synthesis is a challenging and widely researched combinatorial optimization problem in integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool, and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs, plus labels such as optimized node counts and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The code for dataset creation and the benchmark models is available at https://github.com/NYU-MLDA/OpenABC.git.
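As context for the And-Inverter-Graph representation mentioned above, here is a minimal, illustrative AIG sketch: every internal node is a 2-input AND gate, and each edge carries an optional inversion flag. The class and node names are inventions for this example, not the OpenABC-D schema.

```python
# Minimal And-Inverter-Graph (AIG) sketch. Not the OpenABC-D data format.

class AIG:
    def __init__(self):
        self.inputs = []   # primary input ids
        self.ands = {}     # node id -> ((fanin_id, inverted), (fanin_id, inverted))

    def add_input(self, nid):
        self.inputs.append(nid)

    def add_and(self, nid, a, b):
        self.ands[nid] = (a, b)

    def evaluate(self, nid, values):
        """Recursively evaluate node nid given {input_id: bool} values."""
        if nid in values:
            return values[nid]
        (fa, ia), (fb, ib) = self.ands[nid]
        va = self.evaluate(fa, values) ^ ia   # apply edge inversion
        vb = self.evaluate(fb, values) ^ ib
        return va and vb

# XOR(x, y) built from three AND nodes:
# x ^ y = !( !(x AND !y) AND !(!x AND y) )
g = AIG()
g.add_input("x")
g.add_input("y")
g.add_and("n1", ("x", False), ("y", True))   # x AND !y
g.add_and("n2", ("x", True), ("y", False))   # !x AND y
g.add_and("n3", ("n1", True), ("n2", True))  # !n1 AND !n2 = !(x ^ y)

def xor(x, y):
    return not g.evaluate("n3", {"x": x, "y": y})
```

Labels like the optimized node count then correspond to quantities such as `len(g.ands)` after synthesis rewrites the graph.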
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detailed results of four ML models on the overall test dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistical machine learning usually achieves high-accuracy models by employing tens of thousands of examples. By contrast, both children and adult humans typically learn new concepts from either one or a small number of instances. The high data efficiency of human learning is not easily explained in terms of standard formal frameworks for machine learning, including Gold’s learning-in-the-limit framework and Valiant’s probably approximately correct (PAC) model. This paper explores ways in which this apparent disparity between human and machine learning can be reconciled by considering algorithms involving a preference for specificity combined with program minimality. It is shown how this can be efficiently enacted using hierarchical search based on identification of certificates and push-down automata to support hypothesizing compactly expressed maximal efficiency algorithms. Early results of a new system called DeepLog indicate that such approaches can support efficient top-down construction of relatively complex logic programs from a single example. This article is part of a discussion meeting issue ‘Cognitive artificial intelligence’.
MLRegTest is a benchmark for machine learning systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies. The languages were generated by creating finite-state acceptors, and the datasets were generated by sampling from these finite-state acceptors. The scripts and software used for these processes are open source and available; for details, see https://github.com/heinz-jeffrey/subregular-learning. Details are described in the arXiv preprint "MLRegTest: A Benchmark for the Machine Learning of Regular Languages".

# MLRegTest: A benchmark for the machine learning of regular languages
https://doi.org/10.5061/dryad.dncjsxm4h
MLRegTest provides training and testing data for 1800 regular languages.
This repository contains three gzipped tar archives.
> data.tar.gz (21GB)
> languages.tar.gz (4.5MB)
> models.tar.gz (76GB)
When uncompressed, these yield three directories, described in detail below.
> data (43GB)
> languages (38MB)
> models (87GB)
Languages are named according to the scheme Sigma.Tau.class.k.t.i.plebby, where Sigma is a two-digit alphabet size, Tau a two-digit number of salient symbols (the 'tier'), class the named subregular class, k the width of factors used (if applicable), t the threshold counted to (if applicable), and i a unique identifier. The table below unabbreviates the class names, and shows how many languages of each class there are.
| class | name ...
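The naming scheme above lends itself to mechanical parsing. A minimal sketch, where the example filename and the integer handling of k and t are assumptions based solely on the scheme as described:

```python
# Parse an MLRegTest language filename of the form
# Sigma.Tau.class.k.t.i.plebby. The example filename below is hypothetical.

def parse_language_name(fname):
    parts = fname.split(".")
    if len(parts) != 7 or parts[-1] != "plebby":
        raise ValueError(f"unexpected filename: {fname}")
    sigma, tau, cls, k, t, i = parts[:6]
    return {
        "alphabet_size": int(sigma),   # two-digit alphabet size
        "tier_size": int(tau),         # two-digit number of salient symbols
        "class": cls,                  # subregular class abbreviation
        "k": int(k),                   # factor width (if applicable)
        "t": int(t),                   # threshold (if applicable)
        "id": i,                       # unique identifier
    }

info = parse_language_name("04.04.SL.2.1.0.plebby")
```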
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for modeling risky driver behaviors based on accelerometer (X, Y, Z axes in meters per second squared (m/s²)) and gyroscope (X, Y, Z axes in degrees per second (°/s)) data.

Sampling rate: average 2 samples (rows) per second
Cars: Ford Fiesta 1.4, Ford Fiesta 1.25, Hyundai i20
Drivers: 3 different drivers, aged 27, 28 and 37
Driver behaviors: Sudden Acceleration (Class Label: 1), Sudden Right Turn (Class Label: 2), Sudden Left Turn (Class Label: 3), Sudden Brake (Class Label: 4)
Best window size: 14 seconds
Sensor: MPU6050
Device: Raspberry Pi 3 Model B

Please see the Summary Table for a summary of the collected data.
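At roughly 2 samples per second, the stated best window size of 14 seconds corresponds to 28 rows per window. A minimal windowing sketch; the array shape and six-channel ordering (accelerometer then gyroscope axes) are assumptions, not the dataset's actual file layout:

```python
import numpy as np

# Slice a 6-channel accelerometer/gyroscope stream into non-overlapping
# 14-second windows at ~2 Hz (28 rows per window).

SAMPLES_PER_SEC = 2
WINDOW_SEC = 14
WINDOW_LEN = SAMPLES_PER_SEC * WINDOW_SEC  # 28 rows

def make_windows(stream):
    """stream: (n_samples, 6) array -> (n_windows, 28, 6) array."""
    n = (len(stream) // WINDOW_LEN) * WINDOW_LEN  # drop the trailing partial window
    return stream[:n].reshape(-1, WINDOW_LEN, stream.shape[1])

stream = np.zeros((100, 6))       # placeholder for real sensor rows
windows = make_windows(stream)    # 100 rows -> 3 full windows
```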
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artificial Intelligence (AI) plays a fundamental role in the modern world, especially when used as an autonomous decision maker. One common concern nowadays is “how trustworthy the AIs are.” Human operators follow a strict educational curriculum and performance assessment that could be exploited to quantify how much trust we place in them. To quantify the trust of AI decision makers, we must go beyond task accuracy, especially when facing limited, incomplete, misleading, controversial or noisy datasets. Toward addressing these challenges, we describe DeepTrust, a Subjective Logic (SL) inspired framework that constructs a probabilistic logic description of an AI algorithm and takes into account the trustworthiness of both the dataset and the inner algorithmic workings. DeepTrust identifies proper multi-layered neural network (NN) topologies that have high projected trust probabilities, even when trained with untrusted data. We show that, when evaluating an NN's opinion and trustworthiness, an uncertain opinion of the data is not always harmful, whereas a disbelief opinion hurts trust the most. Also, trust probability does not necessarily correlate with accuracy. DeepTrust also provides a projected trust probability of the NN's prediction, which is useful when the NN generates an over-confident output under problematic datasets. These findings open new analytical avenues for designing and improving the NN topology by optimizing opinion and trustworthiness, along with accuracy, in a multi-objective optimization formulation, subject to space and time constraints.
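For readers unfamiliar with Subjective Logic, a binomial SL opinion is a tuple (belief, disbelief, uncertainty, base rate) with b + d + u = 1, and the projected probability is P = b + a·u. That formula is standard SL; the class below is only an illustrative sketch, not DeepTrust's implementation:

```python
from dataclasses import dataclass

# Sketch of a binomial Subjective Logic opinion and its projected
# probability, the kind of quantity a DeepTrust-style framework reports.

@dataclass
class Opinion:
    belief: float       # b
    disbelief: float    # d
    uncertainty: float  # u, with b + d + u = 1
    base_rate: float    # a, the prior probability

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        assert abs(total - 1.0) < 1e-9, "b + d + u must sum to 1"

    def projected_probability(self):
        # Standard SL projection: P = b + a * u
        return self.belief + self.base_rate * self.uncertainty

# A vacuous opinion (all uncertainty) projects to the base rate alone;
# a confident opinion is dominated by its belief mass.
vacuous = Opinion(0.0, 0.0, 1.0, 0.5)
confident = Opinion(0.8, 0.1, 0.1, 0.5)
```

Note how projection separates trust from accuracy: two opinions can project to the same probability while carrying very different uncertainty, which is the distinction the abstract draws.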
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recently, a growing number of researchers have applied machine learning to assist users of interactive theorem provers.
However, the expressive nature of the underlying logics and the esoteric structure of proof documents deter machine learning practitioners, who often have little expertise in formal logic, let alone Isabelle/HOL, from applying their tools and expertise to theorem proving.
In this data description, we present a simple dataset that contains data on over 400k proof method applications in the Archive of Formal Proofs along with over 100 extracted features for each in a format that can be processed easily without any knowledge about formal logic.
Our simple data format allows machine learning practitioners to try machine learning tools to predict proof methods in Isabelle/HOL, even if they are unfamiliar with theorem proving.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Among many types of cancers, lung cancer remains, to date, one of the deadliest around the world. Many researchers, scientists, doctors, and people from other fields continuously contribute to this subject, particularly regarding early prediction and diagnosis. One of the significant problems in prediction is the black-box nature of machine learning models: though the detection rate is comparatively satisfactory, it is often unclear how a model reached its decision, causing trust issues among patients and healthcare workers. This work applies multiple machine learning models to a numerical dataset of lung cancer-relevant parameters and compares their performance and accuracy. After comparison, each model has been explained using different methods. The main contribution of this research is to give logical explanations of why the model reached a particular decision, in order to achieve trust. This research has also been compared with a previous study that worked with a similar dataset and took expert opinions regarding its proposed model. Using hyperparameter tuning, our research achieved better results than that model and the specialist opinion, with an improved accuracy of almost 100% in all four models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of:
H.F. Mateo-Romero, M.E. Carbonó de la Rosa, L. Hernández-Callejo, M.A. González-Rebollo, V. Cardeñoso-Payo, V. Alonso-Gómez, S. Gallardo-Saavedra, J.I. Morales Aragonés, “Enhancing photovoltaic cell classification through mamdani fuzzy logic: a comparative study with machine learning approaches employing electroluminescence images”, Progress in Artificial Intelligence (2024) pp. 1-11.
https://doi.org/10.1007/s13748-024-00353-w
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
See the official website: https://autovi.utc.fr
Modern industrial production lines must be set up with robust defect inspection modules that are able to withstand high product variability. This means that in a context of industrial production, new defects that are not yet known may appear, and must therefore be identified.
On industrial production lines, the typology of potential defects is vast (texture, part failure, logical defects, etc.). Inspection systems must therefore be able to detect non-listed defects, i.e. defects not yet observed when the inspection system was developed. Solving this problem requires research and development of unsupervised AI algorithms on real-world data.
Renault Group and the Université de technologie de Compiègne (Roberval and Heudiasyc Laboratories) have jointly developed the Automotive Visual Inspection Dataset (AutoVI), the purpose of which is to be used as a scientific benchmark to compare and develop advanced unsupervised anomaly detection algorithms under real production conditions. The images were acquired on Renault Group's automotive production lines, in a genuine industrial production line environment, with variations in brightness and lighting on constantly moving components. This dataset is representative of actual data acquisition conditions on automotive production lines.
The dataset contains 3950 images, split into 1530 training images and 2420 testing images.
The evaluation code can be found at https://github.com/phcarval/autovi_evaluation_code.
Disclaimer
All defects shown were intentionally created on Renault Group's production lines for the purpose of producing this dataset. The images were examined and labeled by Renault Group experts, and all defects were corrected after shooting.
License
Copyright © 2023-2024 Renault Group
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.
To use the data in a way that falls under the commercial-use clause of the license, please contact us.
Attribution
Please use the following for citing the dataset in scientific work:
Carvalho, P., Lafou, M., Durupt, A., Leblanc, A., & Grandvalet, Y. (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://doi.org/10.5281/zenodo.10459003
Contact
If you have any questions or remarks about this dataset, please contact us at philippe.carvalho@utc.fr, meriem.lafou@renault.com, alexandre.durupt@utc.fr, antoine.leblanc@renault.com, yves.grandvalet@utc.fr.
Changelog
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TextBite is a dataset of historical Czech documents spanning the 18th to 20th centuries, featuring diverse layouts from newspapers, dictionaries, and handwritten records. It is mainly aimed at logical segmentation, but can be used for other tasks as well. Additionally, part of the dataset contains handwritten documents, primarily records from schools and public organizations, introducing extra segmentation challenges due to their more loosely structured layouts.
In total, the dataset contains 8,449 annotated pages, from which 7,346 pages are printed and 1,103 are handwritten. The pages contain a total of 78,863 segments. The test subset contains 964 pages, of which 185 are handwritten. The annotations are provided in an extended COCO format. Each segment is represented by a set of axis aligned bounding boxes, which are connected by directed relationships, representing reading order. To include these relationships in the COCO format, a new top-level key relations is added. Each relation entry specifies a source and a target bounding box.
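The extended COCO layout described above can be consumed with a few lines of standard JSON handling; following the `source`/`target` links recovers the reading order. The tiny inline document below is an illustrative stand-in, not a real TextBite page:

```python
import json

# Read the extended COCO structure and follow the top-level "relations"
# key (source -> target bounding-box ids) to recover reading order.

doc = json.loads("""
{
  "annotations": [{"id": 1, "bbox": [0, 0, 10, 5]},
                  {"id": 2, "bbox": [0, 6, 10, 5]},
                  {"id": 3, "bbox": [0, 12, 10, 5]}],
  "relations": [{"source": 1, "target": 2},
                {"source": 2, "target": 3}]
}
""")

def reading_chain(relations, start):
    """Follow source->target links from a starting box id."""
    nxt = {r["source"]: r["target"] for r in relations}
    order = [start]
    while order[-1] in nxt:
        order.append(nxt[order[-1]])
    return order

chain = reading_chain(doc["relations"], 1)
```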
In addition to the layout annotations, we provide a textual representation of the pages produced by the Optical Character Recognition (OCR) tool PERO-OCR. These come in the form of XML files in the PAGE-XML format, which include an enclosing polygon for each individual textline along with the transcriptions and their confidences. Lastly, we provide the OCR results in the ALTO format, which includes polygons for individual words in the page image.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Game Development and Enhancement: Developers can incorporate the MKSC model into their game development process for identifying different game elements like characters or objects (coins, trees, peaches, etc.). This can facilitate automatic level design, character recognition and movement logic.
Interactive Content Creation: Streamers, digital content creators, or video game reviewers can use this model to analyze gameplay, identifying key characters and events in real-time or during video editing. This can open doors to more interactive and engaging content for audiences, possibly even automated highlights or recaps based on character occurrences.
Gaming Tutorials and Guides: The MKSC model can be used to develop comprehensive gaming guides and step-by-step tutorials. By recognizing game elements, it can show players where to find specific items or characters, or provide an analysis of gameplay to help players improve.
Machine Learning Research: Researchers can use the MKSC model as a baseline or reference for their research in video game AI or broader computer vision/ML studies. It provides a good use-case for pixel class recognition in complex, dynamic environments like video games.
Video Game AI Training: AI bots can be trained using the MKSC model. It can help build a neural network that understands video game landscapes, enabling the bots to interact more diversely and intelligently in a video game setup, and enhancing player vs. AI experiences.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Binarized version of the UNSW-NB15 dataset, where the original features (a mix of strings, categorical values, floating-point values, etc.) are converted to a bit string of 593 bits. Each value in each feature is either 0 or 1, stored as a uint8 value. The uint8 values are represented as numpy arrays, provided separately for training and test data (the same train/test split as the original dataset is used). The final binary value in each sample is the expected output.
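Given that layout, splitting each 593-bit row into 592 feature bits and a final label bit is a one-liner. A minimal sketch; the random array stands in for the real training file, whose name and loading call are not specified above:

```python
import numpy as np

# Split 593-bit samples into features and label: the final bit of each
# row is the expected output, per the dataset description.

rng = np.random.default_rng(0)
train = rng.integers(0, 2, size=(8, 593), dtype=np.uint8)  # stand-in data

X = train[:, :-1]   # 592 feature bits
y = train[:, -1]    # final bit = label
```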
Among others, this dataset has been used for quantized neural network research:
Umuroglu, Y., Akhauri, Y., Fraser, N. J., & Blott, M. (2020, August). LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications. In 2020 30th International Conference on Field-Programmable Logic and Applications (FPL) (pp. 291-297). IEEE.
The method for binarization is identical to the one described in 10.5281/zenodo.3258657:
"T. Murovič, A. Trost, Massively Parallel Combinational Binary Neural Networks for Edge Processing, Elektrotehniški vestnik, vol. 86, no. 1-2, pp. 47-53, 2019"
The original UNSW-NB15 dataset is by:
Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.
Developing an MLP-Based AI/ML Model for Sudoku Puzzle Solving
Introduction to AI/ML Sudoku Solvers
Sudoku, a widely recognized logic-based combinatorial number-placement puzzle, presents a compelling challenge for Artificial Intelligence and Machine Learning models. The objective of Sudoku is to populate a 9x9 grid, which is further subdivided into nine 3x3 subgrids, with digits ranging from 1 to 9. The fundamental constraint is that each digit must appear exactly once within each row, each… See the full description on the dataset page: https://huggingface.co/datasets/MartialTerran/Sodoku_Puzzle_Generator.
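The row/column/subgrid constraint described above can be stated directly in code. A minimal validity check (0 marks an empty cell); this is a generic sketch, not the MLP model from the dataset:

```python
# A digit placement is legal only if the digit appears nowhere else in
# its row, its column, or its 3x3 subgrid.

def is_valid(grid, row, col, digit):
    if any(grid[row][c] == digit for c in range(9)):
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    br, bc = 3 * (row // 3), 3 * (col // 3)   # top-left of the 3x3 subgrid
    return all(grid[br + r][bc + c] != digit
               for r in range(3) for c in range(3))

grid = [[0] * 9 for _ in range(9)]
grid[0][0] = 5
```

Checks like this are what a learned solver's outputs are measured against: every emitted digit must satisfy all three constraints simultaneously.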
The functional diversity of microbial communities emerges from a combination of the great number of species and the many interaction types, such as competition, mutualism, predation or parasitism, in microbial ecological networks. Understanding the relationship between microbial networks and the services and functions delivered by the microbial communities is a key challenge for Microbial Ecology, particularly as so many of these interactions are difficult to observe and characterize. We believe that this 'Dark Web' of interactions could be unravelled using an explainable machine learning approach, called Abductive/Inductive Logic Programming (A/ILP) in the R package InfIntE, which uses mechanistic rules (interaction hypotheses) to infer directly the network structure and interaction types. Here we attempt to unravel the dark web of the plant microbiome embodied in metabarcoding data sampled from the grapevine foliar microbiome. Using synthetic, simulated data, we first show that it is possible to satisfactorily reconstruct microbial networks using explainable machine learning. Then we confirm that the dark web of the grapevine microbiome is diverse, being composed of a range of interaction types consistent with the literature. This first attempt to use explainable machine learning to infer microbial interaction networks advances our understanding of the ecological processes that occur in microbial communities and allows us to infer specific types of interaction within the grapevine microbiome that could be validated through experimentation. This work will have potentially valuable applications, such as the discovery of antagonistic interactions that might be used to identify potential biological control agents within the microbiome.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AcTBeCalf Dataset Description
The AcTBeCalf dataset is a comprehensive dataset designed to support the classification of pre-weaned calf behaviors from accelerometer data. It contains detailed accelerometer readings aligned with annotated behaviors, providing a valuable resource for research in multivariate time-series classification and animal behavior analysis. The dataset includes accelerometer data collected from 30 pre-weaned Holstein Friesian and Jersey calves, housed in group pens at the Teagasc Moorepark Research Farm, Ireland. Each calf was equipped with a 3D accelerometer sensor (AX3, Axivity Ltd, Newcastle, UK) sampling at 25 Hz, attached to a neck collar from one week of age for 13 weeks.
This dataset encompasses 27.4 hours of accelerometer data aligned with calf behaviors, including both prominent behaviors like lying, standing, and running, as well as less frequent behaviors such as grooming, social interaction, and abnormal behaviors.
The dataset consists of a single CSV file with the following columns:
dateTime: Timestamp of the accelerometer reading, sampled at 25 Hz.
calfid: Identification number of the calf (1-30).
accX: Accelerometer reading for the X axis (top-bottom direction)*.
accY: Accelerometer reading for the Y axis (backward-forward direction)*.
accZ: Accelerometer reading for the Z axis (left-right direction)*.
behavior: Annotated behavior based on an ethogram of 23 behaviors.
segId: Segment identification number associated with each accelerometer reading/row, representing all readings of the same behavior segment.
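The single-CSV layout above can be grouped into behavior segments via segId with the standard library alone. A minimal sketch; the two inline rows are made-up examples, not real dataset values:

```python
import csv
import io

# Read the CSV layout described above and group accelerometer rows into
# behavior segments keyed by segId.

sample = io.StringIO(
    "dateTime,calfid,accX,accY,accZ,behavior,segId\n"
    "2022-01-01 00:00:00.00,1,0.01,-0.98,0.03,lying,42\n"
    "2022-01-01 00:00:00.04,1,0.02,-0.97,0.02,lying,42\n"
)

segments = {}
for row in csv.DictReader(sample):
    segments.setdefault(row["segId"], []).append(
        (float(row["accX"]), float(row["accY"]), float(row["accZ"]))
    )
```

Each value of `segments` is then one contiguous run of readings sharing a behavior label, the unit most time-series classifiers consume.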
Code Files Description
The dataset is accompanied by several code files to facilitate the preprocessing and analysis of the accelerometer data and to support the development and evaluation of machine learning models. The main code files included in the dataset repository are:
accelerometer_time_correction.ipynb: This script corrects the accelerometer time drift, ensuring the alignment of the accelerometer data with the reference time.
shake_pattern_detector.py: This script includes an algorithm to detect shake patterns in the accelerometer signal for aligning the accelerometer time series with reference times.
aligning_accelerometer_data_with_annotations.ipynb: This notebook aligns the accelerometer time series with the annotated behaviors based on timestamps.
manual_inspection_ts_validation.ipynb: This notebook provides a manual inspection process for ensuring the accurate alignment of the accelerometer data with the annotated behaviors.
additional_ts_generation.ipynb: This notebook generates additional time-series data from the original X, Y, and Z accelerometer readings, including Magnitude, ODBA (Overall Dynamic Body Acceleration), VeDBA (Vectorial Dynamic Body Acceleration), pitch, and roll.
genSplit.py: This script provides the logic used for the generalized subject separation for machine learning model training, validation and testing.
active_inactive_classification.ipynb: This notebook details the process of classifying behaviors into active and inactive categories using a RandomForest model, achieving a balanced accuracy of 92%.
four_behv_classification.ipynb: This notebook employs the mini-ROCKET feature derivation mechanism and a RidgeClassifierCV to classify behaviors into four categories: drinking milk, lying, running, and other, achieving a balanced accuracy of 84%.
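The derived signals named in additional_ts_generation.ipynb have standard definitions. The sketch below computes them for a single sample after separating static (gravity) and dynamic components; the static estimate and the axis conventions for pitch and roll are simplified assumptions, and the notebook's exact smoothing may differ:

```python
import math

# Magnitude, ODBA, VeDBA, pitch and roll for one accelerometer sample.
# (x, y, z): raw reading; (sx, sy, sz): static/gravity estimate,
# typically a running mean of each axis.

def derived_signals(x, y, z, sx, sy, sz):
    dx, dy, dz = x - sx, y - sy, z - sz   # dynamic components
    return {
        "magnitude": math.sqrt(x * x + y * y + z * z),
        "odba": abs(dx) + abs(dy) + abs(dz),              # Overall Dynamic Body Acceleration
        "vedba": math.sqrt(dx * dx + dy * dy + dz * dz),  # Vectorial Dynamic Body Acceleration
        "pitch": math.degrees(math.atan2(sx, math.sqrt(sy * sy + sz * sz))),
        "roll": math.degrees(math.atan2(sy, sz)),
    }

# A calf standing still with gravity mostly on the Y axis, plus a small
# movement on X and Z.
sig = derived_signals(0.1, -1.0, 0.2, 0.0, -1.0, 0.0)
```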
Kindly cite one of the following papers when using this data:
Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models. arXiv preprint arXiv:2404.18159.
Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars. arXiv preprint arXiv:2406.17352.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results due to parameter tuning for five-fold cross-validation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Decisions surrounding involuntary psychiatric treatment orders often involve complex clinical, legal, and ethical considerations, especially when patients lack decisional capacity and refuse treatment. In Quebec, these orders are issued by the Superior Court based on a combination of medical, legal, and behavioral evidence. However, no transparent, evidence-informed predictive tools currently exist to estimate the likelihood of full treatment order acceptance. This study aims to develop and evaluate a hybrid fuzzy logic–machine learning model to predict such outcomes and identify important influencing factors.

Methods
A retrospective dataset of 176 Superior Court judgments rendered in Quebec in 2024 was curated from SOQUIJ, encompassing demographic, clinical, and legal variables. A Mamdani-type fuzzy inference system was constructed to simulate expert decision logic and output a continuous likelihood score. This score, along with structured features, was used to train a Random Forest classifier. Model performance was evaluated using accuracy, precision, recall and F1 score. A 10-fold stratified cross-validation was employed for internal validation. Feature importance was also computed to assess the influence of each variable on the prediction outcome.

Results
The hybrid model achieved an accuracy of 98.1%, precision of 93.3%, recall of 100%, and an F1 score of 96.6%. The most influential predictors were the duration of time granted by the court, the duration requested by the clinical team, and the age of the defendant. Fuzzy logic features such as severity, compliance, and a composite Burden_Score also contributed significantly to prediction accuracy. Only one misclassified case was observed in the test set, and the system provided interpretable decision logic consistent with expert reasoning.

Conclusion
This exploratory study offers a novel approach for decision support in forensic psychiatric contexts. Future work should aim to validate the model across other jurisdictions, incorporate more advanced natural language processing for semantic feature extraction, and explore dynamic rule optimization techniques. These enhancements would further improve generalizability, fairness, and practical utility in real-world clinical and legal settings.
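The hybrid idea described in the Methods can be sketched in miniature: a small Mamdani-style step turns raw variables into a continuous likelihood score, which is then appended to the feature vector fed to a classifier. The membership functions, rules, and variable ranges below are illustrative inventions, not the study's actual system:

```python
# Tiny Mamdani-style scoring step (illustrative, not the study's rules).

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_likelihood(severity, compliance):
    """severity, compliance assumed normalized to [0, 1]."""
    # Rule 1: high severity AND low compliance -> high likelihood (output 1.0)
    r1 = min(tri(severity, 0.5, 1.0, 1.5), tri(compliance, -0.5, 0.0, 0.5))
    # Rule 2: low severity AND high compliance -> low likelihood (output 0.0)
    r2 = min(tri(severity, -0.5, 0.0, 0.5), tri(compliance, 0.5, 1.0, 1.5))
    # Crude defuzzification: firing-strength-weighted average of rule outputs.
    if r1 + r2 == 0.0:
        return 0.5   # no rule fires: fall back to an uninformative score
    return (r1 * 1.0 + r2 * 0.0) / (r1 + r2)

features = [0.9, 0.1]   # severity, compliance
augmented = features + [fuzzy_likelihood(*features)]   # feature vector + fuzzy score
```

The augmented vector is what a Random Forest (or any classifier) would then train on, which is what makes the pipeline hybrid: the fuzzy score carries interpretable expert logic, and the classifier learns the residual structure.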
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
D_start,goal: Distance from play origin to goal. D_end,goal: Distance from play end position to goal-line. A_open: Opening angle of the goal from play origin. Y_end*: End position of the play, projected onto the goal-line.
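The distance and opening-angle features above are plain plane geometry. A sketch, where the goal-post coordinates and the goal-centered coordinate frame are hypothetical (real pitch dimensions depend on the sport):

```python
import math

# Distance to goal and the opening angle subtended by the goal mouth,
# in a frame where the goal-line is y = 0 and the goal is centered at x = 0.

POST_LEFT = (-3.66, 0.0)    # hypothetical post positions on the goal-line
POST_RIGHT = (3.66, 0.0)

def distance_to_goal(x, y):
    """Distance from (x, y) to the goal center at the origin."""
    return math.hypot(x, y)

def opening_angle(x, y):
    """Angle (radians) subtended by the goal mouth at (x, y)."""
    a1 = math.atan2(POST_LEFT[1] - y, POST_LEFT[0] - x)
    a2 = math.atan2(POST_RIGHT[1] - y, POST_RIGHT[0] - x)
    d = abs(a1 - a2)
    return min(d, 2 * math.pi - d)   # take the interior angle

d = distance_to_goal(0.0, 11.0)      # straight in front of goal, 11 units out
theta = opening_angle(0.0, 11.0)
```

For a point straight in front of the goal, the angle reduces to 2·atan(half-width / distance), a useful sanity check on the implementation.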