Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes random numbers generated through various methods.

Method 1: shuf (https://www.mankier.com/1/shuf)

Commands used to generate the dataset files:

$ shuf -i 1-1000000000 -n1000000 -o random-shuf.txt
$ shuf -i 1-1000000000000 -n1000000 -o random-shuf-1-1000000000000.txt
$ jot -r 1000000 1 1000000000000 > random-jot-1-1000000000000.txt
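For readers without coreutils or BSD jot, a rough Python equivalent of the first shuf command is sketched below (an illustration only, not part of the original tooling; note that shuf -n emits distinct values, while jot -r samples with replacement):

import random

# Sample 1,000,000 distinct integers from 1..10^9
# (mirrors `shuf -i 1-1000000000 -n1000000 -o random-shuf.txt`)
with open("random-shuf.txt", "w") as f:
    for n in random.sample(range(1, 10**9 + 1), 1_000_000):
        f.write(f"{n}\n")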
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The performance of statistical methods is frequently evaluated by means of simulation studies. In the case of network meta-analysis of binary data, however, available data-generating models are restricted to either the inclusion of two-armed trials or the fixed-effect model. Based on data generation in the pairwise case, we propose a framework for the simulation of random-effects network meta-analyses including multi-arm trials with binary outcome. The only common data-generating model directly applicable to a random-effects network setting relies on strongly restrictive assumptions. To overcome these limitations, we modify this approach and derive a related simulation procedure using odds ratios as the effect measure. The performance of this procedure is evaluated with synthetic data and in an empirical example.
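As an illustration of the pairwise building block that such procedures extend, here is a minimal, hedged sketch of simulating two-arm trials under a random-effects model on the odds-ratio scale (parameter names and values are ours, not the paper's):

import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(mu_logor, tau, p_control=0.3, n_per_arm=100):
    """Draw a trial-specific log odds ratio, then binomial arm-level event counts."""
    theta = rng.normal(mu_logor, tau)  # random effect on the log-OR scale
    logit_control = np.log(p_control / (1 - p_control))
    p_treat = 1.0 / (1.0 + np.exp(-(logit_control + theta)))
    return rng.binomial(n_per_arm, p_control), rng.binomial(n_per_arm, p_treat)

# 20 heterogeneous trials comparing one treatment against control
trials = [simulate_trial(mu_logor=0.5, tau=0.2) for _ in range(20)]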
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This publication documents the various datasets generated using the kac_drumset codebase. The aim of kac_drumset is to provide a robust framework for the generation and analysis of arbitrarily shaped drums. The source code for this project is available here: https://github.com/lewiswolf/kac_drumset.
Background
Arbitrarily shaped drums are a strange family of percussion instruments and a wholly metaphysical construction in this contemporary setting. These percussive instruments possess a number of interesting musical characteristics resulting from their particular geometric designs. As it currently stands, these instruments remain largely unexplored throughout musical practice, as they were originally devised as a collection of hypothetical mathematical objects. These datasets serve to sonify these objects so as to explore these conceptual constructions in the audio domain.
Usage
To use these datasets, first install kac_drumset:
pip install "git+https://github.com/lewiswolf/kac_drumset.git#egg=kac_drumset"
And then in python:
from kac_drumset import (
    # methods
    loadDataset,
    transformDataset,
    # classes
    TorchDataset,
)
dataset: TorchDataset = transformDataset(
    # load a dataset (any folder containing a metadata.json)
    loadDataset('absolute/path/to/data'),
    # alter the dataset representation, either as an end2end, fft or mel
    {'output_type': 'end2end'},
)
for i in range(len(dataset)):
    x, y = dataset[i]
    ...
For more details on using kac_drumset, see the project's documentation.
2000 Convex Polygonal Drums of Varying Size
Each sample in this dataset corresponds to a randomly generated convex polygon. The audio for each sample was generated using a two-dimensional physical model of a drum. Each sample is one second long and decays linearly.
Contained in this dataset are ten different sizes of drums - 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6 - each of which is a measure of the longest vertex of each drum in meters. There are 40 different drums sampled for each size. Each drum is sampled five times, first by being struck in the geometric centroid, and then by being struck four more times in random locations. This dataset is labelled with the vertices of each polygon, normalised to the unit interval, and the strike location of each sample.
The audio is sampled at 48 kHz, and the default representation is raw audio. Each sample is stored in the metadata.json, and is also made available audibly as a 24-bit .wav and graphically as a .png.
5000 Circular Drums of Varying Size
Each sample in this dataset corresponds to a randomly generated circular drum. The audio for each sample was generated using additive synthesis, inferred using a closed form solution to the two dimensional wave equation. Each sample is one second long and decays exponentially.
Contained in this dataset are 1000 different drums, each determined by a randomly generated size in the range (0.1, 2.0) meters. Each drum is sampled five times, first by being struck in the geometric centroid, and then by being struck four more times in random locations. This dataset is labelled with the size of each drum and the strike location of each sample.
The audio is sampled at 48 kHz, and the default representation is raw audio. Each sample is stored in the metadata.json, and is also made available audibly as a 24-bit .wav and graphically as a .png.
5000 Rectangular Drums of Varying Dimension
Each sample in this dataset corresponds to a randomly generated rectangular drum. The audio for each sample was generated using additive synthesis, inferred using a closed form solution to the two dimensional wave equation. Each sample is one second long and decays exponentially.
Contained in this dataset are 1000 different drums, each determined by a randomly generated size in the range (0.1, 2.0) meters and an aspect ratio in the range (0.25, 4.0). Each drum is sampled five times, first by being struck in the geometric centroid, and then by being struck four more times in random locations. This dataset is labelled with the size and aspect ratio of each drum, and the strike location of each sample.
The audio is sampled at 48 kHz, and the default representation is raw audio. Each sample is stored in the metadata.json, and is also made available audibly as a 24-bit .wav and graphically as a .png.
Fun Club Name Generator Dataset
This is a small, handcrafted dataset of random and fun club name ideas. The goal is to help people who are stuck naming something — whether it's a book club, a gaming group, a project, or just a Discord server between friends.
Why this?
A few friends and I spent hours trying to name a casual group — everything felt cringey, too serious, or already taken. We started writing down names that made us laugh, and eventually collected enough to… See the full description on the dataset page: https://huggingface.co/datasets/Laurenfromhere/fun-club-name-generator-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
producing 8 distinct datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel sheets in order: The sheet entitled “Hens Original Data” contains the results of an experiment conducted to study the response of laying hens during the initial phase of egg production when subjected to different intakes of dietary threonine. The sheet entitled “Simulated data & fitting values” contains the 10 simulated data sets that were generated using a standard random number generation procedure. The predicted values obtained by the new three-parameter and the conventional four-parameter logistic models also appear in this sheet. (XLSX)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Read this document to understand the minimum technical requirements for random number generators (RNGs) used in gaming-related equipment and systems.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
About
The dataset contains 1000 images, each showing a random shape (out of 17 possible shapes). This dataset was generated using the 3D Shapes Dataset Generator I've developed; feel free to use it.
Label
Column Name | Info
---|---
filename | Name of the image file
shape | Shape index
operation | Operation index
a, b, c, d, e, f, g, h, i, j, k, l | Dimensional parameters
hue, sat, val | HSV values of the color
rot_x, rot_y, rot_z | Euler angles
pos_x, pos_y, pos_z | Position vector
Each row describes the shape shown in one image of the dataset.
Seed

The seed value of the dataset is stored in a txt file and can be used to re-generate the dataset using the tool.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artifacts for the paper titled "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?".
This artifact repository contains 9 compressed folders, as follows:
ID | File Name | Description
---|---|---
1 | syn_circa.zip | CIRCA10 and CIRCA50 datasets for Causal Discovery
2 | syn_rcd.zip | RCD10 and RCD50 datasets for Causal Discovery
3 | syn_causil.zip | CausIL10 and CausIL50 datasets for Causal Discovery
4 | rca_circa.zip | CIRCA10 and CIRCA50 datasets for RCA
5 | rca_rcd.zip | RCD10 and RCD50 datasets for RCA
6 | online-boutique.zip | Online Boutique dataset for RCA
7 | sock-shop-1.zip | Sock Shop 1 dataset for RCA
8 | sock-shop-2.zip | Sock Shop 2 dataset for RCA
9 | train-ticket.zip | Train Ticket dataset for RCA
Each zip file contains the generated/collected data from the corresponding data generator or microservice benchmark systems (e.g., online-boutique.zip contains metrics data collected from the Online Boutique system).
Details about the generation of our datasets
We use three different synthetic data generators from three previous RCA studies [15, 25, 28] to create the synthetic datasets: the CIRCA, RCD, and CausIL data generators. Their mechanisms are as follows:

1. The CIRCA data generator [28] generates a random causal directed acyclic graph (DAG) based on a given number of nodes and edges. From this DAG, time series data for each node are generated using a vector auto-regression (VAR) model. A fault is injected into a node by altering the noise term in the VAR model for two timestamps.
2. The RCD data generator [25] uses the pyAgrum package [3] to generate a random DAG based on a given number of nodes, subsequently generating discrete time series data for each node, with values ranging from 0 to 5. A fault is introduced into a node by changing its conditional probability distribution.
3. The CausIL data generator [15] generates causal graphs and time series data that simulate the behavior of microservice systems. It first constructs a DAG of services and metrics based on domain knowledge, then generates metric data for each node of the DAG using regressors trained on real metrics data. Unlike the CIRCA and RCD data generators, the CausIL data generator cannot inject faults.

To create our synthetic datasets, we first generate 10 DAGs whose nodes range from 10 to 50 for each of the synthetic data generators. Next, we generate fault-free datasets using these DAGs with different seeds, resulting in 100 cases for the CIRCA and RCD generators and 10 cases for the CausIL generator. We then create faulty datasets by introducing ten faults into each DAG and generating the corresponding faulty data, yielding 100 cases for the CIRCA and RCD data generators. The fault-free datasets (e.g., syn_circa, syn_rcd) are used to evaluate causal discovery methods, while the faulty datasets (e.g., rca_circa, rca_rcd) are used to assess RCA methods.
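To make the first mechanism concrete, here is a hedged sketch of a CIRCA-style generator (random DAG, VAR-style data, a fault injected by inflating one node's noise term for two timestamps); it is an illustration under our own assumptions, not the actual CIRCA code:

import numpy as np

rng = np.random.default_rng(0)

def random_dag(n_nodes, n_edges):
    """Random weighted DAG: edges only go from lower to higher node index."""
    adj = np.zeros((n_nodes, n_nodes))
    candidates = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    for i, j in rng.permutation(candidates)[:n_edges]:
        adj[i, j] = rng.uniform(0.2, 0.8)
    return adj

def var_data(adj, T=500, fault_node=None, fault_at=250, noise=0.1, fault_noise=5.0):
    """VAR(1)-style series; the fault inflates one node's noise at two timestamps."""
    n = adj.shape[0]
    X = np.zeros((T, n))
    for t in range(1, T):
        sigma = np.full(n, noise)
        if fault_node is not None and t in (fault_at, fault_at + 1):
            sigma[fault_node] = fault_noise  # injected fault
        X[t] = X[t - 1] @ adj + rng.normal(0.0, sigma)
    return X

X = var_data(random_dag(10, 20), fault_node=3)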
We deploy three popular benchmark microservice systems: Sock Shop [6], Online Boutique [4], and Train Ticket [8], on a four-node Kubernetes cluster hosted by AWS. Next, we use the Istio service mesh [2] with Prometheus [5] and cAdvisor [1] to monitor and collect resource-level and service-level metrics of all services, as in previous works [25, 39, 59]. To generate traffic, we use the load generators provided by these systems and customise them to explore all services with 100 to 200 users concurrently. We then introduce five common faults (CPU hog, memory leak, disk IO stress, network delay, and packet loss) into five different services within each system. Finally, we collect metrics data before and after the fault injection operation. An overview of our setup is presented in the Figure below.
Code
The code to reproduce the experimental results in the paper is available at https://github.com/phamquiluan/RCAEval.
References
As in our paper.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for DynaMath
[💻 Github] [🌐 Homepage] [📖 Preprint Paper]
Dataset Details
🔈 Notice
DynaMath is a dynamic benchmark with 501 seed question generators. This dataset is only a sample of 10 variants generated by DynaMath. We encourage you to use the dataset generator on our GitHub site to generate random datasets for testing.
🌟 About DynaMath
The rapid advancements in Vision-Language Models (VLMs) have shown significant potential in tackling… See the full description on the dataset page: https://huggingface.co/datasets/DynaMath/DynaMath_Sample.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains all data gathered from the experiment, from Wednesday, March 16, 2022 11:58:41.929 AM UTC+0 (1647431921929) until Sunday, April 3, 2022 1:08:35.353 PM UTC+0 (1648991315353). The experiment was executed during physical presence within the Arctic Circle in Tromsø, Norway (69° 40' 53.117'' N, 18° 58' 36.027'' E, at 35 m elevation above sea level). The dataset was gathered with a prototype [1] based on the CREDO Android application [2]. The main research goal is to use Ultra High Energy Cosmic Rays (UHECR) as an entropy source for a Random Bit Generator (RBG).
The associated publication will probably have the title "Accessing Cosmic Radiation as an Entropy Source for a Non-Deterministic Random Number Generator"
In order to reproduce the results, the SQLite3 database "mrng_arctic_experiment_2022.db" is needed. To get the visual representations of the detections, use "image_decoding_and_codesnippets.py" to generate the cleaned (414 detections / ~15 MB) or the uncleaned (5567 detections / ~195 MB) dataset. The compressed folder "raw_data_incl_space_weather.7z" contains all raw data as gathered with the MRNG prototype, unprocessed, uncleaned, and unmerged.
[1] https://github.com/StefanKutschera/mrng-prototype, visited on 27.03.2023
[2] https://github.com/credo-science/credo-detector-android, visited on 27.03.2023
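As a minimal starting point for working with the published database (a sketch; the table layout is not documented in this description, so we only introspect it):

import sqlite3

con = sqlite3.connect("mrng_arctic_experiment_2022.db")
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # inspect before querying, since the schema is not described here
con.close()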
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Synthetic Healthcare Dataset
Overview
This dataset is a synthetic healthcare dataset created for use in data analysis. It mimics real-world patient healthcare data and is intended for applications within the healthcare industry.
Data Generation
The data has been generated using the Faker Python library, which produces randomized and synthetic records that resemble real-world data patterns. It includes various healthcare-related fields such as patient… See the full description on the dataset page: https://huggingface.co/datasets/vrajakishore/dummy_health_data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the results of applying the NIST Statistical Test Suite to accelerometer data processed for random number generator seeding. The NIST Statistical Test Suite can be downloaded from: http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html. The format of the output is explained in http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf.
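For intuition about what the suite reports, below is a minimal sketch of the suite's first test, the Frequency (Monobit) test from SP 800-22; the official NIST C implementation linked above is authoritative:

import math

def monobit_p_value(bits):
    """bits: iterable of 0/1 values. Returns the SP 800-22 monobit p-value."""
    bits = list(bits)
    s = sum(2 * b - 1 for b in bits)        # map 0/1 to -1/+1 and sum
    s_obs = abs(s) / math.sqrt(len(bits))
    return math.erfc(s_obs / math.sqrt(2))  # p >= 0.01 is deemed random

print(monobit_p_value([1, 0, 1, 1, 0, 1, 0, 0] * 128))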
Underpinning data for manuscript entitled "Generation of random numbers by measuring on a silicon-on-insulator chip phase fluctuations from a laser diode"
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Data generation in machine learning involves creating or manipulating data to train and evaluate machine learning models. The purpose of data generation is to provide diverse and representative examples that cover a wide range of scenarios, ensuring the model's robustness and generalization.

Data augmentation techniques apply various transformations to existing data samples to create new ones, including random rotations, translations, scaling, flips, and more. Augmentation helps increase the dataset size, introduce natural variations, and improve model performance by making it more invariant to specific transformations. A sketch of such a transform pipeline is given below.

The dataset contains GENERATED USA passports, which are replicas of official passports but with randomly generated details, such as name, date of birth, etc. The primary intention of generating these fake passports is to demonstrate the structure and content of a typical passport document and to train a neural network to identify this type of document. Generated passports can assist in conducting research without accessing or compromising real user data that is often sensitive and subject to privacy regulations. Synthetic data generation allows researchers to develop and refine models using simulated passport data without risking privacy leaks.
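The sketch below uses torchvision as one possible toolkit (the dataset authors' actual pipeline is not specified here):

from torchvision import transforms

# Random rotations, translations, scaling, and flips, as listed above
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # apply to a PIL image to get a new sample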
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The random number generation algorithm used within the consensus mechanism of blockchain systems may be vulnerable to membership inference attacks, which can reveal features of the algorithm or patterns in the random numbers it generates. To address this issue, the research group proposed a knowledge-distillation-based defense scheme to secure the random number generation algorithm (a sketch of the general distillation technique follows below). The group used a membership inference attack defense strategy dataset, comprising 5 training batches and 1 test batch, to evaluate the proposed scheme; the performance of the defense strategy is assessed by analyzing how machine learning models perform after being subjected to membership inference attacks on this dataset.

Collection plan: the test dataset folder is "Membership Inference Attack Resistance Strategy Dataset/cifar-10-batches-py". CIFAR-10 is a small dataset for recognizing ubiquitous objects, available at http://www.cs.toronto.edu/~kriz/cifar.html. It contains 10 categories of 32 × 32 RGB color images, with 6000 images per category: 50000 training images and 10000 test images in total.

Time and location: this dataset is test data collected by the research unit (Peking University) during 2021.

Equipment: data collection was processed in the following environment. Hardware: general computing platforms such as Intel and ARM. System: Windows 11 and Ubuntu 20.04.
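The description does not publish the scheme itself; as a reference point, a standard knowledge-distillation loss (the general technique named above) looks like this sketch:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL term (temperature-softened) plus hard-label CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients, as in Hinton et al.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard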
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.
Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of test regression. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigms.
This dataset builds on top of our previous work in this area, including work on
test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021),
test selection (SDC-Scissor and its related tool), and
test prioritization (automated test case prioritization for SDCs).
Dataset Overview
The TRAVEL dataset is available under the data folder and is organized as a set of experiments folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).
The following sections describe what each of those files contains.
Experiment Description
The experiment_description.csv contains the settings used to generate the data, including:
Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.
The size of the map. The size of the squared map defines the boundaries, in meters, inside which the virtual roads develop.
The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated testing the BeamNG.AI and the end-to-end Dave2 systems.
The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms, for generating tests.
The speed limit. The maximum speed at which the driving agent under test can travel.
Out of Bound (OOB) tolerance. The test cases' oracle, which defines the tolerable fraction of the ego-car that can lie outside the lane boundaries (a minimal sketch of this oracle follows below). This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
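A minimal sketch of this oracle (names and exact boundary semantics are ours, not the pipeline's):

def test_fails(oob_percentage: float, oob_tolerance: float) -> bool:
    """Fail when the fraction of the ego-car outside the lane exceeds the tolerance."""
    return oob_percentage > oob_tolerance

assert test_fails(0.96, 0.95) and not test_fails(0.30, 0.95)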
Experiment Statistics
The generation_stats.csv contains statistics about the test generation, including:
Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.
Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.
The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.
Test Cases and Executions
Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as driving simulation.
The data about the test case definition include:
The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points)
The test ID. The unique identifier of the test in the experiment.
Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or the road self intersects)
The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.
{ "type": "object", "properties": { "id": { "type": "integer" }, "is_valid": { "type": "boolean" }, "validation_message": { "type": "string" }, "road_points": { §\label{line:road-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "interpolated_points": { §\label{line:interpolated-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "test_outcome": { "type": "string" }, §\label{line:test-outcome}§ "description": { "type": "string" }, "execution_data": { "type": "array", "items": { "$ref" : "schemas/simulationdata" } } }, "required": [ "id", "is_valid", "validation_message", "road_points", "interpolated_points" ] }
Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).
The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.
{ "$id": "schemas/simulationdata", "type": "object", "properties": { "timer" : { "type": "number" }, "pos" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel_kmh" : { "type": "number" }, "steering" : { "type": "number" }, "brake" : { "type": "number" }, "throttle" : { "type": "number" }, "is_oob" : { "type": "number" }, "oob_percentage" : { "type": "number" } §\label{line:oob-percentage}§ }, "required": [ "timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage" ] }
Dataset Content
The TRAVEL dataset is a lively initiative so the content of the dataset is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (SDC-Scissor work and tool) and test prioritization (automated test cases prioritization work for SDCs).
SBST CPS Tool Competition Data
The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., conservative driver).
Name | Map Size (m x m) | Max Speed (Km/h) | Budget (h) | OOB Tolerance (%) | Test Subject
---|---|---|---|---|---
DEFAULT | 200 × 200 | 120 | 5 (real time) | 0.95 | BeamNG.AI - 0.7
SBST | 200 × 200 | 70 | 2 (real time) | 0.5 | BeamNG.AI - 0.7
Specifically, the TRAVEL dataset contains 8 repetitions for each of the above configurations for each test generator totaling 64 experiments.
SDC Scissor
With SDC-Scissor we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the used parameters.
Name | Map Size (m x m) | Max Speed (Km/h) | Budget (h) | OOB Tolerance (%) | Test Subject
---|---|---|---|---|---
SDC-SCISSOR | 200 × 200 | 120 | 16 (real time) | 0.5 | BeamNG.AI - 1.5
The dataset contains 9 experiments with the above configuration. For generating your own data with SDC-Scissor follow the instructions in its repository.
Dataset Statistics
Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95), and we also ran the test generators using a smaller generation budget (i.e., 2 hours) and speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.85). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours using Frenetic as a test generator and defining a more realistic OOB tolerance (i.e., 0.50).
Generating new Data
Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.
Extensive instructions on how to install both pieces of software are reported inside the SBST CPS Tool Competition pipeline documentation.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
French Last Names from Death Records (1970-2024)
This dataset contains French last names extracted from death records provided by INSEE (the French National Institute of Statistics and Economic Studies), covering the period from 1970 to September 2024.
Dataset Description
Random name generator demo
Go to https://sctg-development.github.io/french-names-extractor/
Data Source
The data is sourced from INSEE's death records database. It includes last names… See the full description on the dataset page: https://huggingface.co/datasets/eltorio/french_last_names_insee_2024.
Code and experiment results for a synthetic knowledge graph generator. The generator receives a set of rules, each with an expected body support and confidence, and returns a knowledge graph that approximately matches the rules according to their body support and confidence. This code was developed during the Bachelor thesis by Gabriel Glaser, Generating Random Knowledge Graphs from Rules, University of Stuttgart, 2024. Handle 11682/15486.
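A hedged sketch of the core idea (our illustration, not the thesis' algorithm): for a rule body(X, Y) => head(X, Y) with target body support s and confidence c, emit s body facts and let a c-fraction of them also satisfy the head.

import random

def generate_rule_facts(body_pred, head_pred, support, confidence, seed=0):
    rng = random.Random(seed)
    triples = []
    for i in range(support):
        subj, obj = f"e{i}", f"e{i + support}"
        triples.append((subj, body_pred, obj))      # body instantiation
        if rng.random() < confidence:
            triples.append((subj, head_pred, obj))  # head holds as well
    return triples

kg = generate_rule_facts("worksAt", "employedBy", support=100, confidence=0.8)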
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
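A hedged Python sketch of this two-stage design (the authoritative implementation is the distributed R script; stratum sizes here are proxied by EA counts, and each EA is assumed to contain at least 25 households):

import random

def draw_sample(eas_by_stratum, n_households=8000, hh_per_ea=25, seed=42):
    """eas_by_stratum: {stratum: {ea_id: [household ids]}}."""
    rng = random.Random(seed)
    n_eas_total = n_households // hh_per_ea  # 320 EAs overall
    total_eas = sum(len(eas) for eas in eas_by_stratum.values())
    sample = []
    for eas in eas_by_stratum.values():
        # Stage 1: allocate EAs to the stratum proportionally, then select them
        n_eas = round(n_eas_total * len(eas) / total_eas)
        for ea_id in rng.sample(sorted(eas), min(n_eas, len(eas))):
            # Stage 2: 25 households at random within each selected EA
            sample.extend(rng.sample(eas[ea_id], hh_per_ea))
    return sample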
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.