100+ datasets found
  1. Random Numbers

    • ieee-dataport.org
    Updated Mar 14, 2023
    Cite
    Alexander Outman (2023). Random Numbers [Dataset]. https://ieee-dataport.org/documents/random-numbers
    Explore at:
    Dataset updated
    Mar 14, 2023
    Authors
    Alexander Outman
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes random numbers generated through various methods.

    Method 1: shuf (https://www.mankier.com/1/shuf)

    Commands used to generate the dataset files:

    $ shuf -i 1-1000000000 -n1000000 -o random-shuf.txt
    $ shuf -i 1-1000000000000 -n1000000 -o random-shuf-1-1000000000000.txt
    $ jot -r 1000000 1 1000000000000 > random-jot-1-1000000000000.txt
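
The shuf invocations above can be mirrored in Python (an illustrative sketch, not part of the original dataset tooling; the seed is arbitrary):

```python
import random

# Python equivalent of the first shuf command: 1,000,000 unique integers
# drawn uniformly from 1..1,000,000,000. Like shuf, random.sample draws
# without replacement; jot -r draws with replacement, which random.choices
# would mirror instead. The original files were generated without a fixed
# seed; the seed here is only for reproducibility of this sketch.
random.seed(0)
nums = random.sample(range(1, 1_000_000_001), 1_000_000)
```

Writing the result one number per line, as the `-o random-shuf.txt` flag does, is then a plain text dump of `nums`.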

  2. Dataset for: Simulation and data-generation for random-effects network...

    • wiley.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Svenja Elisabeth Seide; Katrin Jensen; Meinhard Kieser (2023). Dataset for: Simulation and data-generation for random-effects network meta-analysis of binary outcome [Dataset]. http://doi.org/10.6084/m9.figshare.8001863.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley
    Authors
    Svenja Elisabeth Seide; Katrin Jensen; Meinhard Kieser
    License

    CC0 1.0 Universal Public Domain Dedication
    https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The performance of statistical methods is frequently evaluated by means of simulation studies. In the case of network meta-analysis of binary data, however, available data-generating models are restricted to either the inclusion of two-armed trials or the fixed-effect model. Based on data-generation in the pairwise case, we propose a framework for the simulation of random-effects network meta-analyses including multi-arm trials with binary outcome. The only common data-generating model that is directly applicable to a random-effects network setting relies on strongly restrictive assumptions. To overcome these limitations, we modify this approach and derive a related simulation procedure using odds ratios as effect measure. The performance of this procedure is evaluated with synthetic data and in an empirical example.
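
The odds-ratio-based data-generation step can be illustrated as follows (a minimal sketch under assumed parameter values, not the authors' actual procedure): given a baseline event probability and a trial-specific odds ratio, the comparator arm's event probability follows from the odds transformation, and arm-level binary outcomes can then be drawn.

```python
import random

def event_prob(p_baseline: float, odds_ratio: float) -> float:
    # Transform the baseline probability to the comparator arm's
    # probability via odds: odds_comparator = odds_baseline * OR.
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)

def simulate_trial(p_baseline: float, odds_ratio: float, n_per_arm: int,
                   rng: random.Random):
    # Draw binary outcomes for both arms of a two-armed trial and
    # return the event counts (baseline arm, comparator arm).
    p_comp = event_prob(p_baseline, odds_ratio)
    events_base = sum(rng.random() < p_baseline for _ in range(n_per_arm))
    events_comp = sum(rng.random() < p_comp for _ in range(n_per_arm))
    return events_base, events_comp

counts = simulate_trial(0.2, 2.0, n_per_arm=500, rng=random.Random(42))
```

Extending this to a network with multi-arm trials and random effects would add a per-trial random effect to the log odds ratio, which is the part the abstract says requires care.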

  3. kac_drumset: A Dataset Generator for Arbitrarily Shaped Drums

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 16, 2022
    Cite
    Lewis Wolstanholme (2022). kac_drumset: A Dataset Generator for Arbitrarily Shaped Drums [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7057218
    Explore at:
    Dataset updated
    Dec 16, 2022
    Dataset authored and provided by
    Lewis Wolstanholme
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This publication documents the various datasets generated using the kac_drumset codebase. The aim of kac_drumset is to provide a robust framework for the generation and analysis of arbitrarily shaped drums. The source code for this project is available here: https://github.com/lewiswolf/kac_drumset.

    Background

    Arbitrarily shaped drums are a strange family of percussion instruments and a wholly metaphysical construction in this contemporary setting. These percussive instruments possess a number of interesting musical characteristics resulting from their particular geometric designs. As it currently stands, these instruments remain largely unexplored throughout musical practice, as they were originally devised as a collection of hypothetical mathematical objects. These datasets serve to sonify these objects so as to explore these conceptual constructions in the audio domain.

    Usage

    To use these datasets, first install kac_drumset:

    pip install "git+https://github.com/lewiswolf/kac_drumset.git#egg=kac_drumset"

    And then in python:

    from kac_drumset import (
        # methods
        loadDataset,
        transformDataset,
        # classes
        TorchDataset,
    )

    dataset: TorchDataset = transformDataset(
        # load a dataset (any folder containing a metadata.json)
        loadDataset('absolute/path/to/data'),
        # alter the dataset representation, either as an end2end, fft or mel
        {'output_type': 'end2end'},
    )

    # use the dataset
    for i in range(len(dataset)):
        x, y = dataset[i]
        ...

    For more details on using kac_drumset, see the project's documentation.

    2000 Convex Polygonal Drums of Varying Size

    Each sample in this dataset corresponds to a randomly generated convex polygon. The audio for each sample was generated using a two-dimensional physical model of a drum. Each sample is one second long and decays linearly.

    Contained in this dataset are ten different sizes of drums - 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6 - each of which is a measure of the longest vertex of each drum in meters. There are 40 different drums sampled for each size. Each drum is sampled five times, first by being struck in the geometric centroid, and then by being struck four more times in random locations. This dataset is labelled with the vertices of each polygon, normalised to the unit interval, and the strike location of each sample.
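
The counts above multiply out to the dataset's stated size (a quick sanity check, not code from the dataset itself):

```python
# Ten drum sizes, 40 drums per size, five strikes per drum
# (one at the geometric centroid plus four at random locations).
sizes = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6]
drums_per_size = 40
strikes_per_drum = 5

total_samples = len(sizes) * drums_per_size * strikes_per_drum
assert total_samples == 2000  # matches "2000 Convex Polygonal Drums"
```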

    The audio is sampled at 48 kHz, and the default representation is raw audio. Each sample's metadata is stored in metadata.json, and each sample is also made available audibly as a 24-bit .wav and graphically as a .png.

    5000 Circular Drums of Varying Size

    Each sample in this dataset corresponds to a randomly generated circular drum. The audio for each sample was generated using additive synthesis, inferred using a closed form solution to the two dimensional wave equation. Each sample is one second long and decays exponentially.
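
The closed-form solution referred to here expands the circular membrane's vibration into Bessel-function modes, whose frequencies are proportional to the zeros of J_m. A minimal additive-synthesis sketch follows; the radius, wave speed, decay rate, and mode count are illustrative assumptions, not the dataset's actual parameters:

```python
import math

# First zeros of the Bessel functions J0 and J1 (standard tabulated values).
BESSEL_ZEROS = {0: [2.4048, 5.5201, 8.6537], 1: [3.8317, 7.0156, 10.1735]}

SR = 48000          # sample rate used by the dataset
DURATION = 1.0      # one second per sample, as described
RADIUS = 0.5        # drum radius in meters (assumed)
WAVE_SPEED = 100.0  # transverse wave speed in the membrane (assumed)
DECAY = 5.0         # exponential decay rate (assumed)

n = int(SR * DURATION)
audio = [0.0] * n
for zeros in BESSEL_ZEROS.values():
    for z in zeros:
        # Modal frequency of a circular membrane: f = z * c / (2 * pi * r)
        f = z * WAVE_SPEED / (2.0 * math.pi * RADIUS)
        for i in range(n):
            t = i / SR
            audio[i] += math.sin(2.0 * math.pi * f * t) * math.exp(-DECAY * t)

peak = max(abs(s) for s in audio)
audio = [s / peak for s in audio]  # normalise to [-1, 1]
```

Summing sinusoids at these modal frequencies under an exponential envelope is the essence of the additive synthesis described; the real generator presumably also weights modes by the strike location.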

    Contained in this dataset are 1000 different drums, each determined by a randomly generated size (0.1, 2.0) in meters. Each drum is sampled five times, first being struck in the geometric centroid, and then by being struck four more times in random locations. This dataset is labelled with the size of each drum and the strike location of each sample.

    The audio is sampled at 48 kHz, and the default representation is raw audio. Each sample's metadata is stored in metadata.json, and each sample is also made available audibly as a 24-bit .wav and graphically as a .png.

    5000 Rectangular Drums of Varying Dimension

    Each sample in this dataset corresponds to a randomly generated rectangular drum. The audio for each sample was generated using additive synthesis, inferred using a closed form solution to the two dimensional wave equation. Each sample is one second long and decays exponentially.

    Contained in this dataset are 1000 different drums, each determined by a randomly generated size (0.1, 2.0) in meters and aspect ratio (0.25, 4.0). Each drum is sampled five times, first being struck in the geometric centroid, and then by being struck four more times in random locations. This dataset is labelled with the size and aspect ratio of each drum, and the strike location of each sample.

    The audio is sampled at 48 kHz, and the default representation is raw audio. Each sample's metadata is stored in metadata.json, and each sample is also made available audibly as a 24-bit .wav and graphically as a .png.

  4. fun-club-name-generator-dataset

    • huggingface.co
    Updated Apr 5, 2025
    Cite
    Mitchell (2025). fun-club-name-generator-dataset [Dataset]. https://huggingface.co/datasets/Laurenfromhere/fun-club-name-generator-dataset
    Explore at:
    Dataset updated
    Apr 5, 2025
    Authors
    Mitchell
    Description

    Fun Club Name Generator Dataset

    This is a small, handcrafted dataset of random and fun club name ideas. The goal is to help people who are stuck naming something — whether it's a book club, a gaming group, a project, or just a Discord server between friends.

      Why this?
    

    A few friends and I spent hours trying to name a casual group — everything felt cringey, too serious, or already taken. We started writing down names that made us laugh, and eventually collected enough to… See the full description on the dataset page: https://huggingface.co/datasets/Laurenfromhere/fun-club-name-generator-dataset.

  5. Dataset for Analysis of a Programmable Quantum Annealer as a Random Number...

    • ieee-dataport.org
    • zenodo.org
    Updated Feb 1, 2024
    Cite
    Elijah Pelofske (2024). Dataset for Analysis of a Programmable Quantum Annealer as a Random Number Generator [Dataset]. https://ieee-dataport.org/documents/dataset-analysis-programmable-quantum-annealer-random-number-generator
    Explore at:
    Dataset updated
    Feb 1, 2024
    Authors
    Elijah Pelofske
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    producing 8 distinct datasets.

  6. Microsoft excel database containing all the simulated (10 sets) and...

    • plos.figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Hamed Ahmadi (2023). Microsoft excel database containing all the simulated (10 sets) and experimental data used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0187292.s001
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hamed Ahmadi
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel sheets, in order: The sheet entitled “Hens Original Data” contains the results of an experiment conducted to study the response of laying hens during the initial phase of egg production when subjected to different intakes of dietary threonine. The sheet entitled “Simulated data & fitting values” contains the 10 simulated data sets that were generated using a standard random number generation procedure. The predicted values obtained by the new three-parameter and conventional four-parameter logistic models also appear in this sheet. (XLSX)

  7. Random number generators minimum technical requirements - Dataset -...

    • publications.qld.gov.au
    Updated Jun 19, 2014
    Cite
    www.publications.qld.gov.au (2014). Random number generators minimum technical requirements - Dataset - Publications | Queensland Government [Dataset]. https://www.publications.qld.gov.au/dataset/random-number-generators-minimum-technical-requirements
    Explore at:
    Dataset updated
    Jun 19, 2014
    Dataset provided by
    Queensland Government (http://qld.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Queensland, Queensland Government
    Description

    Read this document to understand the minimum technical requirements for random number generators (RNGs) used in gaming-related equipment and systems.

  8. 1k Random 3D Shapes

    • kaggle.com
    Updated May 17, 2023
    Cite
    makra (2023). 1k Random 3D Shapes [Dataset]. https://www.kaggle.com/datasets/makra2077/1000-random-3d-shapes
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    makra
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0
    http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    About

    The dataset contains 1000 images with a random shape (of 17 possible shapes). This dataset is generated using the 3D Shapes Dataset Generator I've developed. Feel free to use it from here.


    Label

    Column Name                         Info
    filename                            Name of the image file
    shape                               Shape index
    operation                           Operation index
    a, b, c, d, e, f, g, h, i, j, k, l  Dimensional parameters
    hue, sat, val                       HSV values of the color
    rot_x, rot_y, rot_z                 Euler angles
    pos_x, pos_y, pos_z                 Position vector

    Each row describes one shape in an image of the dataset.

    Seed: The seed value of the dataset is stored in a .txt file and can be used to re-generate the dataset using the tool.

  9. Dataset Artifact for paper "Root Cause Analysis for Microservice System...

    • data.niaid.nih.gov
    Updated Aug 25, 2024
    Cite
    Zhang, Hongyu (2024). Dataset Artifact for paper "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13305662
    Explore at:
    Dataset updated
    Aug 25, 2024
    Dataset provided by
    Zhang, Hongyu
    Ha, Huong
    Pham, Luan
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artifacts for the paper titled Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?.

    This artifact repository contains 9 compressed folders, as follows:

    ID  File Name            Description
    1   syn_circa.zip        CIRCA10 and CIRCA50 datasets for Causal Discovery
    2   syn_rcd.zip          RCD10 and RCD50 datasets for Causal Discovery
    3   syn_causil.zip       CausIL10 and CausIL50 datasets for Causal Discovery
    4   rca_circa.zip        CIRCA10 and CIRCA50 datasets for RCA
    5   rca_rcd.zip          RCD10 and RCD50 datasets for RCA
    6   online-boutique.zip  Online Boutique dataset for RCA
    7   sock-shop-1.zip      Sock Shop 1 dataset for RCA
    8   sock-shop-2.zip      Sock Shop 2 dataset for RCA
    9   train-ticket.zip     Train Ticket dataset for RCA

    Each zip file contains the generated/collected data from the corresponding data generator or microservice benchmark systems (e.g., online-boutique.zip contains metrics data collected from the Online Boutique system).

    Details about the generation of our datasets

    1. Synthetic datasets

    We use three different synthetic data generators from three previous RCA studies [15, 25, 28] to create the synthetic datasets: the CIRCA, RCD, and CausIL data generators. Their mechanisms are as follows:

    1. The CIRCA data generator [28] generates a random causal directed acyclic graph (DAG) based on a given number of nodes and edges. From this DAG, time series data for each node is generated using a vector auto-regression (VAR) model. A fault is injected into a node by altering the noise term in the VAR model for two timestamps.

    2. The RCD data generator [25] uses the pyAgrum package [3] to generate a random DAG based on a given number of nodes, subsequently generating discrete time series data for each node, with values ranging from 0 to 5. A fault is introduced into a node by changing its conditional probability distribution.

    3. The CausIL data generator [15] generates causal graphs and time series data that simulate the behavior of microservice systems. It first constructs a DAG of services and metrics based on domain knowledge, then generates metric data for each node of the DAG using regressors trained on real metrics data. Unlike the CIRCA and RCD data generators, the CausIL data generator cannot inject faults.

    To create our synthetic datasets, we first generate 10 DAGs whose nodes range from 10 to 50 for each of the synthetic data generators. Next, we generate fault-free datasets using these DAGs with different seeds, resulting in 100 cases for the CIRCA and RCD generators and 10 cases for the CausIL generator. We then create faulty datasets by introducing ten faults into each DAG and generating the corresponding faulty data, yielding 100 cases for the CIRCA and RCD data generators. The fault-free datasets (e.g., syn_rcd, syn_circa) are used to evaluate causal discovery methods, while the faulty datasets (e.g., rca_rcd, rca_circa) are used to assess RCA methods.
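
The CIRCA-style mechanism (random DAG, VAR time series, noise-term fault injection) can be sketched as follows. This is a simplified illustration with made-up coefficients and noise scales, not the actual CIRCA code:

```python
import random

def random_dag(n_nodes: int, n_edges: int, rng: random.Random):
    """Random DAG: edges always point from a lower to a higher index,
    which guarantees acyclicity."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)
        edges.add((min(a, b), max(a, b)))
    return edges

def var_data(edges, n_nodes: int, steps: int, rng: random.Random,
             fault_node=None, fault_steps=()):
    """VAR-style series over the DAG; a fault inflates the noise term of
    one node for the given timestamps (two timestamps, in CIRCA's case)."""
    parents = {i: [a for (a, b) in edges if b == i] for i in range(n_nodes)}
    data = [[rng.gauss(0, 1) for _ in range(n_nodes)]]
    for t in range(1, steps):
        prev = data[-1]
        row = []
        for i in range(n_nodes):
            noise_scale = 10.0 if (i == fault_node and t in fault_steps) else 1.0
            val = 0.5 * prev[i] + sum(0.3 * prev[p] for p in parents[i])
            row.append(val + rng.gauss(0, noise_scale))
        data.append(row)
    return data

rng = random.Random(0)
dag = random_dag(10, 15, rng)
series = var_data(dag, 10, 200, rng, fault_node=3, fault_steps=(100, 101))
```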

    2. Data collected from benchmark microservice systems

    We deploy three popular benchmark microservice systems: Sock Shop [6], Online Boutique [4], and Train Ticket [8], on a four-node Kubernetes cluster hosted by AWS. Next, we use the Istio service mesh [2] with Prometheus [5] and cAdvisor [1] to monitor and collect resource-level and service-level metrics of all services, as in previous works [25, 39, 59]. To generate traffic, we use the load generators provided by these systems and customise them to explore all services with 100 to 200 concurrent users. We then introduce five common faults (CPU hog, memory leak, disk IO stress, network delay, and packet loss) into five different services within each system. Finally, we collect metrics data before and after the fault injection operation. An overview of our setup is presented in the Figure below.

    Code

    The code to reproduce the experimental results in the paper is available at https://github.com/phamquiluan/RCAEval.

    References

    As in our paper.

  10. DynaMath_Sample

    • huggingface.co
    Updated Oct 31, 2024
    Cite
    DynaMath Team (2024). DynaMath_Sample [Dataset]. https://huggingface.co/datasets/DynaMath/DynaMath_Sample
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 31, 2024
    Dataset authored and provided by
    DynaMath Team
    License

    Apache License, v2.0
    https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for DynaMath

    [💻 Github] [🌐 Homepage] [📖 Preprint Paper]

      Dataset Details

      🔈 Notice

    DynaMath is a dynamic benchmark with 501 seed question generators. This dataset is only a sample of 10 variants generated by DynaMath. We encourage you to use the dataset generator on our GitHub site to generate random datasets for testing.

      🌟 About DynaMath
    

    The rapid advancements in Vision-Language Models (VLMs) have shown significant potential in tackling… See the full description on the dataset page: https://huggingface.co/datasets/DynaMath/DynaMath_Sample.

  11. Dataset for Accessing Cosmic Radiation as an Entropy Source for a...

    • zenodo.org
    • data.niaid.nih.gov
    bin, text/x-python
    Updated May 25, 2023
    Cite
    Stefan Kutschera; Stefan Kutschera; Wolfgang Slany; Wolfgang Slany; Patrick Ratschiller; Patrick Ratschiller; Sarina Gursch; Sarina Gursch; Håvard Dagenborg; Håvard Dagenborg (2023). Dataset for Accessing Cosmic Radiation as an Entropy Source for a Non-Deterministic Random Number Generator [Dataset]. http://doi.org/10.5281/zenodo.7774330
    Explore at:
    bin, text/x-python (available download formats)
    Dataset updated
    May 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stefan Kutschera; Stefan Kutschera; Wolfgang Slany; Wolfgang Slany; Patrick Ratschiller; Patrick Ratschiller; Sarina Gursch; Sarina Gursch; Håvard Dagenborg; Håvard Dagenborg
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains all data gathered from the experiment, from Wednesday, March 16, 2022 11:58:41.929 AM UTC+0 (1647431921929) until Sunday, April 3, 2022 1:08:35.353 PM UTC+0 (1648991315353). The experiment was executed during physical presence within the Arctic Circle in Tromsø, Norway (69° 40' 53.117'' N, 18° 58' 36.027'' E) at 35 m elevation above sea level. The dataset was gathered with a prototype [1] based on the CREDO Android application [2]. The main research goal is to use Ultra High Energy Cosmic Rays (UHECR) as an entropy source for a Random Bit Generator (RBG).

    The associated publication will probably have the title "Accessing Cosmic Radiation as an Entropy Source for a Non-Deterministic Random Number Generator"

    In order to reproduce the results, the SQLite3 database "mrng_arctic_experiment_2022.db" is needed. To get the visual representations of the detections, use "image_decoding_and_codesnippets.py" to generate the cleaned (414 detections / ~15 MB) or the uncleaned (5567 detections / ~195 MB) dataset. The compressed folder "raw_data_incl_space_weather.7z" contains all raw data as gathered with the MRNG prototype: unprocessed, uncleaned, and unmerged.

    [1] https://github.com/StefanKutschera/mrng-prototype, visited on 27.03.2023

    [2] https://github.com/credo-science/credo-detector-android, visited on 27.03.2023

  12. dummy_health_data

    • huggingface.co
    Updated May 29, 2025
    Cite
    Mudumbai Vraja Kishore (2025). dummy_health_data [Dataset]. https://huggingface.co/datasets/vrajakishore/dummy_health_data
    Explore at:
    Dataset updated
    May 29, 2025
    Authors
    Mudumbai Vraja Kishore
    License

    Apache License, v2.0
    https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Synthetic Healthcare Dataset

      Overview
    

    This dataset is a synthetic healthcare dataset created for use in data analysis. It mimics real-world patient healthcare data and is intended for applications within the healthcare industry.

      Data Generation
    

    The data has been generated using the Faker Python library, which produces randomized and synthetic records that resemble real-world data patterns. It includes various healthcare-related fields such as patient… See the full description on the dataset page: https://huggingface.co/datasets/vrajakishore/dummy_health_data.

  13. Statistical testing result of accelerometer data processed for random number...

    • figshare.com
    zip
    Updated Jan 19, 2016
    Cite
    S Lee Hong; Chang Liu (2016). Statistical testing result of accelerometer data processed for random number generator seeding [Dataset]. http://doi.org/10.6084/m9.figshare.1273869.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    S Lee Hong; Chang Liu
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the result of applying the NIST Statistical Test Suite on accelerometer data processed for random number generator seeding. The NIST Statistical Test Suite can be downloaded from: http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html. The format of the output is explained in http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf.

  14. Generation of random numbers by measuring on a silicon-on-insulator chip...

    • data.bris.ac.uk
    Updated Jun 14, 2018
    + more versions
    Cite
    (2018). Generation of random numbers by measuring on a silicon-on-insulator chip phase fluctuations from a laser diode - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/2nvzdjr7gy4ox2njh7a6tbwks6
    Explore at:
    Dataset updated
    Jun 14, 2018
    Description

    Underpinning data for manuscript entitled "Generation of random numbers by measuring on a silicon-on-insulator chip phase fluctuations from a laser diode"

  15. generated-usa-passeports-dataset

    • huggingface.co
    Updated Jul 15, 2023
    Cite
    Training Data (2023). generated-usa-passeports-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/generated-usa-passeports-dataset
    Explore at:
    Dataset updated
    Jul 15, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)
    https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Data generation in machine learning involves creating or manipulating data to train and evaluate machine learning models. The purpose of data generation is to provide diverse and representative examples that cover a wide range of scenarios, ensuring the model's robustness and generalization. Data augmentation techniques involve applying various transformations to existing data samples to create new ones. These transformations include: random rotations, translations, scaling, flips, and more. Augmentation helps in increasing the dataset size, introducing natural variations, and improving model performance by making it more invariant to specific transformations.

    The dataset contains GENERATED USA passports, which are replicas of official passports but with randomly generated details, such as name, date of birth, etc. The primary intention of generating these fake passports is to demonstrate the structure and content of a typical passport document and to train the neural network to identify this type of document. Generated passports can assist in conducting research without accessing or compromising real user data that is often sensitive and subject to privacy regulations. Synthetic data generation allows researchers to develop and refine models using simulated passport data without risking privacy leaks.
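
The augmentation transformations listed above can be sketched on a raw pixel grid (a generic illustration; the dataset's actual augmentation pipeline is not published here):

```python
import random

def augment(image, rng):
    """Apply a random horizontal flip and a random vertical circular
    shift to a 2D list of pixels -- two of the transformations named
    above (rotations and scaling follow the same random-parameter pattern)."""
    # Horizontal flip with probability 0.5
    out = [row[::-1] for row in image] if rng.random() < 0.5 else [row[:] for row in image]
    # Circular vertical shift as a stand-in for translation
    shift = rng.randrange(len(out))
    return out[shift:] + out[:shift]

rng = random.Random(1)
img = [[r * 10 + c for c in range(4)] for r in range(4)]
aug = augment(img, rng)
```

Each call with fresh random parameters yields a new variant of the same sample, which is how augmentation "increases the dataset size" in practice.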

  16. Dataset of Membership Inference Attack Defense Strategies

    • scidb.cn
    Updated Jan 15, 2024
    Cite
    Yongzhi Cao (2024). Dataset of Membership Inference Attack Defense Strategies [Dataset]. http://doi.org/10.57760/sciencedb.nbsdc.00098
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 15, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Yongzhi Cao
    License

    CC0 1.0 Universal Public Domain Dedication
    https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The random number generation algorithm used within the consensus mechanism of blockchain systems may be vulnerable to membership inference attacks, which can reveal features of the algorithm or patterns in the generated random numbers. To address this issue, the research group proposed a knowledge-distillation-based resistance scheme to ensure the security of the random number generation algorithm. The research group used a membership inference attack defense strategy dataset, comprising 5 batch training datasets and 1 test dataset, to evaluate the performance of the proposed defense scheme. The performance of membership inference attack defense strategies is evaluated by analyzing how machine learning models' performance changes after they are subjected to membership inference attacks on this dataset.

    Collection plan: The folder name of the test dataset is "Membership Inference Attack Resistance Strategy Dataset/cifar-10-batches-py". CIFAR-10 is a small dataset for recognizing everyday objects, available at http://www.cs.toronto.edu/~kriz/cifar.html. It contains 10 categories of RGB color images, each of size 32 × 32; each category has 6000 images, for a total of 50000 training images and 10000 test images.

    Time and location: This dataset is test data collected by the research unit "Peking University" during 2021.

    Equipment: Data is processed in the following environment: hardware: general computing platforms such as Intel and ARM; systems: Windows 11 and Ubuntu 20.04.

  17. TRAVEL: A Dataset with Toolchains for Test Generation and Regression Testing...

    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Alessio Gambi (2024). TRAVEL: A Dataset with Toolchains for Test Generation and Regression Testing of Self-driving Cars Software [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5911160
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Annibale Panichella
    Sebastiano Panichella
    Alessio Gambi
    Christian Birchler
    Pouria Derakhshanfar
    Vincenzo Riccio
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.

    Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of test regression. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigms.

    This dataset builds on top of our previous work in this area, including work on

    test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021),

    test selection: SDC-Scissor and related tool

    test prioritization: automated test cases prioritization work for SDCs.

    Dataset Overview

    The TRAVEL dataset is available under the data folder and is organized as a set of experiments folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).

    The following sections describe what each of those files contains.

    Experiment Description

    The experiment_description.csv contains the settings used to generate the data, including:

    Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.

    The size of the map. The size of the squared map, in meters, defines the boundaries inside which the virtual roads develop.

    The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated testing the BeamNG.AI and the end-to-end Dave2 systems.

    The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms.

    The speed limit. The maximum speed at which the driving agent under test can travel.

    Out of Bound (OOB) tolerance. The test cases' oracle that defines the tolerable amount of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
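The OOB tolerance thus acts as a pass/fail threshold on how much of the car may leave the lane. A minimal sketch of such an oracle (the function name and the exact comparison are assumptions for illustration, not the pipeline's actual code):

```python
def oob_verdict(oob_fraction: float, tolerance: float) -> str:
    """Illustrative oracle: a test fails once the fraction of the ego-car
    lying outside the lane boundary exceeds the configured tolerance.

    oob_fraction -- observed fraction of the car body outside the lane (0.0-1.0)
    tolerance    -- configured OOB tolerance (0.0-1.0)
    """
    if not 0.0 <= tolerance <= 1.0:
        raise ValueError("tolerance must be in [0.0, 1.0]")
    return "FAIL" if oob_fraction > tolerance else "PASS"

# With tolerance 0.0 any excursion fails; with a high tolerance (e.g., 0.95)
# almost the entire car must leave the lane before the test fails.
print(oob_verdict(0.10, 0.0))   # FAIL
print(oob_verdict(0.10, 0.95))  # PASS
```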

    Experiment Statistics

    The generation_stats.csv contains statistics about the test generation, including:

    Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.

    Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing still. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.

    The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.

    Test Cases and Executions

    Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as a driving simulation.

    The data about the test case definition include:

    The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points)

    The test ID. The unique identifier of the test in the experiment.

    Validity flag and explanation. A flag that indicates whether the test is valid, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or self-intersects)
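Checks like these can be approximated from the road points alone. A hypothetical sketch of a sharp-turn check (the 60-degree threshold and the helper names are assumptions made for illustration; the dataset's actual validator lives in the competition pipeline):

```python
import math

def max_turn_angle(road_points):
    """Return the largest absolute heading change (in degrees) between
    consecutive road segments defined by 2D road points."""
    worst = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(road_points, road_points[1:], road_points[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)  # heading of first segment
        h2 = math.atan2(y2 - y1, x2 - x1)  # heading of second segment
        delta = abs(math.degrees(h2 - h1))
        worst = max(worst, min(delta, 360.0 - delta))  # handle wraparound
    return worst

def looks_too_sharp(road_points, threshold_deg=60.0):
    # Hypothetical threshold: flag roads whose sharpest turn exceeds it.
    return max_turn_angle(road_points) > threshold_deg

straight = [(0, 0), (10, 0), (20, 0), (30, 0)]
hairpin = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(looks_too_sharp(straight))  # False
print(looks_too_sharp(hairpin))   # True
```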

    The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.

    { "type": "object", "properties": { "id": { "type": "integer" }, "is_valid": { "type": "boolean" }, "validation_message": { "type": "string" }, "road_points": { §\label{line:road-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "interpolated_points": { §\label{line:interpolated-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "test_outcome": { "type": "string" }, §\label{line:test-outcome}§ "description": { "type": "string" }, "execution_data": { "type": "array", "items": { "$ref" : "schemas/simulationdata" } } }, "required": [ "id", "is_valid", "validation_message", "road_points", "interpolated_points" ] }

    Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).

    The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.

    { "$id": "schemas/simulationdata", "type": "object", "properties": { "timer" : { "type": "number" }, "pos" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel_kmh" : { "type": "number" }, "steering" : { "type": "number" }, "brake" : { "type": "number" }, "throttle" : { "type": "number" }, "is_oob" : { "type": "number" }, "oob_percentage" : { "type": "number" } §\label{line:oob-percentage}§ }, "required": [ "timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage" ] }

    Dataset Content

    The TRAVEL dataset is a lively initiative, so its content is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition and data collected in the context of our recent work on test selection (the SDC-Scissor work and tool) and test prioritization (automated test case prioritization work for SDCs).

    SBST CPS Tool Competition Data

    The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., conservative driver).

        Name      Map Size (m x m)   Max Speed (Km/h)   Budget (h)      OOB Tolerance (%)   Test Subject
        DEFAULT   200 × 200          120                5 (real time)   0.95                BeamNG.AI - 0.7
        SBST      200 × 200          70                 2 (real time)   0.5                 BeamNG.AI - 0.7

    Specifically, the TRAVEL dataset contains 8 repetitions for each of the above configurations for each test generator totaling 64 experiments.

    SDC Scissor

    With SDC-Scissor we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the used parameters.

        Name          Map Size (m x m)   Max Speed (Km/h)   Budget (h)       OOB Tolerance (%)   Test Subject
        SDC-SCISSOR   200 × 200          120                16 (real time)   0.5                 BeamNG.AI - 1.5

    The dataset contains 9 experiments with the above configuration. For generating your own data with SDC-Scissor follow the instructions in its repository.

    Dataset Statistics

    Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and a high OOB tolerance (0.95); we also ran them with a smaller generation budget (2 hours) and speed limit (70 Km/h) while setting the OOB tolerance to a lower value (0.85). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours, using Frenetic as the test generator and a more realistic OOB tolerance (0.50).

    Generating new Data

    Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.

    Extensive instructions on how to install both tools are reported in the SBST CPS Tool Competition pipeline documentation;

  18. h

    french_last_names_insee_2024

    • huggingface.co
    Updated Nov 4, 2024
    Ronan L.M. (2024). french_last_names_insee_2024 [Dataset]. http://doi.org/10.57967/hf/3430
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 4, 2024
    Authors
    Ronan L.M.
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    French
    Description

    French Last Names from Death Records (1970-2024)

    This dataset contains French last names extracted from death records provided by INSEE (French National Institute of Statistics and Economic Studies), covering the period from 1970 to September 2024.

      Dataset Description

      Random name generator demo

    Go to https://sctg-development.github.io/french-names-extractor/

      Data Source
    

    The data is sourced from INSEE's death records database. It includes last names… See the full description on the dataset page: https://huggingface.co/datasets/eltorio/french_last_names_insee_2024.

  19. e

    Knowledge Graph Generator - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 1, 2025
    (2025). Knowledge Graph Generator - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/83b50938-a7f7-5b3e-888b-8b64698e2df6
    Explore at:
    Dataset updated
    Aug 1, 2025
    Description

    Code and experiment results for a synthetic knowledge graph generator. The generator receives a set of rules, each with an expected body support and confidence, and returns a knowledge graph that approximately matches the rules according to the body support and confidence. This code was developed during the Bachelor thesis by Gabriel Glaser, Generating Random Knowledge Graphs from Rules, University of Stuttgart, 2024. Handle 11682/15486.
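For intuition on the two quantities the generator targets: body support counts the groundings that satisfy a rule's body, and confidence is the fraction of those that also satisfy the head. A toy sketch over a small triple set (the one-atom rule encoding and the example triples are hypothetical simplifications, not the thesis's actual setup):

```python
def body_support_and_confidence(triples, body_rel, head_rel):
    """Toy metrics for a one-atom rule body_rel(X, Y) => head_rel(X, Y):
    body support = number of (X, Y) pairs with a body_rel triple,
    confidence   = fraction of those pairs that also have a head_rel triple."""
    body = {(s, o) for (s, r, o) in triples if r == body_rel}
    head = {(s, o) for (s, r, o) in triples if r == head_rel}
    support = len(body)
    confidence = len(body & head) / support if support else 0.0
    return support, confidence

# Invented knowledge graph: livesIn(X, Y) => citizenOf(X, Y)
kg = [
    ("anna", "livesIn", "italy"),
    ("anna", "citizenOf", "italy"),
    ("marc", "livesIn", "france"),
    ("lena", "livesIn", "spain"),
    ("lena", "citizenOf", "spain"),
]
print(body_support_and_confidence(kg, "livesIn", "citizenOf"))  # (3, 0.666...)
```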

  20. w

    Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    + more versions
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only include ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
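The original sampling was done with the R script distributed alongside the dataset; the same two-stage logic can be sketched in Python (stratum names, enumeration-area sizes, and household ids below are invented):

```python
import random

def two_stage_sample(strata, total_households=8000, hh_per_ea=25):
    """Two-stage sample: allocate enumeration areas (EAs) to strata
    proportionally to stratum size, then draw hh_per_ea households per EA.

    strata -- mapping stratum name -> list of EAs, each EA a list of household ids
    """
    n_eas = total_households // hh_per_ea               # 320 EAs for 8,000 households
    total = sum(len(eas) for eas in strata.values())
    sample = []
    for name, eas in strata.items():
        k = round(n_eas * len(eas) / total)             # stage 1: proportional allocation
        for ea in random.sample(eas, min(k, len(eas))):
            sample.extend(random.sample(ea, hh_per_ea)) # stage 2: households within EA
    return sample

# Tiny invented frame: 2 strata (urban/rural), EAs of 40 households each.
random.seed(0)
frame = {
    "prov1_urban": [[f"u{e}_{h}" for h in range(40)] for e in range(240)],
    "prov1_rural": [[f"r{e}_{h}" for h in range(40)] for e in range(80)],
}
sample = two_stage_sample(frame)
print(len(sample))  # 8000
```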

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

