Dataset Card for test-data-generator
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/franciscoflorencio/test-data-generator/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/franciscoflorencio/test-data-generator.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes MATLAB files and datasets related to the IEEE IIRW 2023 conference proceeding:T. Zanotti et al., "Reliability Analysis of Random Telegraph Noisebased True Random Number Generators," 2023 IEEE International Integrated Reliability Workshop (IIRW), South Lake Tahoe, CA, USA, 2023, pp. 1-6, doi: 10.1109/IIRW59383.2023.10477697
The repository includes:
The data of the bitmaps reported in Fig. 4, i.e., the results of the simulation of the ideal RTN-based TRNG circuit for different reseeding strategies. To load and plot the data use the "plot_bitmaps.mat" file.
The result of the circuit simulations considering the EvolvingRTN from the HfO2 device shown in Fig. 7, for two Rgain values. Specifically, the data is contained in the following csv files:
"Sim_TRNG_Circuit_HfO2_3_20s_Vth_210m_no_Noise_Ibias_11n.csv" (lower Rgain)
"Sim_TRNG_Circuit_HfO2_3_20s_Vth_210m_no_Noise_Ibias_4_8n.csv" (higher Rgain)
The result of the circuit simulations considering the temporary RTN from the SiO2 device shown in Fig. 8. Specifically, the data is contained in the following csv files:
"Sim_TRNG_Circuit_SiO2_1c_300s_Vth_180m_Noise_Ibias_1.5n.csv" (ref. Rgain)
"Sim_TRNG_Circuit_SiO2_1c_100s_200s_Vth_180m_Noise_Ibias_1.575n.csv" (lower Rgain)
"Sim_TRNG_Circuit_SiO2_1c_100s_200s_Vth_180m_Noise_Ibias_1.425n.csv" (higher Rgain)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel sheets in order: The sheet entitled “Hens Original Data” contains the results of an experiment conducted to study the response of laying hens during initial phase of egg production subjected to different intakes of dietary threonine. The sheet entitled “Simulated data & fitting values” contains the 10 simulated data sets that were generated using a standard procedure of random number generator. The predicted values obtained by the new three-parameter and conventional four-parameter logistic models were also appeared in this sheet. (XLSX)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Testing web APIs automatically requires generating input data values such as addressess, coordinates or country codes. Generating meaningful values for these types of parameters randomly is rarely feasible, which means a major obstacle for current test case generation approaches. In this paper, we present ARTE, the first semantic-based approach for the Automated generation of Realistic TEst inputs for web APIs. Specifically, ARTE leverages the specification of the API under test to extract semantically related values for every parameter by applying knowledge extraction techniques. Our approach has been integrated into RESTest, a state-of-the-art tool for API testing, achieving an unprecedented level of automation which allows to generate up to 100\% more valid API calls than existing fuzzing techniques (30\% on average). Evaluation results on a set of 26 real-world APIs show that ARTE can generate realistic inputs for 7 out of every 10 parameters, outperforming the results obtained by related approaches.
Static torque, no load, constant speed, and sinusoidal oscillation test data for a 10hp, 300rpm magnetically-geared generator prototype using either an adjustable load bank for a fixed resistance or an output power converter.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset contains temperature and humidity sensor readings of a particular day, which are synthetically generated using a data generator and are stored as JSON files to validate and test (performance/load testing) the data pipeline components.
Static torque and no load test data for a 1hp, 300rpm axial-flux magnetically geared generator prototype developed by Texas A&M EMPE Lab.
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies. Methods The dataset and code stored here is for Case Studies 1 and 2 in the paper. Datsets were generated using simulations in R. Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Abstract A library containing highly portable implementations of most algorithms for (pseudo) random number generation has been developed, which might be used in any area of simulation which requires random number generators. Each generator is freely configurable by the user, so the RANEXP library is particularly well-suited for applications requiring different random number generators. The algorithms are implemented in C, but are callable from Fortran application program also.
Title of program: RANEXP Catalogue Id: ACTB_v1_0
Nature of problem Any Monte Carlo simulation or statistical test requiring uniform pseudorandom numbers.
Versions of this program held in the CPC repository in Mendeley Data ACTB_v1_0; RANEXP; 10.1016/0010-4655(94)90072-8
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Fake Email Address Generator Market Analysis The global market for Fake Email Address Generators is expected to reach a value of XXX million by 2033, growing at a CAGR of XX% from 2025 to 2033. Key drivers of this growth include the increasing demand for privacy and anonymity online, the growing prevalence of spam and phishing attacks, and the proliferation of digital marketing campaigns. Additionally, the adoption of cloud-based solutions and the emergence of new technologies, such as artificial intelligence (AI), are further fueling market expansion. Key trends in the Fake Email Address Generator market include the growing popularity of enterprise-grade solutions, the emergence of disposable email services, and the increasing integration with other online tools. Restraints to market growth include concerns over security and data protection, as well as the availability of free or low-cost alternatives. The market is dominated by a few major players, including Burnermail, TrashMail, and Guerrilla Mail, but a growing number of smaller vendors are emerging with innovative solutions. Geographically, North America and Europe are the largest markets, followed by the Asia Pacific region.
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
The global Random Outfit Generator market was valued at USD 1.2 billion in 2025 and is projected to grow at a CAGR of 10.5% during the forecast period, reaching USD 2.5 billion by 2033. This growth is attributed to increasing demand for personalized and time-saving fashion solutions, rising disposable income, and growing awareness of fashion trends. The market is predominantly driven by fashion enthusiasts and professionals who seek efficient ways to generate unique and stylish outfits. Among the key segments of the Random Outfit Generator market are: Application: Fashion Designer, Fashion Enthusiasts, Photography Stylist, Others Types: Cloud-based, On-premises Geography: North America, South America, Europe, Middle East & Africa, Asia Pacific Key players in the market include Fashmates, Stylicious, Your Closet, Combyne, My Dressing, Acloset, My Wardrobe, Smart Closet, Pureple, Twelve70, Roll For Fantasy, Randommer, The Fashion Robot, and Picrew. These companies offer innovative solutions to cater to the growing demand for random outfit generators, engaging users with interactive features and personalized experiences.
https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global Quantum Random Number Generator RNG market size is USD 555.9 million in 2024. It will expand at a compound annual growth rate (CAGR) of 72.60% from 2024 to 2031.
North America held the major market share for more than 40% of the global revenue with a market size of USD 222.36 million in 2024 and will grow at a compound annual growth rate (CAGR) of 70.8% from 2024 to 2031.
Europe accounted for a market share of over 30% of the global revenue with a market size of USD 166.77 million.
Asia Pacific held a market share of around 23% of the global revenue with a market size of USD 127.86 million in 2024 and will grow at a compound annual growth rate (CAGR) of 74.6% from 2024 to 2031.
Latin America had a market share for more than 5% of the global revenue with a market size of USD 27.80 million in 2024 and will grow at a compound annual growth rate (CAGR) of 72.0% from 2024 to 2031.
Middle East and Africa had a market share of around 2% of the global revenue and was estimated at a market size of USD 11.12 million in 2024 and will grow at a compound annual growth rate (CAGR) of 72.3% from 2024 to 2031.
Cloud held the dominant segment in the Quantum Random Number Generator RNG market in 2024.
Market Dynamics of Quantum Random Number Generator RNG Market
Key Drivers for Quantum Random Number Generator RNG Market
Increasing need for random numbers in cryptography or compute applications
The QRNG is an ideal random key generator since it generates entropy using intrinsic quantum physics properties. Nowadays, applications demand a huge number of keys and randomization to achieve total security. It could include key vaults, games, IoT devices, AI/ML, blockchains, simulations, and vital infrastructure. QRNG is the source of these applications in which trust in randomness is prevalent. Furthermore, it is utilized in encryption for a wide range of applications, including cryptography, numerical simulation, gambling, and game design.
Growing adoption of quantum computing
The increasing use of quantum computing is boosting the market for Quantum Random Number Generators (RNG) as it creates a need for improved random number generation capabilities. The accurate abilities of quantum computing enable RNGs to produce truly random numbers, essential for secure communication and encryption. Advancements in quantum computing will lead to a higher demand for dependable RNGs, driving market expansion to meet the changing requirements of cybersecurity and data encryption.
Restraint Factor for the Quantum Random Number Generator RNG Market
High initial investment
A significant initial investment hinders the Quantum Random Number Generator (RNG) Market, creating a barrier for new entrants and small companies looking to invest in RNG generation. The significant initial costs involved in the research, development, and deployment of quantum RNG solutions may discourage potential entrants from joining the market. This limitation impedes the growth of the market by limiting innovation and competition, potentially hindering progress in the era of RNG and constraining the market's growth
Impact of Covid-19 on the Quantum Random Number Generator RNG Market
The effect of COVID-19 on the Quantum Random Number Generator RNG Market was merged. Although the pandemic initially caused disruptions in supply chains and slowed down certain trends, the increased focus on cybersecurity and data protection during remote work and digital interactions enhanced the need for secure communication solutions such as quantum RNGs. With a focus on safeguarding information, both organizations and governments fueled growth in the Quantum RNG market despite pandemic-related obstacles. Introduction of the Quantum Random Number Generator RNG Market
The Quantum Random Number Generator (QRNG) is a highly sophisticated engineering innovation that combines the power of complex deep-tech technologies like semiconductors, optoelectronics, high-precision electronics, and quantum physics to achieve the highest level of randomness possible. QRNG has shown to be a critical enabling technology for quantum-level security in mobile devices, data centres, and medical implants. They provide consumers with a significant enhancement over ordinary random number generators (RNGs), which have been utilized for years in a variety of business applications. Several factors, including th...
The Space-based Imaging Spectroscopy and Thermal pathfindER (SISTER) activity originated in support of the NASA Earth System Observatory's Surface Biology and Geology (SBG) mission to develop prototype workflows with community algorithms and generate prototype data products envisioned for SBG. SISTER focused on developing a data system that is open, portable, scalable, standards-compliant, and reproducible. This collection contains EXPERIMENTAL workflows and sample data products, including (a) the Common Workflow Language (CWL) process file and a Jupyter Notebook that run the entire SISTER workflow capable of generating experimental sample data products spanning terrestrial ecosystems, inland and coastal aquatic ecosystems, and snow, (b) the archived algorithm steps (as OGC Application Packages) used to generate products at each step of the workflow, (c) a small number of experimental sample data products produced by the workflow which are based on the Airborne Visible/Infrared Imaging Spectrometer-Classic (AVIRIS or AVIRIS-CL) instrument, and (d) instructions for reproducing the sample products included in this dataset. DISCLAIMER: This collection contains experimental workflows, experimental community algorithms, and experimental sample data products to demonstrate the capabilities of an end-to-end processing system. The experimental sample data products provided have not been fully validated and are not intended for scientific use. The community algorithms provided are placeholders which can be replaced by any user's algorithms for their own science and application interests. These algorithms should not in any capacity be considered the algorithms that will be implemented in the upcoming Surface Biology and Geology mission.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This datasets have SQL injection attacks (SLQIA) as malicious Netflow data. The attacks carried out are SQL injection for Union Query and Blind SQL injection. To perform the attacks, the SQLMAP tool has been used.
NetFlow traffic has generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data generated. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
Datasets
The firts dataset was colleted to train the detection models (D1) and other collected using different attacks than those used in training to test the models and ensure their generalization (D2).
The datasets contain both benign and malicious traffic. All collected datasets are balanced.
The version of NetFlow used to build the datasets is 5.
Dataset
Aim
Samples
Benign-malicious
traffic ratio
D1
Training
400,003
50%
D2
Test
57,239
50%
Infrastructure and implementation
Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.
DOROTHEA is configured to use Netflow V5 and export the flow after it is inactive for 15 seconds or after the flow is active for 1800 seconds (30 minutes)
Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks. On the one hand, it routes packets to the Internet. On the other hand, it sends it to a NetFlow data generation node (this process is carried out similarly to packets received from the Internet).
The malicious traffic collected (SQLI attacks) was performed using SQLMAP. SQLMAP is a penetration tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.
The attacks were executed on 16 nodes and launch SQLMAP with the parameters of the following table.
Parameters
Description
'--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments', --schema'
Enumerate users, password hashes, privileges, roles, databases, tables and columns
--level=5
Increase the probability of a false positive identification
--risk=3
Increase the probability of extracting data
--random-agent
Select the User-Agent randomly
--batch
Never ask for user input, use the default behavior
--answers="follow=Y"
Predefined answers to yes
Every node executed SQLIA on 200 victim nodes. The victim nodes had deployed a web form vulnerable to Union-type injection attacks, which was connected to the MYSQL or SQLServer database engines (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer).
The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.
However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.
To run the MySQL server we ran MariaDB version 10.4.12. Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Machine learning approaches promise to accelerate and improve success rates in medicinal chemistry programs by more effectively leveraging available data to guide a molecular design. A key step of an automated computational design algorithm is molecule generation, where the machine is required to design high-quality, drug-like molecules within the appropriate chemical space. Many algorithms have been proposed for molecular generation; however, a challenge is how to assess the validity of the resulting molecules. Here, we report three Turing-inspired tests designed to evaluate the performance of molecular generators. Profound differences were observed between the performance of molecule generators in these tests, highlighting the importance of selection of the appropriate design algorithms for specific circumstances. One molecule generator, based on match molecular pairs, performed excellently against all tests and thus provides a valuable component for machine-driven medicinal chemistry design workflows.
This software tool generates simulated radar signals and creates RF datasets. The datasets can be used to develop and test detection algorithms by utilizing machine learning/deep learning techniques for the 3.5 GHz Citizens Broadband Radio Service (CBRS) or similar bands. In these bands, the primary users of the band are federal incumbent radar systems. The software tool generates radar waveforms and randomizes the radar waveform parameters. The pulse modulation types for the radar signals and their parameters are selected based on NTIA testing procedures for ESC certification, available at http://www.its.bldrdoc.gov/publications/3184.aspx. Furthermore, the tool mixes the waveforms with interference and packages them into one RF dataset file. The tool utilizes a graphical user interface (GUI) to simplify the selection of parameters and the mixing process. A reference RF dataset was generated using this software. The RF dataset is published at https://doi.org/10.18434/M32116.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Insider Threat Test Dataset is a collection of synthetic insider threat test datasets that provide both background and malicious actor synthetic data.
The CERT Division, in partnership with ExactData, LLC, and under sponsorship from DARPA I2O, generated a collection of synthetic insider threat test datasets. These datasets provide both synthetic background data and data from synthetic malicious actors. For more background on this data, please see the paper, Bridging the Gap: A Pragmatic Approach to Generating Insider Threat Data. Datasets are organized according to the data generator release that created them. Most releases include multiple datasets (e.g., r3.1 and r3.2). Generally, later releases include a superset of the data generation functionality of earlier releases. Each dataset file contains a readme file that provides detailed notes about the features of that release. The answer key file answers.tar.bz2 contains the details of the malicious activity included in each dataset, including descriptions of the scenarios enacted and the identifiers of the synthetic users involved.
This dataset was created by Owen Tamuno Gilbert
Dataset Card for test-data-generator
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/franciscoflorencio/test-data-generator/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/franciscoflorencio/test-data-generator.