Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.
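To illustrate the interpolation step, here is a minimal Python sketch; the road points and the sampling step are illustrative, not taken from the dataset:

# Sketch: interpolate 2D road points with a cubic spline, as done for
# the TRAVEL virtual roads (illustrative values only).
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical road points (x, y) marking the center of the road.
road_points = np.array([(10, 10), (40, 30), (70, 20), (100, 60)], dtype=float)

# Parameterize the curve by cumulative chord length.
deltas = np.diff(road_points, axis=0)
dists = np.concatenate([[0.0], np.cumsum(np.hypot(deltas[:, 0], deltas[:, 1]))])
spline = CubicSpline(dists, road_points, axis=0)

# Sample interpolated points along the road at roughly 1 m spacing.
interpolated_points = spline(np.arange(0.0, dists[-1], 1.0))
print(interpolated_points[:5])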
Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of regression testing. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigm.
This dataset builds on top of our previous work in this area, including work on:
test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021);
test selection: SDC-Scissor and the related tool;
test prioritization: our automated test case prioritization work for SDCs.
Dataset Overview
The TRAVEL dataset is available under the data folder and is organized as a set of experiment folders. Each of these folders is generated by running the test generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on the generated tests (generation_stats.csv) and the found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).
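A hedged sketch of traversing these folders programmatically (the layout is taken from the description above; pandas is assumed for CSV handling):

# Sketch: iterate over the experiment folders and load their CSV files.
from pathlib import Path
import pandas as pd

data_root = Path("data")
for experiment in sorted(p for p in data_root.iterdir() if p.is_dir()):
    description = pd.read_csv(experiment / "experiment_description.csv")
    gen_stats = pd.read_csv(experiment / "generation_stats.csv")
    oob_stats = pd.read_csv(experiment / "oob_stats.csv")
    print(experiment.name, description.shape, gen_stats.shape, oob_stats.shape)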
The following sections describe what each of those files contains.
Experiment Description
The experiment_description.csv contains the settings used to generate the data, including:
Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.
The size of the map. The size, in meters, of the square map that defines the boundaries inside which the virtual roads develop.
The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated by testing the BeamNG.AI and the end-to-end Dave2 systems.
The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms.
The speed limit. The maximum speed at which the driving agent under test can travel.
Out of Bound (OOB) tolerance. The test oracle parameter that defines the tolerable portion of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0: in the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
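As an illustration of the oracle semantics, here is a minimal sketch; the function is hypothetical and not part of the dataset tooling:

# Hypothetical OOB oracle: fail once the fraction of the ego-car lying
# outside the lane exceeds the configured tolerance.
def test_fails(oob_percentage: float, oob_tolerance: float) -> bool:
    return oob_percentage > oob_tolerance

# With tolerance 0.0 any excursion fails; with tolerance 1.0 a test
# fails only when (almost) the entire car body leaves the lane.
assert test_fails(0.05, 0.0)
assert not test_fails(0.96, 1.0)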
Experiment Statistics
The generation_stats.csv contains statistics about the test generation, including:
Total number of generated tests. The number of tests generated during an experiment, broken down into valid and invalid tests. Valid tests contain virtual roads that do not self-intersect and whose turns are not too sharp.
Test outcome. The number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing still. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.
The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOBs that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.
Test Cases and Executions
Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as a driving simulation.
The data about the test case definition include:
The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points).
The test ID. The unique identifier of the test in the experiment.
Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or self-intersects).
The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.
{ "type": "object", "properties": { "id": { "type": "integer" }, "is_valid": { "type": "boolean" }, "validation_message": { "type": "string" }, "road_points": { §\label{line:road-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "interpolated_points": { §\label{line:interpolated-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "test_outcome": { "type": "string" }, §\label{line:test-outcome}§ "description": { "type": "string" }, "execution_data": { "type": "array", "items": { "$ref" : "schemas/simulationdata" } } }, "required": [ "id", "is_valid", "validation_message", "road_points", "interpolated_points" ] }
Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).
The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.
{ "$id": "schemas/simulationdata", "type": "object", "properties": { "timer" : { "type": "number" }, "pos" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel_kmh" : { "type": "number" }, "steering" : { "type": "number" }, "brake" : { "type": "number" }, "throttle" : { "type": "number" }, "is_oob" : { "type": "number" }, "oob_percentage" : { "type": "number" } §\label{line:oob-percentage}§ }, "required": [ "timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage" ] }
Dataset Content
The TRAVEL dataset is a lively initiative so the content of the dataset is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (SDC-Scissor work and tool) and test prioritization (automated test cases prioritization work for SDCs).
SBST CPS Tool Competition Data
The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., a conservative driver).
Name    | Map Size (m × m) | Max Speed (Km/h) | Budget (h)    | OOB Tolerance | Test Subject
DEFAULT | 200 × 200        | 120              | 5 (real time) | 0.95          | BeamNG.AI - 0.7
SBST    | 200 × 200        | 70               | 2 (real time) | 0.5           | BeamNG.AI - 0.7
Specifically, the TRAVEL dataset contains 8 repetitions for each of the above configurations for each test generator, totaling 64 experiments.
SDC Scissor
With SDC-Scissor we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the parameters used.
Name        | Map Size (m × m) | Max Speed (Km/h) | Budget (h)     | OOB Tolerance | Test Subject
SDC-SCISSOR | 200 × 200        | 120              | 16 (real time) | 0.5           | BeamNG.AI - 1.5
The dataset contains 9 experiments with the above configuration. For generating your own data with SDC-Scissor, follow the instructions in its repository.
Dataset Statistics
Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95); we also ran the test generators using a smaller generation budget (i.e., 2 hours) and speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.5). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours, using Frenetic as the test generator and defining a more realistic OOB tolerance (i.e., 0.50).
Generating new Data
Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.
Extensive instructions on how to install both tools are reported in the SBST CPS Tool Competition pipeline documentation.
Dataset Card for test-data-generator
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel, using the distilabel CLI:

distilabel pipeline run --config "https://huggingface.co/datasets/franciscoflorencio/test-data-generator/raw/main/pipeline.yaml"

or explore the configuration:

distilabel pipeline info --config…

See the full description on the dataset page: https://huggingface.co/datasets/franciscoflorencio/test-data-generator.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Developing software test code can be as expensive as, or more expensive than, developing production code. Commonly, developers use automated unit test generators to speed up software testing. The purpose of such tools is to shorten production time without decreasing code quality. Nonetheless, unit tests usually do not have a quality-check layer above the testing code, which makes it hard to guarantee the quality of the generated tests. An emerging strategy to verify test quality is to analyze the presence of test smells in software test code. Test smells are characteristics of test code that possibly indicate weaknesses in test design and implementation. The presence of test smells in unit test code can thus be used as an indicator of unit test quality. In this paper, we present an empirical study aimed at analyzing the quality of unit test code generated by automated test tools. We compare the tests generated by two tools (Randoop and EvoSuite) with the existing unit test suites of open-source software projects. We analyzed the unit test code of twenty-one open-source Java projects and detected the presence of nineteen types of test smells. The results indicate significant differences in unit test quality when comparing data from both automated unit test generators and existing unit test suites.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Testing web APIs automatically requires generating input data values such as addresses, coordinates, or country codes. Generating meaningful values for these types of parameters randomly is rarely feasible, which poses a major obstacle for current test case generation approaches. In this paper, we present ARTE, the first semantic-based approach for the Automated generation of Realistic TEst inputs for web APIs. Specifically, ARTE leverages the specification of the API under test to extract semantically related values for every parameter by applying knowledge extraction techniques. Our approach has been integrated into RESTest, a state-of-the-art tool for API testing, achieving an unprecedented level of automation that allows generating up to 100% more valid API calls than existing fuzzing techniques (30% on average). Evaluation results on a set of 26 real-world APIs show that ARTE can generate realistic inputs for 7 out of every 10 parameters, outperforming the results obtained by related approaches.
https://www.nist.gov/open/license
This is a program that takes in a description of a cryptographic algorithm implementation's capabilities, and generates test vectors to ensure the implementation conforms to the standard. After generating the test vectors, the program also validates the correctness of the responses from the user.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the result of applying the NIST Statistical Test Suite on accelerometer data processed for random number generator seeding. The NIST Statistical Test Suite can be downloaded from: http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html. The format of the output is explained in http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The code, strainenergy_v4_1.m, was used for generating and processing the dataset for load-displacement and stress-strain. Matlab version 6.1 was used for running the code. The specific parameters used to generate the current dataset are as follows:
• ip1: input file containing the load-displacement data
• diameter: fascicle diameter
• laststrainpt: an estimate of the strain at rupture
• orderpoly: an integer value from 2 to 7 which represents the order of the polynomial fitted to the data from O to q
• loadat1percent: y/n; determines the value of the load (set at 1% of the maximum load) at which the specimen became taut ('y' denotes yes; 'n' denotes no)
The logfile.txt contains the parameters used for deriving the values of the respective mechanical properties.
https://www.archivemarketresearch.com/privacy-policy
The global generator test load bank market is experiencing robust growth, driven by increasing demand for reliable power generation and stringent testing regulations across various industries. The market size in 2025 is estimated at $1.5 billion, exhibiting a Compound Annual Growth Rate (CAGR) of 7% from 2025 to 2033. This growth is fueled by several key factors. The expansion of data centers, the rising adoption of renewable energy sources requiring rigorous testing, and the growing need for efficient power generation in industrial sectors like shipping and power plants are major contributors. Furthermore, advancements in load bank technology, including the development of more compact, efficient, and digitally controlled units, are enhancing market appeal. The adoption of resistive-reactive load banks, offering greater flexibility and accuracy in testing, is also driving market expansion. Regional growth is expected to be diverse, with North America and Asia-Pacific leading the charge due to strong economic growth and substantial investments in infrastructure. However, certain restraints exist. High initial investment costs associated with advanced load bank systems might hinder adoption, particularly among smaller enterprises. Additionally, fluctuations in raw material prices and the complexity of integrating these systems into existing infrastructure pose challenges. Nevertheless, ongoing technological improvements and increasing awareness of the crucial role of generator testing in ensuring power reliability are projected to mitigate these obstacles. The market segmentation reveals significant opportunities in various applications, notably data center generator testing and the growing renewable energy sector. Key players are focusing on product innovation, strategic partnerships, and expansion into new geographical markets to strengthen their market position and capitalize on this growth trajectory. The market is poised for continued expansion, with significant potential for growth across diverse geographical regions and application segments.
Static torque, no load, constant speed, and sinusoidal oscillation test data for a 10 hp, 300 rpm magnetically-geared generator prototype using either an adjustable load bank for a fixed resistance or an output power converter.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains 350 features engineered from phasor measurement (PMU-type) signals from the IEEE New England 39-bus power system test case network, generated from 9360 systematic MATLAB®/Simulink electro-mechanical transient simulations. It was prepared to serve as a convenient and open database for experimenting with different types of machine learning techniques for transient stability assessment (TSA) of electrical power systems.
Different load and generation levels of the New England 39-bus benchmark power system were systematically covered, as well as all three major types of short-circuit events (three-phase, two-phase, and single-phase faults) in all parts of the network. The consumed power of the network was set to 80%, 90%, 100%, 110% and 120% of the basic system load levels. The short-circuits were located on the busbar or on the transmission line (TL). When they were located on a TL, it was assumed that they could occur at 20%, 40%, 60%, and 80% of the line length. Features were obtained directly from the time-domain signals at the pickup time (pre-fault value) and at the trip time (post-fault value) of the associated distance protection relays.
This is a stochastic dataset of 3120 cases, created from the population of 9360 systematic simulations, which features a statistical distribution of fault types as follows: single-phase (70%), two-phase (20%), and three-phase faults (10%). It also features a class imbalance, with less than 20% of cases belonging to the unstable class. The dataset is a compressed CSV file.
List of feature names in the dataset:
WmGx - rotor speed for each generator Gx, from G1 to G10,
DThetaGx - rotor angle deviation for each generator Gx, from G1 to G10,
ThetaGx - rotor mechanical angle for each generator Gx, from G1 to G10,
VtGx - stator voltage for each generator Gx, from G1 to G10,
IdGx - stator d-component current for each generator Gx, from G1 to G10,
IqGx - stator q-component current for each generator Gx, from G1 to G10,
LAfvGx - pre-fault power load angle for each generator Gx, from G1 to G10,
LAlvGx - post-fault power load angle for each generator Gx, from G1 to G10,
PfvGx - pre-fault value of the generator active power for each generator Gx, from G1 to G10,
PlvGx - post-fault value of the generator active power for each generator Gx, from G1 to G10,
QfvGx - pre-fault value of the generator reactive power for each generator Gx, from G1 to G10,
QlvGx - post-fault value of the generator reactive power for each generator Gx, from G1 to G10,
VAfvBx - pre-fault bus voltage magnitude in phase A for each bus Bx, from B1 to B39,
VBfvBx - pre-fault bus voltage magnitude in phase B for each bus Bx, from B1 to B39,
VCfvBx - pre-fault bus voltage magnitude in phase C for each bus Bx, from B1 to B39,
VAlvBx - post-fault bus voltage magnitude in phase A for each bus Bx, from B1 to B39,
VBlvBx - post-fault bus voltage magnitude in phase B for each bus Bx, from B1 to B39,
VClvBx - post-fault bus voltage magnitude in phase C for each bus Bx, from B1 to B39,
Stability - binary indicator (0/1) that determines if the power system was stable or unstable (0 - stable, 1 - unstable); this is the label variable.
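As a hedged example of consuming the dataset (the file name is illustrative; the column names follow the list above):

# Sketch: split the TSA dataset into features and the Stability label.
import pandas as pd

df = pd.read_csv("tsa_dataset.csv.gz")  # illustrative file name
X = df.drop(columns=["Stability"])
y = df["Stability"]  # 0 = stable, 1 = unstable
print(X.shape, y.mean())  # class imbalance: less than 20% unstable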
License: Creative Commons CC-BY.
Disclaimer: This dataset is provided "as is", without any warranties of any kind.
https://www.datainsightsmarket.com/privacy-policy
The high-speed digital signal generator market is experiencing robust growth, driven by the increasing demand for high-bandwidth communication systems in sectors like 5G, data centers, and automotive. The market's expansion is fueled by the need for accurate and reliable signal generation for testing high-speed digital designs and components. Advancements in technology, such as the development of higher frequency generators and improved signal fidelity, are further propelling market growth. Furthermore, the rising adoption of advanced testing techniques and the growing complexity of electronic devices necessitates the use of sophisticated high-speed digital signal generators, thereby increasing market demand. We estimate the market size in 2025 to be approximately $1.5 billion, based on observed growth trends in related sectors and expert analysis. A compound annual growth rate (CAGR) of around 8% is projected from 2025 to 2033, indicating significant market expansion in the coming years. Major players such as Keysight Technologies, Rohde & Schwarz, and Tektronix dominate the market, leveraging their strong brand reputation and technological expertise. However, the market is also witnessing the emergence of smaller companies specializing in niche applications and offering innovative solutions. The competitive landscape is marked by ongoing product development, strategic partnerships, and mergers and acquisitions. While the high cost of these advanced generators can be a restraining factor for some users, the long-term benefits in terms of improved testing accuracy and efficiency outweigh this consideration, ultimately driving market adoption. Regional growth is expected to vary, with North America and Asia-Pacific likely leading due to the concentration of technological advancements and strong demand from various industries in these regions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FARLEAD2 receives a test scenario from the developer.
No Publication Abstract is Available
Replication package for the paper "Learning by Viewing: Generating Test Inputs for Games by Integrating Human Gameplay Traces in Neuroevolution". Although automated test generation is common in many programming domains, games still challenge test generators due to their heavy randomisation and hard-to-reach program states. Neuroevolution combined with search-based software testing principles has been shown to be a promising approach for testing games, but the co-evolutionary search for optimal network topologies and weights involves unreasonably long search durations. Humans, on the other hand, tend to be quick in picking up basic gameplay. In this paper, we therefore aim to improve the evolutionary search for game input generators by integrating knowledge about human gameplay behaviour. To this end, we propose a novel way of systematically recording human gameplay traces, and integrating these traces into the evolutionary search for networks using traditional gradient descent as a mutation operator. Experiments conducted on eight diverse Scratch games demonstrate that the proposed approach reduces the required search time from five hours down to only 30 minutes on average.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Machine learning approaches promise to accelerate and improve success rates in medicinal chemistry programs by more effectively leveraging available data to guide molecular design. A key step of an automated computational design algorithm is molecule generation, where the machine is required to design high-quality, drug-like molecules within the appropriate chemical space. Many algorithms have been proposed for molecular generation; however, a challenge is how to assess the validity of the resulting molecules. Here, we report three Turing-inspired tests designed to evaluate the performance of molecular generators. Profound differences were observed between the performance of molecule generators in these tests, highlighting the importance of selecting the appropriate design algorithms for specific circumstances. One molecule generator, based on matched molecular pairs, performed excellently against all tests and thus provides a valuable component for machine-driven medicinal chemistry design workflows.
https://www.archivemarketresearch.com/privacy-policy
The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others. The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving the market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to the rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.
Generating Realistic Test Datasets for Duplicate Detection at Scale Using Historical Voter Data
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generating tests for games is challenging due to the high degree of randomisation inherent to games and hard-to-reach program states that require sophisticated gameplay. The test generator NEATEST tackles these challenges by combining search-based software testing principles with neuroevolution to optimise neural networks that serve as test cases. However, since NEATEST is designed as a single-objective algorithm, it may require a long time to cover fairly simple program states or may even get stuck trying to reach unreachable program states. In order to resolve these shortcomings of NEATEST, this work aims to transform the algorithm into a many-objective search algorithm that targets several program states simultaneously. To this end, we combine the neuroevolution algorithm NEATEST with the two established search-based software testing algorithms, MIO and MOSA. Moreover, we adapt the existing many-objective neuroevolution algorithm NEWS/D to serve as a test generator. Our experiments on a dataset of 20 SCRATCH programs show that extending NEATEST to target several objectives simultaneously increases the average branch coverage from 75.88% to 81.33% while reducing the required search time by 93.28%.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset contains temperature and humidity sensor readings of a particular day, which are synthetically generated using a data generator and are stored as JSON files to validate and test (performance/load testing) the data pipeline components.
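A minimal sketch of how such readings might be generated; all field names and value ranges below are assumptions, not taken from the dataset:

# Sketch: generate one day of synthetic temperature/humidity readings as JSON.
import json
import random
from datetime import datetime, timedelta

start = datetime(2021, 1, 1)
readings = [
    {
        "timestamp": (start + timedelta(minutes=5 * i)).isoformat(),
        "temperature": round(random.uniform(15.0, 30.0), 2),  # assumed range, deg C
        "humidity": round(random.uniform(30.0, 80.0), 2),     # assumed range, % RH
    }
    for i in range(288)  # one reading every 5 minutes covers 24 hours
]

with open("sensor_readings.json", "w") as f:
    json.dump(readings, f, indent=2)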