License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is a JSON-format file generated by a random number generator in Python. The range is 0 to 1000, and the numbers are floats. This data will be used by a Python script for further transformation.
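A minimal sketch of such a generator might look as follows (the count of 1,000 values and the output file name are assumptions, not stated in the description above):

import json
import random

# Draw uniformly distributed floats in [0, 1000] and save them as a JSON array.
values = [random.uniform(0, 1000) for _ in range(1000)]
with open("random_numbers.json", "w") as f:
    json.dump(values, f)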
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Blockchain data query: Tour de OP - JSON Generator
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
1,000 numbers generated from a formula.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
1,000 random numbers.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset contains temperature and humidity sensor readings for a single day, synthetically generated using a data generator and stored as JSON files to validate and test (performance/load testing) the data pipeline components.
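For illustration only (the actual RADON data generator is not included here; the timestamps, value ranges, and sampling interval below are assumptions), synthetic readings of this kind could be produced as follows:

import json
import random
from datetime import datetime, timedelta

start = datetime(2021, 1, 1)
readings = [
    {
        "timestamp": (start + timedelta(minutes=i)).isoformat(),
        "temperature": round(random.uniform(15.0, 30.0), 2),
        "humidity": round(random.uniform(30.0, 70.0), 2),
    }
    for i in range(24 * 60)  # one reading per minute for a single day
]
with open("sensor_readings.json", "w") as f:
    json.dump(readings, f)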
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
JSON output of the Random Number Generator
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains raw data (log files) and parsed data (JSON files) of all planners used in the paper run on planning domains for which there is no generator that could directly be used for the Autoscale training process. This dataset was used to select a subset of tasks as described in the paper, for all Autoscale versions up to this point (Autoscale 21.08 and 21.11). As such, it complements the original Zenodo entry https://zenodo.org/record/4586397.
domains-without-generator.zip contains the raw experimental data, distributed over a subdirectory for each experiment. Each of these contains a subdirectory tree structure "runs-*", where each planner run has its own directory. For each run, there are symbolic links to the input PDDL files domain.pddl and problem.pddl (these can be resolved by putting the benchmarks directory in the right place), the run log file "run.log" (stdout), possibly also a run error file "run.err" (stderr), the run script "run" used to start the experiment, and a "properties" file that contains data parsed from the log file(s).
domains-without-generator-eval.zip contains the parsed data, again distributed over a subdirectory for each experiment. Each contains a "properties" file, which is a JSON file with combined data of all runs of the corresponding experiment. In essence, the properties file is the union over all properties files generated for each individual planner run.
The "Curtailment: Generator Type" data table details the total measured curtailment aggregated by technology type. At this time, only curtailment events measured and recorded by our Active Network Management (ANM) system are captured. The table gives the following information:
Aggregated related capacity of ANM sites
Past three months ASEFA
Past three months ASC
For additional information on column definitions, please see the Dataset schema link below.
Disclaimer: Whilst all reasonable care has been taken in the preparation of this data, SP Energy Networks does not accept any responsibility or liability for the accuracy or completeness of this data, and is not liable for any loss that may be attributed to the use of this data. For the avoidance of doubt, this data should not be used for safety-critical purposes without the use of appropriate safety checks and services, e.g. LineSearchBeforeUDig. Please raise any potential issues with the data via the feedback form available at the Feedback tab above (you must be logged in to see this). Some values are left blank in the dataset because, from February 2021, a number of customers connected under the Dunbar ANM system moved to an unconstrained connection, and therefore no data is published beyond that point.
Data Triage: As part of our commitment to enhancing the transparency and accessibility of the data we share, we publish the results of our Data Triage process. Our Data Triage documentation includes our Risk Assessments, detailing any controls we have implemented to prevent exposure of sensitive information. Data Triage documentation for the Curtailment dataset, along with our full suite of Data Triage documentation, is available via SP Energy Networks Data & Information.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This data package, described in the BPM Demos & Resources publication entitled "BPM Hub: An Open Collection of UI Logs", consists of synthetic UI logs along with corresponding screenshots. The UI logs closely resemble real-world use cases within the administrative domain. They exhibit varying levels of complexity, measured by the number of activities, process variants, and visual features that influence the outcome of decision points. For their generation, the BPM Log Generator tool has been used, which requires the following initial generation configuration:
Initial Generation Configuration
Seed log: Includes a single instance for each process variant and their associated screenshots.
Variability configuration:
Case-level: Refers to variations in the content that can be introduced or modified by the user, such as variations in the text inputs, selectable options, checkboxes, etc.
Scenario-level: Refers to varying the GUI (Graphical User Interface) components related to the look and feel of the different applications appearing in the process screenshots.
Data Package Contents
The data package comprises three distinct processes, P1, P2, and P3, for which the initial configuration is provided, i.e., a tuple of . They are characterized by the following:
P1. Client Creation
Activities: 5
Variants: 2
Decision point: Revolves around whether the received email contains an attachment.
P2. Client Deletion. User's presence in the system
Activities: 7
Variants: 2
Decision point: Based on the result of the user's search in the Customer Management System (CRM), represented by a checkbox.
P3. Client Deletion. Validation of customer payments
Activities: 7
Variants: 4
Decision: Involves two conditions:
The presence of an attachment justifying the payment of the invoices in the email.
The existence of pending invoices in the user CRM profile.
These problems depict processes with a single decision point, without cycles, and executed sequentially to ensure a non-interleaved execution pattern. Particularly, P3 shows higher complexity as its decision point is determined by two visual characteristics.
Generation of UI Logs
For each problem, case-level variations have been applied to generate logs with different sizes in the range of {10, 25, 50, 100} events. In cases where the log exceeds the desired size, the last instance is removed to maintain completeness. Each log size has its associated balanced and unbalanced log. Balanced logs have an approximately equal distribution of instances across variants, while unbalanced logs have a frequency difference of more than 20% between the most frequent and least frequent variants.
Scenarios
To ensure the reliability of the obtained results, 30 scenarios are generated for each tuple . These scenarios exhibit slight variations at the scenario level, particularly in the look and feel and user interface of the applications depicted in the screenshots. Each scenario consists of UI logs corresponding to specific problems, categorized by log size (10, 25, 50, 100) and balance (Balanced, Unbalanced). UI logs and their corresponding screenshots are organized in folders named as follows: sc{scenarioId}_size_{LogSize}_{Balanced?}.
Additional Artefacts
In addition, each problem includes two more artefacts:
initial_generation_configuration folder: Holds the data needed for problem data generation using the [5] tool.
decision.json file: Specifies the condition driving the decision made at the decision point.
decision.json
The decision.json acts as a testing oracle, serving as a label for validating mined data. It contains two main sections: "UICompos" and "decision". The "UICompos" section includes a key for each activity related to the decision, storing key-value pairs that represent the UI components involved, along with their bounding box coordinates. The "decision" section defines the condition for a case to match a specific variant based on the mentioned UI components.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Introduction
This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.
Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of regression testing. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigm.
This dataset builds on top of our previous work in this area, including work on
test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021),
test selection: SDC-Scissor and related tool
test prioritization: automated test cases prioritization work for SDCs.
Dataset Overview
The TRAVEL dataset is available under the data folder and is organized as a set of experiments folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).
The following sections describe what each of those files contains.
Experiment Description
The experiment_description.csv contains the settings used to generate the data, including:
Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.
The size of the map. The size of the square map, in meters, defines the boundaries inside which the virtual roads develop.
The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated testing the BeamNG.AI and the end-to-end Dave2 systems.
The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms, for generating tests.
The speed limit. The maximum speed at which the driving agent under test can travel.
Out of Bound (OOB) tolerance. The test cases' oracle that defines the tolerable amount of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
Experiment Statistics
The generation_stats.csv contains statistics about the test generation, including:
Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.
Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.
The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.
Test Cases and Executions
Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as a driving simulation.
The data about the test case definition include:
The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points)
The test ID. The unique identifier of the test in the experiment.
Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or the road self intersects)
The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.
{ "type": "object", "properties": { "id": { "type": "integer" }, "is_valid": { "type": "boolean" }, "validation_message": { "type": "string" }, "road_points": { §\label{line:road-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "interpolated_points": { §\label{line:interpolated-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "test_outcome": { "type": "string" }, §\label{line:test-outcome}§ "description": { "type": "string" }, "execution_data": { "type": "array", "items": { "$ref" : "schemas/simulationdata" } } }, "required": [ "id", "is_valid", "validation_message", "road_points", "interpolated_points" ] }
Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).
The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.
{ "$id": "schemas/simulationdata", "type": "object", "properties": { "timer" : { "type": "number" }, "pos" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel_kmh" : { "type": "number" }, "steering" : { "type": "number" }, "brake" : { "type": "number" }, "throttle" : { "type": "number" }, "is_oob" : { "type": "number" }, "oob_percentage" : { "type": "number" } §\label{line:oob-percentage}§ }, "required": [ "timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage" ] }
Dataset Content
The TRAVEL dataset is a lively initiative so the content of the dataset is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (SDC-Scissor work and tool) and test prioritization (automated test cases prioritization work for SDCs).
SBST CPS Tool Competition Data
The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., conservative driver).
Name | Map Size (m x m) | Max Speed (Km/h) | Budget (h) | OOB Tolerance (%) | Test Subject
DEFAULT | 200 × 200 | 120 | 5 (real time) | 0.95 | BeamNG.AI - 0.7
SBST | 200 × 200 | 70 | 2 (real time) | 0.5 | BeamNG.AI - 0.7
Specifically, the TRAVEL dataset contains 8 repetitions for each of the above configurations for each test generator totaling 64 experiments.
SDC Scissor
With SDC-Scissor we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the used parameters.
Name | Map Size (m x m) | Max Speed (Km/h) | Budget (h) | OOB Tolerance (%) | Test Subject
SDC-SCISSOR | 200 × 200 | 120 | 16 (real time) | 0.5 | BeamNG.AI - 1.5
The dataset contains 9 experiments with the above configuration. For generating your own data with SDC-Scissor follow the instructions in its repository.
Dataset Statistics
Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95); we also ran the test generators with a smaller generation budget (i.e., 2 hours) and speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.85). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours, using Frenetic as the test generator and defining a more realistic OOB tolerance (i.e., 0.50).
Generating new Data
Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.
Extensive instructions on how to install both tools are provided in the SBST CPS Tool Competition pipeline documentation.
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Data collection is perhaps the most crucial part of any machine learning model: without it being done properly, not enough information is present for the model to learn from the patterns leading to one output or another. Data collection is however a very complex endeavor, time-consuming due to the volume of data that needs to be acquired and annotated. Annotation is an especially problematic step, due to its difficulty, length, and vulnerability to human error and inaccuracies when annotating complex data.
With high processing power becoming ever more accessible, synthetic dataset generation is becoming a viable option when looking to generate large volumes of accurately annotated data. With the help of photorealistic renderers, it is for example possible now to generate immense amounts of data, annotated with pixel-perfect precision and whose content is virtually indistinguishable from real-world pictures.
As an exercise in synthetic dataset generation, the data offered here was generated using the Python API of Blender, with the images rendered through the Cycles renderer. It consists of plausible pictures of a chess board and pieces. The goal is, from those pictures and their annotations, to build a model capable of recognizing the pieces, as well as their positions on the board.
The dataset contains a large number of synthetic, randomly generated images representing pictures of a chess board and its pieces, taken at an angle overlooking the board. Each image is associated with a .json file containing its annotations. The naming convention is that each render is associated with a number X, and the image and annotations for that render are named X.jpg and X.json, respectively.
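A minimal sketch for pairing images with their annotations, assuming only the naming convention described above (the folder name is hypothetical and the annotation structure is not specified here):

import json
from pathlib import Path

data_dir = Path("renders")  # hypothetical folder containing the X.jpg / X.json pairs
for annotation_path in sorted(data_dir.glob("*.json")):
    image_path = annotation_path.with_suffix(".jpg")
    with open(annotation_path) as f:
        annotation = json.load(f)
    print(image_path.name, "->", len(annotation), "annotation entries")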
The data has been generated using the Python scripts and .blend file present in this repository. The chess board and pieces models that have been used for those renders are not provided with the code.
Data characteristics:
No distinction has been hard-built between training, validation, and testing data, and is left completely up to the users. A proposed pipeline for the extraction, recognition, and placement of chess pieces is proposed in a notebook added with this dataset.
I would like to express my gratitude for the efforts of the Blender Foundation and all its participants, for their incredible open-source tool which once again has allowed me to conduct interesting projects with great ease.
Two interesting papers on the generation and use of synthetic data, which have inspired me to conduct this project:
Erroll Wood, Tadas Baltrušaitis, Charlie Hewitt (2021). Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone. https://arxiv.org/abs/2109.15102
Salehe Erfanian Ebadi, You-Cyuan Jhang, Alex Zook (2021). PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision. https://arxiv.org/abs/2112.09290
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Provenance documents in PROV-JSON format of the luca App GitLab repositories.
Retrieved on June 22nd, 2021 using GitLab2PROV Version 0.5.
Commands:
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/web > lucaapp-web.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/ios > lucaapp-ios.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/web-crypto > lucaapp-web-crypto.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/android > lucaapp-android.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/security-overview > lucaapp-security-overview.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/badge-generator > lucaapp-badge-generator.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/cwa-event > lucaapp-cwa-event.json
gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/fdroid-repository > lucaapp-fdroid-repository.json
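As a rough way to inspect the output, a Python sketch follows (not part of GitLab2PROV; PROV-JSON groups records under top-level keys such as "entity", "activity", and "agent", and the keys actually present depend on the repository):

import json

with open("lucaapp-web.json") as f:
    doc = json.load(f)

# Count records per PROV record type, if present.
for record_type in ("entity", "activity", "agent", "wasGeneratedBy", "wasAttributedTo"):
    print(record_type, len(doc.get(record_type, {})))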
This is a dataset consisting of Drake lyrics and other information gathered from Genius.com. I'm currently working on a side project while enrolled in the Data Science program at Flatiron. I've lovingly entitled this project "Ye-Spirations", which will essentially be a motivational poster generator that renders high-quality images with random lines of text generated from hip hop lyrics. I have a simple prototype built out with Kanye lyrics, but I would like to continue working on it and add additional features before sharing a final version. I intend on expanding the artist options for lyrics, which brings me here. This dataset is a byproduct of that expansion (which I am excited to call "Inspo-Papi"), and I wanted to publish this for others to use for their own projects, or even for my future self.
This dataset contains 3 files:
.txt - lyrics only
.json - lyrics, song title, album title, url, view count (at this time)
.csv - lyrics, song title, album title, url, view count (at this time)
Data gathered from the GOAT of lyrics and annotations - Genius.com
I intend on expanding to multiple artists so if y'all have any requests feel free to shout em out!
License: MIT License, https://opensource.org/licenses/MIT
This dataset and software tools are for reproducing the research results related to CVE-2020-12399, resulting from the manuscript "Déjà vu: Side-channel analysis of Mozilla's NSS", to appear at ACM CCS 2020.
The data is from a remote timing attack against the NSS v3.51 implementation of DSA signing.
The client machine was a 3.1 GHz 64-bit Intel i5-2400 CPU (Sandy Bridge).
The server machine was a Raspberry Pi 3 Model B plus board containing a 1.4 GHz 64-bit quad-core Cortex-A53 processor.
The client and server were connected by a Cisco 9300 series enterprise switch over Gbit Ethernet.
The data contains 2^18 (262,144) samples.
The data was used to produce Figure 1 in the paper and contains all the remote timing attack data from Section 4.
Data description
The file remote_timings.json contains a single JSON array. Each entry is a dictionary representation of one digital signature. A description of the dictionary fields follows.
p: prime (DSA parameter).
q: generator order (DSA parameter).
g: generator (DSA parameter).
x: the DSA private key.
y: the corresponding public key.
r: first component of the DSA signature.
s: second component of the DSA signature.
k: the ground truth nonce generated during DSA signing.
k_len: the ground truth number of bits in said nonce.
msg: message digitally signed.
h: SHA-256 hash of said message. (Truncated to the same bitlen as q.)
id: ignored.
latency: the measured wall clock time (CPU clock cycles) to produce the digital signature.
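As a quick consistency check of these fields, a minimal Python sketch follows (not part of the released tooling; it assumes Python 3.8+ for modular inverses and loads the whole array into memory, whereas the provided scripts stream the file with ijson):

import json

with open("remote_timings.json") as f:
    entries = json.load(f)

e = entries[2]
p, q, g = (int(e[name], 16) for name in ("p", "q", "g"))
x, r, s = (int(e[name], 16) for name in ("x", "r", "s"))
k, h = int(e["k"], 16), int(e["h"], 16)

# DSA signing equations: r = (g^k mod p) mod q and s = k^-1 (H(m) + x*r) mod q,
# where h is already truncated to the bit length of q (see field description above).
print("r consistent:", r == pow(g, k, p) % q)
print("s consistent:", s == (pow(k, -1, q) * (h + x * r)) % q)
print("k_len:", e["k_len"], "latency:", e["latency"])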
Prerequisites
sudo apt install openssl python3-ijson xxd jq
Data setup
Extract the JSON:
tar xf remote_timings_rpi.tar.gz
Key setup
Generate the public key (public.pem here) from the provided private key (private.pem here):
$ openssl pkey -in private.pem -pubout -out public.pem
Examine the keys if you want.
$ openssl pkey -in private.pem -text -noout
$ openssl pkey -in public.pem -text -noout -pubin
Example: Verify key material
$ openssl pkey -in private.pem -text -noout Private-Key: (2048 bit) priv: 1f:87:68:eb:57:e1:f4:f1:29:a6:c8:ca:03:c8:db: 49:1d:8e:2b:81:bd:72:92:64:0c:1c:d6:6d pub: 00:9d:fa:bc:47:00:cb:11:fa:51:45:c1:bd:b1:88: 2d:dd:a2:79:5b:c3:43:0a:af:bb:83:e2:d5:84:d1: 07:01:ab:f9:ae:76:2d:dd:f2:a5:75:f5:3e:94:4d: 3b:c6:f6:ce:17:c6:60:09:5b:49:3d:cb:a0:db:ec: 29:91:85:8b:c3:f5:6c:6a:3c:01:87:12:85:ae:fc: 9e:bf:67:81:1b:1d:b1:9d:12:bd:79:8c:54:08:48: 11:13:6d:ab:b0:16:ef:11:4a:27:a7:0a:80:b3:db: 72:c1:cc:1e:e8:4a:39:b7:00:ca:97:b7:3a:6e:e9: 25:22:2e:5c:57:ee:62:be:23:d0:5e:53:a3:9f:05: d4:7d:7f:b5:b6:cb:4b:27:90:14:79:72:a5:43:97: c6:6a:7d:f7:32:b3:67:58:90:fc:c3:65:34:57:89: 1b:43:28:68:43:24:12:5e:f1:43:76:3c:e9:bc:9c: 5d:7d:ae:d6:3a:31:32:ca:df:a4:07:88:a2:55:6e: a4:8c:da:13:c8:30:b7:2a:1c:23:0f:32:da:9e:7f: e1:f7:3d:2d:1c:58:f5:1d:f2:7d:fb:67:45:8d:dd: 84:eb:83:c4:b0:00:a6:c2:09:b0:48:48:f9:4e:a8: d7:ab:e1:c6:e8:bf:5c:fa:e3:f2:cd:c6:f1:e7:f2: 2c:90 P: 00:e5:4e:f4:32:f8:4a:ec:28:3c:dd:32:a8:05:e3: 5a:fa:a5:81:47:98:d9:a7:94:ba:34:b0:f9:7b:20: c5:fb:52:12:3e:82:d7:6e:6f:f5:50:be:5e:9f:df: 82:9b:4e:0c:9d:a2:9f:3f:0a:f3:72:c2:55:7c:46: 6e:fe:48:00:88:b6:4e:4f:9b:19:8c:98:3b:71:42: 56:d2:b4:1c:47:69:6e:fc:f0:e6:26:04:0e:e2:63: ed:06:0f:fb:a8:a9:94:73:e1:41:e0:6b:5a:b4:d9: 86:cd:7b:46:d3:39:ba:18:13:da:f2:3a:7b:dc:41: 21:83:e8:0d:25:13:31:90:5d:bd:82:41:9b:ea:6b: 8a:ba:8a:48:b1:1d:d2:3d:5e:c4:1b:29:5e:7f:b6: 56:1b:e6:91:65:ec:84:82:c2:f6:a1:b0:14:1b:0b: 08:d8:2b:2a:06:17:d7:2a:9b:c3:aa:fb:28:26:14: 3f:5d:0a:48:1a:48:45:c0:fd:ea:ec:90:6c:ec:93: c8:af:a3:31:4b:3a:d8:cd:20:ae:8f:14:58:26:49: 18:1f:7a:99:c9:da:c3:f0:76:b8:52:8d:eb:b2:e2: 98:6b:a5:47:15:c3:ff:c8:e7:6c:d3:db:c7:fb:4c: 36:3e:15:eb:45:e1:4a:5d:01:ed:3b:87:f7:69:c1: 31:59 Q: 00:ca:6d:df:fc:7b:96:2e:35:30:27:4f:1f:cf:57: 2f:e9:4c:40:97:53:a1:fa:d0:89:56:8d:2c:25 G: 43:26:04:66:b3:80:c3:3f:8d:f5:5a:29:79:58:7a: 0b:8c:72:b9:cb:23:61:5d:c1:45:c5:38:7f:33:4e: 93:63:75:8a:b0:44:61:8f:59:df:fd:2f:3f:1f:22: 73:66:ba:53:65:53:2a:57:5b:d9:40:34:be:4c:78: 22:4a:bf:94:5d:23:15:65:66:e1:1f:6b:93:12:00: f0:ac:f5:64:0d:6d:6c:a3:eb:26:83:6d:68:95:e0: 2c:bf:75:62:fa:5f:95:0f:b0:40:68:ce:66:3b:58: ed:c1:63:e3:d8:35:5c:cc:db:b8:12:e6:62:e4:63: b6:29:e0:86:75:79:bc:95:27:74:d1:fd:94:b9:7f: 6e:57:b4:e5:39:a2:15:41:94:3f:47:90:43:a5:da: dd:08:a4:92:c5:bf:ef:34:4e:2e:7e:82:5c:07:0e: dc:5d:6b:79:10:04:53:cc:b2:8e:bd:65:61:80:49: ad:c7:dd:5f:5a:9b:74:ae:bc:e0:49:f1:ad:4c:1e: 8f:4e:9d:39:e9:fe:57:4d:39:b7:ba:69:03:e3:7e: 4d:0d:9b:65:c3:55:77:ff:2c:86:27:21:c7:3e:60: a3:23:a5:e8:7e:0d:29:15:1c:5e:04:91:91:25:03: f3:97:77:6c:11:24:34:58:c9:ec:b7:ca:ce:74:cd: a7
This shows the keys indeed match (JSON x,y, above priv,pub):
$ grep --max-count=1 '"x"' remote_timings.json "x": "0x1F8768EB57E1F4F129A6C8CA03C8DB491D8E2B81BD7292640C1CD66D", $ grep --max-count=1 '"y"' remote_timings.json "y": "0x9DFABC4700CB11FA5145C1BDB1882DDDA2795BC3430AAFBB83E2D584D10701ABF9AE762DDDF2A575F53E944D3BC6F6CE17C660095B493DCBA0DBEC2991858BC3F56C6A3C01871285AEFC9EBF67811B1DB19D12BD798C54084811136DABB016EF114A27A70A80B3DB72C1CC1EE84A39B700CA97B73A6EE925222E5C57EE62BE23D05E53A39F05D47D7FB5B6CB4B2790147972A54397C66A7DF732B3675890FCC3653457891B4328684324125EF143763CE9BC9C5D7DAED63A3132CADFA40788A2556EA48CDA13C830B72A1C230F32DA9E7FE1F73D2D1C58F51DF27DFB67458DDD84EB83C4B000A6C209B04848F94EA8D7ABE1C6E8BF5CFAE3F2CDC6F1E7F22C90",
This shows the DSA parameters match (JSON p,q,g, above P,Q,G):
$ grep --max-count=1 '"p"' remote_timings.json "p": "0xE54EF432F84AEC283CDD32A805E35AFAA5814798D9A794BA34B0F97B20C5FB52123E82D76E6FF550BE5E9FDF829B4E0C9DA29F3F0AF372C2557C466EFE480088B64E4F9B198C983B714256D2B41C47696EFCF0E626040EE263ED060FFBA8A99473E141E06B5AB4D986CD7B46D339BA1813DAF23A7BDC412183E80D251331905DBD82419BEA6B8ABA8A48B11DD23D5EC41B295E7FB6561BE69165EC8482C2F6A1B0141B0B08D82B2A0617D72A9BC3AAFB2826143F5D0A481A4845C0FDEAEC906CEC93C8AFA3314B3AD8CD20AE8F14582649181F7A99C9DAC3F076B8528DEBB2E2986BA54715C3FFC8E76CD3DBC7FB4C363E15EB45E14A5D01ED3B87F769C13159", $ grep --max-count=1 '"q"' remote_timings.json "q": "0xCA6DDFFC7B962E3530274F1FCF572FE94C409753A1FAD089568D2C25", $ grep --max-count=1 '"g"' remote_timings.json "g": "0x43260466B380C33F8DF55A2979587A0B8C72B9CB23615DC145C5387F334E9363758AB044618F59DFFD2F3F1F227366BA5365532A575BD94034BE4C78224ABF945D23156566E11F6B931200F0ACF5640D6D6CA3EB26836D6895E02CBF7562FA5F950FB04068CE663B58EDC163E3D8355CCCDBB812E662E463B629E0867579BC952774D1FD94B97F6E57B4E539A21541943F479043A5DADD08A492C5BFEF344E2E7E825C070EDC5D6B79100453CCB28EBD65618049ADC7DD5F5A9B74AEBCE049F1AD4C1E8F4E9D39E9FE574D39B7BA6903E37E4D0D9B65C35577FF2C862721C73E60A323A5E87E0D29151C5E0491912503F397776C11243458C9ECB7CACE74CDA7",
Example: Extract a single entry
Here we use the python script pickone.py to extract the entry at index 2 (starting from 0).
$ python3 pickone.py remote_timings.json 2 | jq . > 2.json $ cat 2.json { "latency": "399901598", "y": "0x9DFABC4700CB11FA5145C1BDB1882DDDA2795BC3430AAFBB83E2D584D10701ABF9AE762DDDF2A575F53E944D3BC6F6CE17C660095B493DCBA0DBEC2991858BC3F56C6A3C01871285AEFC9EBF67811B1DB19D12BD798C54084811136DABB016EF114A27A70A80B3DB72C1CC1EE84A39B700CA97B73A6EE925222E5C57EE62BE23D05E53A39F05D47D7FB5B6CB4B2790147972A54397C66A7DF732B3675890FCC3653457891B4328684324125EF143763CE9BC9C5D7DAED63A3132CADFA40788A2556EA48CDA13C830B72A1C230F32DA9E7FE1F73D2D1C58F51DF27DFB67458DDD84EB83C4B000A6C209B04848F94EA8D7ABE1C6E8BF5CFAE3F2CDC6F1E7F22C90", "g": "0x43260466B380C33F8DF55A2979587A0B8C72B9CB23615DC145C5387F334E9363758AB044618F59DFFD2F3F1F227366BA5365532A575BD94034BE4C78224ABF945D23156566E11F6B931200F0ACF5640D6D6CA3EB26836D6895E02CBF7562FA5F950FB04068CE663B58EDC163E3D8355CCCDBB812E662E463B629E0867579BC952774D1FD94B97F6E57B4E539A21541943F479043A5DADD08A492C5BFEF344E2E7E825C070EDC5D6B79100453CCB28EBD65618049ADC7DD5F5A9B74AEBCE049F1AD4C1E8F4E9D39E9FE574D39B7BA6903E37E4D0D9B65C35577FF2C862721C73E60A323A5E87E0D29151C5E0491912503F397776C11243458C9ECB7CACE74CDA7", "h": "0xC7DEAC64C95157992CB0D77CF944CB107C756F3E30D1C49C0C48A6EA", "k": "0x742A7562E2A192996440AE2A4FDF5D37E1A532E1E6A50BCA3964BBDA", "q": "0xCA6DDFFC7B962E3530274F1FCF572FE94C409753A1FAD089568D2C25", "p": "0xE54EF432F84AEC283CDD32A805E35AFAA5814798D9A794BA34B0F97B20C5FB52123E82D76E6FF550BE5E9FDF829B4E0C9DA29F3F0AF372C2557C466EFE480088B64E4F9B198C983B714256D2B41C47696EFCF0E626040EE263ED060FFBA8A99473E141E06B5AB4D986CD7B46D339BA1813DAF23A7BDC412183E80D251331905DBD82419BEA6B8ABA8A48B11DD23D5EC41B295E7FB6561BE69165EC8482C2F6A1B0141B0B08D82B2A0617D72A9BC3AAFB2826143F5D0A481A4845C0FDEAEC906CEC93C8AFA3314B3AD8CD20AE8F14582649181F7A99C9DAC3F076B8528DEBB2E2986BA54715C3FFC8E76CD3DBC7FB4C363E15EB45E14A5D01ED3B87F769C13159", "s": "0x79FD73D901BB077D14334D8CC714804577515A1E0ADC9F995BB7534C", "r": "0x61F949D772E22EA9EFBB36442BC229767B28BE2A8061FA7339AFDDC8", "msg": "0x318198301A06092A864886F70D010903310D060B2A864886F70D0109100104301C06092A864886F70D010905310F170D3230303432343131333135375A302B060B2A864886F70D010910020C311C301A30183016041470AD64D33E65A855E6C332AA52736F71D58E7527302F06092A864886F70D0109043122042090C90BFC7A8C459EDB5AF58A8878EE826B6FD02A20E2BAAF2C73984FA380FDD2", "x": "0x1F8768EB57E1F4F129A6C8CA03C8DB491D8E2B81BD7292640C1CD66D", "id": "335451053151725", "k_len": "223" }
Example: Dump message to binary file
Extract the msg field from the target JSON and dump it as binary.
$ sed -n 's/^ "msg": "0x(.*)",$/\1/p' 2.json | xxd -r -p > 2.msg $ xxd -g1 2.msg 00000000: 31 81 98 30 1a 06 09 2a 86 48 86 f7 0d 01 09 03 1..0...*.H...... 00000010: 31 0d 06 0b 2a 86 48 86 f7 0d 01 09 10 01 04 30 1...*.H........0 00000020: 1c 06 09 2a 86 48 86 f7 0d 01 09 05 31 0f 17 0d ...*.H......1... 00000030: 32 30 30 34 32 34 31 31 33 31 35 37 5a 30 2b 06
This open-source app is aimed at developers who want to work with the Wiener Linien real-time departure data server. It takes the 3 CSV files from Wiener Linien and converts them into a JSON array file with one dictionary per station. Redundant and irregular data, such as RBL numbers separated by ":", are corrected.
By combining the 3 files into one JSON file, developers can more easily and quickly write apps that communicate with the real-time departure data server.
The app is used for the iOS app "When" (http://subzero.eu/whenn). Since the "Wien Lines Generator" is open source, it can easily be adapted if a developer needs a different structure.
If a developer does not have a Mac but would like the JSON file, it can also be downloaded directly via https://gist.github.com/hactar/6793144.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
SPIDER - Synthetic Person Information Dataset for Entity Resolution offers researchers ready-to-use data for benchmarking duplicate detection or entity resolution algorithms. The dataset is aimed at person-level fields that are typical in customer data. Because real-world person-level data is hard to source due to Personally Identifiable Information (PII) concerns, very few synthetic datasets are publicly available, and those that exist are small in volume and missing core person-level fields. SPIDER addresses these challenges by focusing on core person-level attributes: first/last name, email, phone, address, and date of birth. Using the Python Faker library, 40,000 unique synthetic person records are created. An additional 10,000 duplicate records are generated from the base records using 7 real-world transformation rules. Each duplicate record is labelled with its original base record and the duplication rule used for its generation through the is_duplicate_of and duplication_rule fields.
Duplicate Rules
Duplicate record with a variation in email address.
Duplicate record with a variation in email address.
Duplicate record with last name variation.
Duplicate record with first name variation.
Duplicate record with a nickname.
Duplicate record with near exact spelling.
Duplicate record with only same email and name.
Output Format
The dataset is presented in both JSON and CSV formats for use in data processing and machine learning tools.
Data Regeneration
The project includes the Python script used for generating the 50,000 person records. The Python script can be expanded to include additional duplicate rules, fuzzy names, geographical name variations, and volume adjustments.
Files Included
spider_dataset_20250714_035016.csv
spider_dataset_20250714_035016.json
spider_readme.md
DataDescriptions
pythoncodeV1.py
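For illustration, a minimal sketch of how such base records could be produced with the Faker library follows (this is not the released pythoncodeV1.py; the field names and age bounds are assumptions):

from faker import Faker

fake = Faker()
records = [
    {
        "first_name": fake.first_name(),
        "last_name": fake.last_name(),
        "email": fake.email(),
        "phone": fake.phone_number(),
        "address": fake.address().replace("\n", ", "),
        "dob": fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat(),
    }
    for _ in range(5)  # the released dataset uses 40,000 base records
]
print(records)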
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The ONE DATA data science workflow dataset ODDS-full comprises 815 unique workflows in temporally ordered versions.
A version of a workflow describes its evolution over time, so whenever a workflow is altered meaningfully, a new version of this respective workflow is persisted.
Overall, 16035 versions are available.
The ODDS-full workflows represent machine learning workflows expressed as node-heterogeneous DAGs with 156 different node types.
These node types represent various kinds of processing steps of a general machine learning workflow and are grouped into 5 categories, which are listed below.
Any metadata beyond the structure and node types of a workflow has been removed for anonymization purposes.
ODDS, a filtered variant, which enforces weak connectedness and only contains workflows with at least 5 different versions and 5 nodes, is available as the default version for supervised and unsupervised learning.
Workflows are served as JSON node-link graphs via networkx.
They can be loaded into python as follows:
import pandas as pd
import networkx as nx
import json
with open('ODDS.json', 'r') as f:
    graphs = pd.Series(list(map(nx.node_link_graph, json.load(f)['graphs'])))
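Assuming the snippet above has been run, the resulting pandas Series of networkx graphs can be summarized directly, for example:

# Distribution of workflow sizes (number of nodes per graph).
node_counts = graphs.map(lambda g: g.number_of_nodes())
print(node_counts.describe())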
License: MIT License, https://opensource.org/licenses/MIT
This dataset and software tool are for reproducing the research results related to CVE-2019-1547, resulting from the manuscript "Certified Side Channels". The data was used to produce Figure 4 in the paper and is part of the remote timing attack data in Section 4.1.
Data description
The file timings.json contains a single JSON array. Each entry is a dictionary representation of one digital signature. A description of the dictionary fields follows.
hash_function: string denoting the hash function for the digital signature.
hash: the output of said hash function, i.e. hash of the message digitally signed.
order: the order of the generator.
private_key: the ECDSA private key.
public_key: the corresponding public key.
sig_r: the r component of the ECDSA signature.
sig_s: the s component of the ECDSA signature.
sig_nonce: the ground truth nonce generated during ECDSA signing.
nonce_bits: the ground truth number of bits in said nonce.
latency: the measured wall clock time (CPU clock cycles) to produce the digital signature.
Prerequisites
OpenSSL 1.1.1a, 1.1.1b, or 1.1.1c.
sudo apt install python-ijson jq
Data setup
Extract the JSON:
tar xf timings.tar.xz
Key setup
Generate the public key (public.pem here) from the provided private key (private.pem here):
$ openssl pkey -in private.pem -pubout -out public.pem
Examine the keys if you want.
$ openssl pkey -in private.pem -text -noout
$ openssl pkey -in public.pem -text -noout -pubin
Example: Verify key material
$ grep --max-count=1 'private_key' timings.json
"private_key":"0x6b76cc816dce9a8ebc6ff190bcf0555310d1fb0824047f703f627f338bcf5435",
$ grep --max-count=1 'public_key' timings.json
"public_key":"0x04396d7ae480016df31f84f80439e320b0638e024014a5d8e14923eea76948afb25a321ccadabd8a4295a1e8823879b9b65369bd49d337086850b3c799c7352828",
$ openssl pkey -in private.pem -text -noout
Private-Key: (256 bit)
priv:
6b:76:cc:81:6d:ce:9a:8e:bc:6f:f1:90:bc:f0:55:
53:10:d1:fb:08:24:04:7f:70:3f:62:7f:33:8b:cf:
54:35
pub:
04:39:6d:7a:e4:80:01:6d:f3:1f:84:f8:04:39:e3:
20:b0:63:8e:02:40:14:a5:d8:e1:49:23:ee:a7:69:
48:af:b2:5a:32:1c:ca:da:bd:8a:42:95:a1:e8:82:
38:79:b9:b6:53:69:bd:49:d3:37:08:68:50:b3:c7:
99:c7:35:28:28
Field Type: prime-field
Prime:
00:ff:ff:ff:ff:00:00:00:01:00:00:00:00:00:00:
00:00:00:00:00:00:ff:ff:ff:ff:ff:ff:ff:ff:ff:
ff:ff:ff
A:
00:ff:ff:ff:ff:00:00:00:01:00:00:00:00:00:00:
00:00:00:00:00:00:ff:ff:ff:ff:ff:ff:ff:ff:ff:
ff:ff:fc
B:
5a:c6:35:d8:aa:3a:93:e7:b3:eb:bd:55:76:98:86:
bc:65:1d:06:b0:cc:53:b0:f6:3b:ce:3c:3e:27:d2:
60:4b
Generator (uncompressed):
04:6b:17:d1:f2:e1:2c:42:47:f8:bc:e6:e5:63:a4:
40:f2:77:03:7d:81:2d:eb:33:a0:f4:a1:39:45:d8:
98:c2:96:4f:e3:42:e2:fe:1a:7f:9b:8e:e7:eb:4a:
7c:0f:9e:16:2b:ce:33:57:6b:31:5e:ce:cb:b6:40:
68:37:bf:51:f5
Order:
00:ff:ff:ff:ff:00:00:00:00:ff:ff:ff:ff:ff:ff:
ff:ff:bc:e6:fa:ad:a7:17:9e:84:f3:b9:ca:c2:fc:
63:25:51
Cofactor: 0
Seed:
c4:9d:36:08:86:e7:04:93:6a:66:78:e1:13:9d:26:
b7:81:9f:7e:90
Three things to note in the output:
The private key bytes match (private_key and priv byte strings are equal)
The public key bytes match (public_key and pub byte strings are equal)
This is an explicit parameters key, with the Cofactor parameter missing or zero, as described in the manuscript.
Example: Extract a single entry
Here we use the python script pickone.py to extract the entry at index 2 (starting from 0).
$ python2 pickone.py timings.json 2 | jq . > 2.json
$ cat 2.json
{
  "public_key": "0x04396d7ae480016df31f84f80439e320b0638e024014a5d8e14923eea76948afb25a321ccadabd8a4295a1e8823879b9b65369bd49d337086850b3c799c7352828",
  "private_key": "0x6b76cc816dce9a8ebc6ff190bcf0555310d1fb0824047f703f627f338bcf5435",
  "hash": "0xf36d0481e14869fc558b39ae4c747bc6c089a0271b23cfd92bc0b8aa7ed2c3aa",
  "latency": 21565213,
  "nonce_bits": 253,
  "sig_nonce": "0x1b88c7802ea000ccb21116575c38004579b55f1f9c4f81ed321896b1e1034237",
  "hash_function": "sha256",
  "sig_s": "0x8c83417891547224006723169de9745a81fa8de7176428e1cd8e6110408f45da",
  "sig_r": "0xf922d9ba4f65d207300cc7eaaa15564e60a2b1f208d1389057ff1a1ec52dc653",
  "order": "0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551"
}
Example: Dump hash to binary file
Extract the hash field from the target JSON and dump it as binary.
$ sed -n 's/^ "hash": "0x(.*)",$/\1/p' 2.json | xxd -r -p > 2.hash $ xxd -g1 2.hash 00000000: f3 6d 04 81 e1 48 69 fc 55 8b 39 ae 4c 74 7b c6 .m...Hi.U.9.Lt{. 00000010: c0 89 a0 27 1b 23 cf d9 2b c0 b8 aa 7e d2 c3 aa ...'.#..+...~...
Note the xxd output matches the hash byte string from the target JSON.
Example: Dump signature to DER
The hex2der.sh script takes as an argument the target JSON filename, and outputs the DER-encoded ECDSA signature to stdout by extracting the sig_r and sig_s fields from the target JSON.
$ ./hex2der.sh 2.json > 2.der
$ openssl asn1parse -in 2.der -inform DER
0:d=0 hl=2 l= 70 cons: SEQUENCE
2:d=1 hl=2 l= 33 prim: INTEGER :F922D9BA4F65D207300CC7EAAA15564E60A2B1F208D1389057FF1A1EC52DC653
37:d=1 hl=2 l= 33 prim: INTEGER :8C83417891547224006723169DE9745A81FA8DE7176428E1CD8E6110408F45DA
Note the asn1parse output contains a sequence with two integers, matching the sig_r and sig_s fields from the target JSON.
Example: Verify the signature
We use pkeyutl here to verify the raw hash directly, in contrast to dgst that will only verify by recomputing the hash itself.
$ openssl pkeyutl -in 2.hash -inkey public.pem -pubin -verify -sigfile 2.der
Signature Verified Successfully
Note it fails for other hashes (messages), a fundamental security property for digital signatures:
$ dd if=/dev/urandom of=bad.hash bs=1 count=32
32+0 records in
32+0 records out
32 bytes copied, 0.00129336 s, 24.7 kB/s
$ openssl pkeyutl -in bad.hash -inkey public.pem -pubin -verify -sigfile 2.der
Signature Verification Failure
Example: Statistics
The stats.py script shows how to extract the desired fields from the JSON. It computes the median latency over each nonce bit length.
$ python2 stats.py timings.json
Len Median
238 20592060
239 20251286
240 20706144
241 20658896
242 20820100
243 20762304
244 20907332
245 20973536
246 20972244
247 21057788
248 21115419
249 21157888
250 21210560
251 21266378
252 21322146
253 21370608
254 21425454
255 21479105
256 21532532
You can verify these medians are consistent with Figure 4 in the paper.
The stats.py script can be easily modified for more advanced analysis.
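For reference, a minimal Python 3 sketch of the same median-per-bit-length computation might look as follows (the shipped stats.py targets Python 2 and streams the file with ijson; this version simply loads the whole array into memory):

import json
from collections import defaultdict
from statistics import median

with open("timings.json") as f:
    entries = json.load(f)

# Group latencies by ground-truth nonce bit length, then report the median of each group.
by_len = defaultdict(list)
for entry in entries:
    by_len[entry["nonce_bits"]].append(entry["latency"])

for bits in sorted(by_len):
    print(bits, median(by_len[bits]))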
Credits
Authors
Cesar Pereida García (Tampere University, Tampere, Finland)
Sohaib ul Hassan (Tampere University, Tampere, Finland)
Iaroslav Gridin (Tampere University, Tampere, Finland)
Nicola Tuveri (Tampere University, Tampere, Finland)
Alejandro Cabrera Aldaya (Tampere University, Tampere, Finland)
Billy Bob Brumley (Tampere University, Tampere, Finland)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 804476).
License
This project is distributed under MIT license.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
General description of wind turbine: The ETH-owned wind turbine is an Aventa AV-7, manufactured by Aventa AG in Switzerland and commissioned in December 2002. The turbine is operated via a belt-driven generator and a frequency converter with a variable-speed drive. The rated power of the Aventa AV-7 is 7 kW, with production beginning at a wind speed of 2 m/s and a cut-off speed of 14 m/s. The rotor diameter is 12.8 m with 3 rotor blades, and the hub height is 18 m. The maximum rotational speed of the turbine is 63 rpm. The tower is a tubular steel-reinforced concrete structure supported on a concrete foundation, while the blades are made of glass fiber with a tubular steel main spar. The turbine is regulated via a variable-speed and variable-pitch control system.
Location of site: The wind turbine is located in Taggenberg, about 5 km from the city centre of Winterthur, Switzerland. This site is easily accessible by public transport and on foot with direct road access right next to the turbine. This prime location reduces the cost of site visits and allows for frequent personal monitoring of the site when test equipment is installed. The coordinates of the site are: 47°31'12.2"N 8°40'55.7"E.
Control and measurement systems and signals: The turbine is regulated via a variable-speed and collective variable pitch control system.
SHM Motivation: Designed and commissioned in 2002, the Aventa wind turbine in Winterthur is soon reaching its end of design lifetime. In order to assess the various techniques of predicting the remaining useful lifetime, a Structural Health Monitoring (SHM) campaign was implemented by ETH Zurich. The monitoring campaign started in 2020, and is still ongoing. In addition, the setup is used as a research platform on topics such as system identification, operational modal analysis, faults/damage detection and classification. We analyze the influence of operational and environmental conditions on the modal parameters and to further infer Performance Indicators (PIs) for assessing structural behavior in terms of deterioration processes.
Data Description: The tower and nacelle have been instrumented with 11 accelerometers distributed along the length of the tower, nacelle main frame, main bearing, and generator. Two full-bridge strain gauges are installed at the concrete tower base, measuring fore-aft and side-side strain (which can be converted to bending moments); all acceleration and strain signals are sampled at 200 Hz. Temperature and humidity are measured at the tower base (1 Hz data). In addition, we are collecting operational performance data (SCADA), namely: wind speed, nacelle yaw orientation, rotor RPM, power output, and turbine status; SCADA signals are sampled at 10 Hz. See the appendix for further details of the sensor layout.
The measurement/instrumentation setup, type, and layout are provided in the PDF files.
The data: the data is provided in zip files corresponding to four use cases. The content of each zip file is as follows:
Additional data is available upon request. For further details or questions, please contact:
Prof. Dr. Eleni Chatzi
Chair of Structural Mechanics & Monitoring
ETH Zürich
http://www.chatzi.ibk.ethz.ch/
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
This directory contains the question-answer datasets generated by the autonomous agent in this project. The data is formatted in JSON and is designed for use in training and evaluating Retrieval-Augmented Generation (RAG) models.
The files are named according to the type of question they contain:
generated_dataset_A.json: Contains Type A (Factual) questions.
generated_dataset_B.json: Contains Type B (No-Context) questions.
generated_dataset_C.json: Contains Type C (Comparative) questions.
Each JSON file is a list of objects, where each object represents a single generated example and follows this schema:
[
{
"topic": "String theory",
"url": "https://en.wikipedia.org/wiki/String_theory",
"question": "What are the fundamental objects in string theory, according to the text?",
"answer": "According to the text, the fundamental objects in string theory are not point-like particles but one-dimensional 'strings'.",
"question_type": "A",
"source_chunk": "In physics, string theory is a theoretical framework in which the point-like particles of particle physics are replaced by one-dimensional objects called strings. It describes how these strings propagate through space and interact with each other...",
"source_chunk_2": null
},
{
"topic": "History of Rome",
"url": "https://en.wikipedia.org/wiki/History_of_Rome",
"question": "Compare the military strategies of the Roman Republic with the Byzantine Empire.",
"answer": "The Roman Republic's military focused on legions and aggressive expansion through disciplined infantry formations. The Byzantine Empire, while inheriting Roman traditions, adapted to defensive warfare, relying more on fortifications, naval power, and diplomacy to protect its borders.",
"question_type": "C",
"source_chunk": "The military of ancient Rome, according to Titus Livius, was a key element in the rise of Rome over other civilizations...",
"source_chunk_2": "The Byzantine army was the primary military body of the Byzantine armed forces, serving alongside the Byzantine navy..."
}
]
topic: The high-level topic used to find the source article (e.g., "Renewable Energy").
url: The direct URL of the Wikipedia article used as the source.
question: The generated question.
answer: The generated answer.
question_type: The classification label assigned by the generator (A, B, or C).
source_chunk: The primary piece of text from the article that was used to generate the question and answer.
source_chunk_2: For Type C (Comparative) questions, this field contains the second piece of text used for comparison. For Type A and B, this is null.
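As a small usage sketch in Python, assuming only the schema shown above (the file name is taken from the listing earlier):

import json
from collections import Counter

# Load one of the generated files and summarize it by question type.
with open("generated_dataset_C.json") as f:
    examples = json.load(f)

print(Counter(example["question_type"] for example in examples))

# Type C examples carry a second source chunk; for Type A and B it is null.
comparative = [example for example in examples if example["source_chunk_2"] is not None]
print(len(comparative), "examples with a second source chunk")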