32 datasets found
  1. output1.json

    • figshare.com
    txt
    Updated Sep 21, 2020
    Cite
    Gan Xin (2020). output1.json [Dataset]. http://doi.org/10.6084/m9.figshare.12981845.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 21, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gan Xin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a JSON-format file generated by a random number generator in Python. The range is 0 to 1000, and the numbers are floats. This data will be used by a Python script for further transformation.
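    The description above implies a very simple generation process. The following is a minimal sketch of how such a file could be produced; the element count and output file name are assumptions, not taken from the dataset.

    import json
    import random

    # 1000 uniformly distributed floats in [0, 1000]; the actual count is an assumption.
    values = [random.uniform(0, 1000) for _ in range(1000)]

    with open("output1.json", "w") as f:
        json.dump(values, f)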

  2. Tour de OP - JSON Generator

    • dune.com
    Updated Aug 20, 2023
    Cite
    0x_danw (2023). Tour de OP - JSON Generator [Dataset]. https://dune.com/discover/content/relevant?q=author:0x_danw&resource-type=queries
    Explore at:
    Dataset updated
    Aug 20, 2023
    Authors
    0x_danw
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data query: Tour de OP - JSON Generator

  3. file2.json

    • figshare.com
    txt
    Updated Sep 24, 2020
    Cite
    zerui xie (2020). file2.json [Dataset]. http://doi.org/10.6084/m9.figshare.12998249.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    zerui xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1,000 numbers generated from a formula.

  4. file1.json

    • figshare.com
    txt
    Updated Sep 24, 2020
    Cite
    zerui xie (2020). file1.json [Dataset]. http://doi.org/10.6084/m9.figshare.12998255.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    zerui xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1,000 random numbers.

  5. Data pipeline Validation And Load Testing using Multiple JSON Files

    • data.niaid.nih.gov
    Updated Mar 26, 2021
    Cite
    Mainak Adhikari; Afsana Khan; Pelle Jakovits (2021). Data pipeline Validation And Load Testing using Multiple JSON Files [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4636789
    Explore at:
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Masters Student, University of Tartu
    Lecturer, University of Tartu
    Research Fellow, University of Tartu
    Authors
    Mainak Adhikari; Afsana Khan; Pelle Jakovits
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset contains temperature and humidity sensor readings of a particular day, which are synthetically generated using a data generator and are stored as JSON files to validate and test (performance/load testing) the data pipeline components.
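    As a rough illustration of the kind of generator described above, the sketch below emits a day of synthetic temperature and humidity readings as JSON; the field names, sampling interval, and value ranges are assumptions rather than the dataset's actual schema.

    import json
    import random
    from datetime import datetime, timedelta

    start = datetime(2021, 3, 26)
    readings = [
        {
            "timestamp": (start + timedelta(minutes=5 * i)).isoformat(),
            "temperature": round(random.uniform(15.0, 30.0), 2),  # degrees Celsius (assumed)
            "humidity": round(random.uniform(30.0, 70.0), 2),     # percent relative humidity (assumed)
        }
        for i in range(288)  # one reading every 5 minutes for a single day
    ]

    with open("sensor_readings.json", "w") as f:
        json.dump(readings, f, indent=2)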

  6. data.json

    • figshare.com
    json
    Updated May 21, 2021
    Cite
    Andy Mitchell (2021). data.json [Dataset]. http://doi.org/10.6084/m9.figshare.14635839.v1
    Explore at:
    Available download formats: json
    Dataset updated
    May 21, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andy Mitchell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    JSON output of the Random Number Generator

  7. Data for domains without generator for the ICAPS 2021 paper "Automatic...

    • zenodo.org
    zip
    Updated Jul 4, 2022
    Cite
    Álvaro Torralba; Álvaro Torralba; Jendrik Seipp; Jendrik Seipp; Silvan Sievers; Silvan Sievers (2022). Data for domains without generator for the ICAPS 2021 paper "Automatic Instance Generation for Classical Planning"' [Dataset]. http://doi.org/10.5281/zenodo.6686348
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Álvaro Torralba; Álvaro Torralba; Jendrik Seipp; Jendrik Seipp; Silvan Sievers; Silvan Sievers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains raw data (log files) and parsed data (JSON files) of all planners used in the paper run on planning domains for which there is no generator that could directly be used for the Autoscale training process. This dataset was used to select a subset of tasks as described in the paper, for all Autoscale versions up to this point (Autoscale 21.08 and 21.11). As such, it complements the original Zenodo entry https://zenodo.org/record/4586397.

    domains-without-generator.zip contains the raw experimental data, distributed over a subdirectory for each experiment. Each of these contains a subdirectory tree structure "runs-*" where each planner run has its own directory. For each run, there are symbolic links to the input PDDL files domain.pddl and problem.pddl (which can be resolved by putting the benchmarks directory in the right place), the run log file "run.log" (stdout), possibly also a run error file "run.err" (stderr), the run script "run" used to start the experiment, and a "properties" file that contains data parsed from the log file(s).

    domains-without-generator-eval.zip contains the parsed data, again distributed over a subdirectory for each experiment. Each contains a "properties" file, which is a JSON file with combined data of all runs of the corresponding experiment. In essence, the properties file is the union over all properties files generated for each individual planner run.
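    Since the combined properties file is described as the union of the per-run properties files, a sketch along the following lines could rebuild it; the directory depth and the assumption that the per-run files are JSON are guesses based on the description above, not verified against the archive.

    import json
    from pathlib import Path

    combined = {}
    # Assumed layout: <experiment>/runs-*/<run>/properties inside the extracted archive.
    for props_path in Path("domains-without-generator").glob("*/runs-*/*/properties"):
        with open(props_path) as f:
            combined[str(props_path.parent)] = json.load(f)  # one entry per planner run

    with open("properties-combined.json", "w") as f:
        json.dump(combined, f, indent=2)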

  8. Curtailment: Generator Type

    • spenergynetworks.opendatasoft.com
    Updated Nov 3, 2025
    Cite
    (2025). Curtailment: Generator Type [Dataset]. https://spenergynetworks.opendatasoft.com/explore/dataset/generator-type/
    Explore at:
    Dataset updated
    Nov 3, 2025
    Description

    The "Curtailment: Generator Type" data table details the total measured curtailment aggregated by technology type.At this time, only curtailment events measured and recorded by our Active Network Management (ANM) system are captured.The table gives the following information:Aggregated related capacity of ANM sitesPast three months ASEFAPast three months ASCFor additional information on column definitions, please click the Dataset schema link below. DisclaimerWhilst all reasonable care has been taken in the preparation of this data, SP Energy Networks does not accept any responsibility or liability for the accuracy or completeness of this data, and is not liable for any loss that may be attributed to the use of this data. For the avoidance of doubt, this data should not be used for safety critical purposes without the use of appropriate safety checks and services e.g. LineSearchBeforeUDig etc. Please raise any potential issues with the data via the feedback form available at the Feedback tab above (must be logged in to see this). Some values are left blank in the dataset given that from February 2021 a number of customers connected under the Dunbar ANM system moved to an unconstrained connection and therefore no data is published beyond that point. Data TriageAs part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process.Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Curtailment dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information.Download dataset metadata (JSON)

  9. BPM Synthetic UI Logs Collection

    • data.niaid.nih.gov
    Updated Aug 1, 2023
    Cite
    A. Martínez-Rojas; H. A. Reijers; J. G. Enríquez (2023). BPM Synthetic UI Logs Collection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8083835
    Explore at:
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    University of Utrecht
    University of Seville
    Authors
    A. Martínez-Rojas; H. A. Reijers; J. G. Enríquez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data package, described in the BPM Demos&Resources publication entitled "BPM Hub: An Open Collection of UI Logs", consists of synthetic UI logs along with corresponding screenshots. The UI logs closely resemble real-world use cases within the administrative domain. They exhibit varying levels of complexity, measured by the number of activities, process variants, and visual features that influence the outcome of decision points. For their generation, the BPM Log Generator tool has been used, which requires the following initial generation configuration:

    Initial Generation Configuration

    Seed log: Includes a single instance for each process variant and their associated screenshots.

    Variability configuration:

    Case-level: Refers to variations in the content that can be introduced or modified by the user, such as variations in the text inputs, selectable options, checkboxes, etc.

    Scenario-level: Refers to varying the GUI (Graphical User Interface) components related to the look and feel of the different applications appearing in the process screenshots.

    Data Package Contents

    The data package comprises three distinct processes, P1, P2, and P3, for which the initial configuration is provided, i.e., a tuple of seed log and variability configuration. They are characterized by the following:

    P1. Client Creation

    Activities: 5

    Variants: 2

    Decision point: Revolves around the presence of an attachment in the reception of an email.

    P2. Client Deletion. User's presence in the system

    Activities: 7

    Variants: 2

    Decision point: Based on the result of the user's search in the Customer Management System (CRM), represented by a checkbox.

    P3. Client Deletion. Validation of customer payments

    Activities: 7

    Variants: 4

    Decision: Involves two conditions:

    The presence of an attachment justifying the payment of the invoices in the email.

    The existence of pending invoices in the user CRM profile.

    These problems depict processes with a single decision point, without cycles, and executed sequentially to ensure a non-interleaved execution pattern. Particularly, P3 shows higher complexity as its decision point is determined by two visual characteristics.

    Generation of UI Logs

    For each problem, case-level variations have been applied to generate logs with different sizes in the range of {10, 25, 50, 100} events. In cases where the log exceeds the desired size, the last instance is removed to maintain completeness. Each log size has its associated balanced and unbalanced log. Balanced logs have an approximately equal distribution of instances across variants, while unbalanced logs have a frequency difference of more than 20% between the most frequent and least frequent variants.

    Scenarios

    To ensure the reliability of the obtained results, 30 scenarios are generated for each tuple. These scenarios exhibit slight variations at the scenario level, particularly in the look and feel and user interface of the applications depicted in the screenshots. Each scenario consists of UI logs that correspond to specific problems categorized by log size (10, 25, 50, 100) and balance (Balanced, Unbalanced). UI logs and their corresponding screenshots are organized in folders named as follows: sc{scenarioId}_size_{LogSize}_{Balanced?}.

    Additional Artefacts

    In addition, each problem includes two more artefacts:

    initial_generation_configuration folder: Holds the data needed for problem data generation using the [5] tool.

    decision.json file: Specifies the condition driving the decision made at the decision point.

    decision.json

    The decision.json acts as a testing oracle, serving as a label for validating mined data. It contains two main sections: "UICompos" and "decision". The "UICompos" section includes a key for each activity related to the decision, storing key-value pairs that represent the UI components involved, along with their bounding box coordinates. The "decision" section defines the condition for a case to match a specific variant based on the mentioned UI components.
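    As an illustration only (the exact layout below the two top-level sections is an assumption based on the description above), a decision.json could be inspected like this:

    import json

    with open("decision.json") as f:
        decision = json.load(f)

    # "UICompos": one key per decision-related activity, mapping UI component
    # names to their bounding-box coordinates.
    for activity, components in decision["UICompos"].items():
        for name, bbox in components.items():
            print(f"activity {activity}: component {name} at bounding box {bbox}")

    # "decision": the condition a case must satisfy to match a specific variant.
    print("variant condition:", decision["decision"])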

  10. TRAVEL: A Dataset with Toolchains for Test Generation and Regression Testing...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Pouria Derakhshanfar; Annibale Panichella; Alessio Gambi; Vincenzo Riccio; Christian Birchler; Sebastiano Panichella (2024). TRAVEL: A Dataset with Toolchains for Test Generation and Regression Testing of Self-driving Cars Software [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5911160
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zurich University of Applied Sciences
    Delft University of Technology
    Università della Svizzera Italiana
    University of Passau
    Authors
    Pouria Derakhshanfar; Annibale Panichella; Alessio Gambi; Vincenzo Riccio; Christian Birchler; Sebastiano Panichella
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.

    Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of test regression. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigms.

    This dataset builds on top of our previous work in this area, including work on

    test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021),

    test selection: SDC-Scissor and related tool

    test prioritization: automated test cases prioritization work for SDCs.

    Dataset Overview

    The TRAVEL dataset is available under the data folder and is organized as a set of experiments folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).
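    A minimal sketch for iterating over the experiment folders and loading these CSV files is shown below; it assumes the archives under data/ have already been extracted and uses pandas purely for convenience.

    from pathlib import Path

    import pandas as pd

    # Assumed layout: one subdirectory per experiment under data/.
    for experiment in sorted(Path("data").iterdir()):
        if not experiment.is_dir():
            continue
        description = pd.read_csv(experiment / "experiment_description.csv")
        generation_stats = pd.read_csv(experiment / "generation_stats.csv")
        oob_stats = pd.read_csv(experiment / "oob_stats.csv")
        print(experiment.name, len(description), len(generation_stats), len(oob_stats))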

    The following sections describe what each of those files contains.

    Experiment Description

    The experiment_description.csv contains the settings used to generate the data, including:

    Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.

    The size of the map. The size of the squared map defines the boundaries inside which the virtual roads develop in meters.

    The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated by testing the BeamNG.AI and the end-to-end Dave2 systems.

    The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms, for generating tests.

    The speed limit. The maximum speed at which the driving agent under test can travel.

    Out of Bound (OOB) tolerance. The test cases' oracle that defines the tolerable amount of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.

    Experiment Statistics

    The generation_stats.csv contains statistics about the test generation, including:

    Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.

    Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.

    The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.

    Test Cases and Executions

    Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as driving simulation.

    The data about the test case definition include:

    The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points)

    The test ID. The unique identifier of the test in the experiment.

    Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or the road self intersects)

    The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.

    { "type": "object", "properties": { "id": { "type": "integer" }, "is_valid": { "type": "boolean" }, "validation_message": { "type": "string" }, "road_points": { §\label{line:road-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "interpolated_points": { §\label{line:interpolated-points}§ "type": "array", "items": { "$ref": "schemas/pair" }, }, "test_outcome": { "type": "string" }, §\label{line:test-outcome}§ "description": { "type": "string" }, "execution_data": { "type": "array", "items": { "$ref" : "schemas/simulationdata" } } }, "required": [ "id", "is_valid", "validation_message", "road_points", "interpolated_points" ] }

    Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).

    The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.

    { "$id": "schemas/simulationdata", "type": "object", "properties": { "timer" : { "type": "number" }, "pos" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel" : { "type": "array", "items":{ "$ref" : "schemas/triple" } } "vel_kmh" : { "type": "number" }, "steering" : { "type": "number" }, "brake" : { "type": "number" }, "throttle" : { "type": "number" }, "is_oob" : { "type": "number" }, "oob_percentage" : { "type": "number" } §\label{line:oob-percentage}§ }, "required": [ "timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage" ] }

    Dataset Content

    The TRAVEL dataset is a lively initiative so the content of the dataset is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (SDC-Scissor work and tool) and test prioritization (automated test cases prioritization work for SDCs).

    SBST CPS Tool Competition Data

    The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., conservative driver).

    Name    | Map Size (m x m) | Max Speed (Km/h) | Budget (h)    | OOB Tolerance (%) | Test Subject
    DEFAULT | 200 × 200        | 120              | 5 (real time) | 0.95              | BeamNG.AI - 0.7
    SBST    | 200 × 200        | 70               | 2 (real time) | 0.5               | BeamNG.AI - 0.7

    Specifically, the TRAVEL dataset contains 8 repetitions for each of the above configurations for each test generator totaling 64 experiments.

    SDC Scissor

    With SDC-Scissor we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the used parameters.

    Name        | Map Size (m x m) | Max Speed (Km/h) | Budget (h)     | OOB Tolerance (%) | Test Subject
    SDC-SCISSOR | 200 × 200        | 120              | 16 (real time) | 0.5               | BeamNG.AI - 1.5

    The dataset contains 9 experiments with the above configuration. For generating your own data with SDC-Scissor follow the instructions in its repository.

    Dataset Statistics

    Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95), and we also ran the test generators using a smaller generation budget (i.e., 2 hours) and speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.85). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours using Frenetic as a test generator and defining a more realistic OOB tolerance (i.e., 0.50).

    Generating new Data

    Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.

    Extensive instructions on how to install both tools are provided in the SBST CPS Tool Competition pipeline documentation.

  11. Synthetic Chess Board Images

    • kaggle.com
    zip
    Updated Feb 13, 2022
    Cite
    TheFamousRat (2022). Synthetic Chess Board Images [Dataset]. https://www.kaggle.com/datasets/thefamousrat/synthetic-chess-board-images
    Explore at:
    Available download formats: zip (457498797 bytes)
    Dataset updated
    Feb 13, 2022
    Authors
    TheFamousRat
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data collection is perhaps the most crucial part of any machine learning model: without it being done properly, not enough information is present for the model to learn from the patterns leading to one output or another. Data collection is however a very complex endeavor, time-consuming due to the volume of data that needs to be acquired and annotated. Annotation is an especially problematic step, due to its difficulty, length, and vulnerability to human error and inaccuracies when annotating complex data.

    With high processing power becoming ever more accessible, synthetic dataset generation is becoming a viable option when looking to generate large volumes of accurately annotated data. With the help of photorealistic renderers, it is for example possible now to generate immense amounts of data, annotated with pixel-perfect precision and whose content is virtually indistinguishable from real-world pictures.

    As an exercise in synthetic dataset generation, the data offered here was generated using the Python API of Blender, with the images rendered through the Cycles renderer. It consists of plausible pictures of a chess board and its pieces. The goal is, from those pictures and their annotations, to build a model capable of recognizing the pieces, as well as their positions on the board.

    Content

    The dataset contains a large number of synthetic, randomly generated images representing pictures of chess boards, taken at an angle overlooking the board and its pieces. Each image is associated with a .json file containing its annotations. The naming convention is that each render is associated with a number X, and the image and annotations associated with that render are named X.jpg and X.json, respectively.

    The data has been generated using the Python scripts and .blend file present in this repository. The chess board and pieces models that have been used for those renders are not provided with the code.

    Data characteristics :

    • Images : 1280x1280 JPEG images representing pictures of chess game boards.
    • Annotations : JSON files containing two variables (parsed in the sketch after this list):
      • "config", a dictionary associating a cell to the type of piece it contains. If a cell is not present in the keys, it means that it is empty.
      • "corners", a 4x2 list which contains the coordinates, in the image, of the board corners. Those corner coordinates are normalized to the [0;1] range.
    • config.json : A JSON file generated before rendering, which contains variables relative to the constant properties of the boards in the renders :
      • "cellsCoordinates", a dictionary associating a cell name to its coordinates on the board.
      • "piecesTypes", a list of strings containing the types of pieces present in the renders.

    No distinction between training, validation, and testing data has been hard-built; the split is left completely up to the users. A pipeline for the extraction, recognition, and placement of chess pieces is proposed in a notebook added with this dataset.

    Acknowledgements

    I would like to express my gratitude for the efforts of the Blender Foundation and all its participants, for their incredible open-source tool which once again has allowed me to conduct interesting projects with great ease.

    Inspiration

    Two interesting papers on the generation and use of synthetic data, which have inspired me to conduct this project :

    Erroll Wood, Tadas Baltrušaitis, Charlie Hewitt (2021). Fake It Till You Make It: Face analysis in the wild using synthetic data alone. https://arxiv.org/abs/2109.15102
    Salehe Erfanian Ebadi, You-Cyuan Jhang, Alex Zook (2021). PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision. https://arxiv.org/abs/2112.09290

  12. Provenance of luca App development.

    • data.niaid.nih.gov
    Updated Jun 27, 2021
    Cite
    Schreiber, Andreas (2021). Provenance of luca App development. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034812
    Explore at:
    Dataset updated
    Jun 27, 2021
    Dataset provided by
    German Aerospace Center (DLR)
    Authors
    Schreiber, Andreas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Provenance documents in PROV-JSON format of the luca App GitLab repositories.

    Retrieved on June 22nd, 2021 using GitLab2PROV Version 0.5.

    Commands:

    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/web > lucaapp-web.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/ios > lucaapp-ios.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/web-crypto > lucaapp-web-crypto.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/android > lucaapp-android.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/security-overview > lucaapp-security-overview.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/badge-generator > lucaapp-badge-generator.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/cwa-event > lucaapp-cwa-event.json
    gitlab2prov -t -f json -r 1000 -p https://gitlab.com/lucaapp/fdroid-repository > lucaapp-fdroid-repository.json

  13. Drake Lyrics

    • kaggle.com
    zip
    Updated Nov 27, 2020
    Cite
    Juico Bowley (2020). Drake Lyrics [Dataset]. https://www.kaggle.com/juicobowley/drake-lyrics
    Explore at:
    Available download formats: zip (782052 bytes)
    Dataset updated
    Nov 27, 2020
    Authors
    Juico Bowley
    Description

    Context

    This is a dataset consisting of Drake lyrics and other information gathered from Genius.com. I'm currently working on a side project while enrolled in the Data Science program at Flatiron. I've lovingly entitled this project "Ye-Spirations" which will essentially be a motivational poster generator that renders high-quality images with random lines of text generated from hip hop lyrics. I have a simple prototype built out with Kanye lyrics but I would like to continue working on it and add additional features before sharing a final version. I intend on expanding the artist options for lyrics which brings me here. This dataset is a byproduct of the expansion (which I am excited to call "Inspo-Papi") and I wanted to publish this for others to use for their own projects or even my future self.

    Content

    This dataset contains 3 files:
    • .txt - lyrics only
    • .json - lyrics, song title, album title, url, view count (at this time)
    • .csv - lyrics, song title, album title, url, view count (at this time)

    Acknowledgements

    Data gathered from the GOAT of lyrics and annotations - Genius.com

    Requests

    I intend on expanding to multiple artists so if y'all have any requests feel free to shout em out!

  14. CVE-2020-12399: research data and tooling

    • data.niaid.nih.gov
    Updated Aug 13, 2020
    Cite
    ul Hassan, Sohaib; Gridin, Iaroslav; Delgado-Lozano, Ignacio M.; Pereida García, Cesar; Chi-Domínguez, Jesús-Javier; Aldaya, Alejandro Cabrera; Brumley, Billy Bob (2020). CVE-2020-12399: research data and tooling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3982270
    Explore at:
    Dataset updated
    Aug 13, 2020
    Dataset provided by
    Tampere University
    Authors
    ul Hassan, Sohaib; Gridin, Iaroslav; Delgado-Lozano, Ignacio M.; Pereida García, Cesar; Chi-Domínguez, Jesús-Javier; Aldaya, Alejandro Cabrera; Brumley, Billy Bob
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset and software tools are for reproducing the research results related to CVE-2020-12399, resulting from the manuscript "Déjà vu: Side-channel analysis of Mozilla's NSS", to appear at ACM CCS 2020.

    The data is from a remote timing attack against the NSS v3.51 implementation of DSA signing.

    The client machine was a 3.1 GHz 64-bit Intel i5-2400 CPU (Sandy Bridge).

    The server machine was a Raspberry Pi 3 Model B plus board containing a 1.4 GHz 64-bit quad-core Cortex-A53 processor.

    The client and server were connected by a Cisco 9300 series enterprise switch over Gbit Ethernet.

    The data contains 2^18 (262,144) samples.

    The data was used to produce Figure 1 in the paper and contains all the remote timing attack data from Section 4.

    Data description

    The file remote_timings.json contains a single JSON array. Each entry is a dictionary representation of one digital signature. A description of the dictionary fields follows.

    p: prime (DSA parameter).

    q: generator order (DSA parameter).

    g: generator (DSA parameter).

    x: the DSA private key.

    y: the corresponding public key.

    r: first component of the DSA signature.

    s: second component of the DSA signature.

    k: the ground truth nonce generated during DSA signing.

    k_len: the ground truth number of bits in said nonce.

    msg: message digitally signed.

    h: SHA-256 hash of said message. (Truncated to the same bitlen as q.)

    id: ignored.

    latency: the measured wall clock time (CPU clock cycles) to produce the digital signature.

    Prerequisites

    sudo apt install openssl python3-ijson xxd jq

    Data setup

    Extract the JSON:

    tar xf remote_timings_rpi.tar.gz

    Key setup

    Generate the public key (public.pem here) from the provided private key (private.pem here):

    $ openssl pkey -in private.pem -pubout -out public.pem

    Examine the keys if you want.

    $ openssl pkey -in private.pem -text -noout
    $ openssl pkey -in public.pem -text -noout -pubin

    Example: Verify key material

    $ openssl pkey -in private.pem -text -noout Private-Key: (2048 bit) priv: 1f:87:68:eb:57:e1:f4:f1:29:a6:c8:ca:03:c8:db: 49:1d:8e:2b:81:bd:72:92:64:0c:1c:d6:6d pub: 00:9d:fa:bc:47:00:cb:11:fa:51:45:c1:bd:b1:88: 2d:dd:a2:79:5b:c3:43:0a:af:bb:83:e2:d5:84:d1: 07:01:ab:f9:ae:76:2d:dd:f2:a5:75:f5:3e:94:4d: 3b:c6:f6:ce:17:c6:60:09:5b:49:3d:cb:a0:db:ec: 29:91:85:8b:c3:f5:6c:6a:3c:01:87:12:85:ae:fc: 9e:bf:67:81:1b:1d:b1:9d:12:bd:79:8c:54:08:48: 11:13:6d:ab:b0:16:ef:11:4a:27:a7:0a:80:b3:db: 72:c1:cc:1e:e8:4a:39:b7:00:ca:97:b7:3a:6e:e9: 25:22:2e:5c:57:ee:62:be:23:d0:5e:53:a3:9f:05: d4:7d:7f:b5:b6:cb:4b:27:90:14:79:72:a5:43:97: c6:6a:7d:f7:32:b3:67:58:90:fc:c3:65:34:57:89: 1b:43:28:68:43:24:12:5e:f1:43:76:3c:e9:bc:9c: 5d:7d:ae:d6:3a:31:32:ca:df:a4:07:88:a2:55:6e: a4:8c:da:13:c8:30:b7:2a:1c:23:0f:32:da:9e:7f: e1:f7:3d:2d:1c:58:f5:1d:f2:7d:fb:67:45:8d:dd: 84:eb:83:c4:b0:00:a6:c2:09:b0:48:48:f9:4e:a8: d7:ab:e1:c6:e8:bf:5c:fa:e3:f2:cd:c6:f1:e7:f2: 2c:90 P: 00:e5:4e:f4:32:f8:4a:ec:28:3c:dd:32:a8:05:e3: 5a:fa:a5:81:47:98:d9:a7:94:ba:34:b0:f9:7b:20: c5:fb:52:12:3e:82:d7:6e:6f:f5:50:be:5e:9f:df: 82:9b:4e:0c:9d:a2:9f:3f:0a:f3:72:c2:55:7c:46: 6e:fe:48:00:88:b6:4e:4f:9b:19:8c:98:3b:71:42: 56:d2:b4:1c:47:69:6e:fc:f0:e6:26:04:0e:e2:63: ed:06:0f:fb:a8:a9:94:73:e1:41:e0:6b:5a:b4:d9: 86:cd:7b:46:d3:39:ba:18:13:da:f2:3a:7b:dc:41: 21:83:e8:0d:25:13:31:90:5d:bd:82:41:9b:ea:6b: 8a:ba:8a:48:b1:1d:d2:3d:5e:c4:1b:29:5e:7f:b6: 56:1b:e6:91:65:ec:84:82:c2:f6:a1:b0:14:1b:0b: 08:d8:2b:2a:06:17:d7:2a:9b:c3:aa:fb:28:26:14: 3f:5d:0a:48:1a:48:45:c0:fd:ea:ec:90:6c:ec:93: c8:af:a3:31:4b:3a:d8:cd:20:ae:8f:14:58:26:49: 18:1f:7a:99:c9:da:c3:f0:76:b8:52:8d:eb:b2:e2: 98:6b:a5:47:15:c3:ff:c8:e7:6c:d3:db:c7:fb:4c: 36:3e:15:eb:45:e1:4a:5d:01:ed:3b:87:f7:69:c1: 31:59 Q: 00:ca:6d:df:fc:7b:96:2e:35:30:27:4f:1f:cf:57: 2f:e9:4c:40:97:53:a1:fa:d0:89:56:8d:2c:25 G: 43:26:04:66:b3:80:c3:3f:8d:f5:5a:29:79:58:7a: 0b:8c:72:b9:cb:23:61:5d:c1:45:c5:38:7f:33:4e: 93:63:75:8a:b0:44:61:8f:59:df:fd:2f:3f:1f:22: 73:66:ba:53:65:53:2a:57:5b:d9:40:34:be:4c:78: 22:4a:bf:94:5d:23:15:65:66:e1:1f:6b:93:12:00: f0:ac:f5:64:0d:6d:6c:a3:eb:26:83:6d:68:95:e0: 2c:bf:75:62:fa:5f:95:0f:b0:40:68:ce:66:3b:58: ed:c1:63:e3:d8:35:5c:cc:db:b8:12:e6:62:e4:63: b6:29:e0:86:75:79:bc:95:27:74:d1:fd:94:b9:7f: 6e:57:b4:e5:39:a2:15:41:94:3f:47:90:43:a5:da: dd:08:a4:92:c5:bf:ef:34:4e:2e:7e:82:5c:07:0e: dc:5d:6b:79:10:04:53:cc:b2:8e:bd:65:61:80:49: ad:c7:dd:5f:5a:9b:74:ae:bc:e0:49:f1:ad:4c:1e: 8f:4e:9d:39:e9:fe:57:4d:39:b7:ba:69:03:e3:7e: 4d:0d:9b:65:c3:55:77:ff:2c:86:27:21:c7:3e:60: a3:23:a5:e8:7e:0d:29:15:1c:5e:04:91:91:25:03: f3:97:77:6c:11:24:34:58:c9:ec:b7:ca:ce:74:cd: a7

    This shows the keys indeed match (JSON x,y, above priv,pub):

    $ grep --max-count=1 '"x"' remote_timings.json "x": "0x1F8768EB57E1F4F129A6C8CA03C8DB491D8E2B81BD7292640C1CD66D", $ grep --max-count=1 '"y"' remote_timings.json "y": "0x9DFABC4700CB11FA5145C1BDB1882DDDA2795BC3430AAFBB83E2D584D10701ABF9AE762DDDF2A575F53E944D3BC6F6CE17C660095B493DCBA0DBEC2991858BC3F56C6A3C01871285AEFC9EBF67811B1DB19D12BD798C54084811136DABB016EF114A27A70A80B3DB72C1CC1EE84A39B700CA97B73A6EE925222E5C57EE62BE23D05E53A39F05D47D7FB5B6CB4B2790147972A54397C66A7DF732B3675890FCC3653457891B4328684324125EF143763CE9BC9C5D7DAED63A3132CADFA40788A2556EA48CDA13C830B72A1C230F32DA9E7FE1F73D2D1C58F51DF27DFB67458DDD84EB83C4B000A6C209B04848F94EA8D7ABE1C6E8BF5CFAE3F2CDC6F1E7F22C90",

    This shows the DSA parameters match (JSON p,q,g, above P,Q,G):

    $ grep --max-count=1 '"p"' remote_timings.json "p": "0xE54EF432F84AEC283CDD32A805E35AFAA5814798D9A794BA34B0F97B20C5FB52123E82D76E6FF550BE5E9FDF829B4E0C9DA29F3F0AF372C2557C466EFE480088B64E4F9B198C983B714256D2B41C47696EFCF0E626040EE263ED060FFBA8A99473E141E06B5AB4D986CD7B46D339BA1813DAF23A7BDC412183E80D251331905DBD82419BEA6B8ABA8A48B11DD23D5EC41B295E7FB6561BE69165EC8482C2F6A1B0141B0B08D82B2A0617D72A9BC3AAFB2826143F5D0A481A4845C0FDEAEC906CEC93C8AFA3314B3AD8CD20AE8F14582649181F7A99C9DAC3F076B8528DEBB2E2986BA54715C3FFC8E76CD3DBC7FB4C363E15EB45E14A5D01ED3B87F769C13159", $ grep --max-count=1 '"q"' remote_timings.json "q": "0xCA6DDFFC7B962E3530274F1FCF572FE94C409753A1FAD089568D2C25", $ grep --max-count=1 '"g"' remote_timings.json "g": "0x43260466B380C33F8DF55A2979587A0B8C72B9CB23615DC145C5387F334E9363758AB044618F59DFFD2F3F1F227366BA5365532A575BD94034BE4C78224ABF945D23156566E11F6B931200F0ACF5640D6D6CA3EB26836D6895E02CBF7562FA5F950FB04068CE663B58EDC163E3D8355CCCDBB812E662E463B629E0867579BC952774D1FD94B97F6E57B4E539A21541943F479043A5DADD08A492C5BFEF344E2E7E825C070EDC5D6B79100453CCB28EBD65618049ADC7DD5F5A9B74AEBCE049F1AD4C1E8F4E9D39E9FE574D39B7BA6903E37E4D0D9B65C35577FF2C862721C73E60A323A5E87E0D29151C5E0491912503F397776C11243458C9ECB7CACE74CDA7",

    Example: Extract a single entry

    Here we use the python script pickone.py to extract the entry at index 2 (starting from 0).

    $ python3 pickone.py remote_timings.json 2 | jq . > 2.json $ cat 2.json { "latency": "399901598", "y": "0x9DFABC4700CB11FA5145C1BDB1882DDDA2795BC3430AAFBB83E2D584D10701ABF9AE762DDDF2A575F53E944D3BC6F6CE17C660095B493DCBA0DBEC2991858BC3F56C6A3C01871285AEFC9EBF67811B1DB19D12BD798C54084811136DABB016EF114A27A70A80B3DB72C1CC1EE84A39B700CA97B73A6EE925222E5C57EE62BE23D05E53A39F05D47D7FB5B6CB4B2790147972A54397C66A7DF732B3675890FCC3653457891B4328684324125EF143763CE9BC9C5D7DAED63A3132CADFA40788A2556EA48CDA13C830B72A1C230F32DA9E7FE1F73D2D1C58F51DF27DFB67458DDD84EB83C4B000A6C209B04848F94EA8D7ABE1C6E8BF5CFAE3F2CDC6F1E7F22C90", "g": "0x43260466B380C33F8DF55A2979587A0B8C72B9CB23615DC145C5387F334E9363758AB044618F59DFFD2F3F1F227366BA5365532A575BD94034BE4C78224ABF945D23156566E11F6B931200F0ACF5640D6D6CA3EB26836D6895E02CBF7562FA5F950FB04068CE663B58EDC163E3D8355CCCDBB812E662E463B629E0867579BC952774D1FD94B97F6E57B4E539A21541943F479043A5DADD08A492C5BFEF344E2E7E825C070EDC5D6B79100453CCB28EBD65618049ADC7DD5F5A9B74AEBCE049F1AD4C1E8F4E9D39E9FE574D39B7BA6903E37E4D0D9B65C35577FF2C862721C73E60A323A5E87E0D29151C5E0491912503F397776C11243458C9ECB7CACE74CDA7", "h": "0xC7DEAC64C95157992CB0D77CF944CB107C756F3E30D1C49C0C48A6EA", "k": "0x742A7562E2A192996440AE2A4FDF5D37E1A532E1E6A50BCA3964BBDA", "q": "0xCA6DDFFC7B962E3530274F1FCF572FE94C409753A1FAD089568D2C25", "p": "0xE54EF432F84AEC283CDD32A805E35AFAA5814798D9A794BA34B0F97B20C5FB52123E82D76E6FF550BE5E9FDF829B4E0C9DA29F3F0AF372C2557C466EFE480088B64E4F9B198C983B714256D2B41C47696EFCF0E626040EE263ED060FFBA8A99473E141E06B5AB4D986CD7B46D339BA1813DAF23A7BDC412183E80D251331905DBD82419BEA6B8ABA8A48B11DD23D5EC41B295E7FB6561BE69165EC8482C2F6A1B0141B0B08D82B2A0617D72A9BC3AAFB2826143F5D0A481A4845C0FDEAEC906CEC93C8AFA3314B3AD8CD20AE8F14582649181F7A99C9DAC3F076B8528DEBB2E2986BA54715C3FFC8E76CD3DBC7FB4C363E15EB45E14A5D01ED3B87F769C13159", "s": "0x79FD73D901BB077D14334D8CC714804577515A1E0ADC9F995BB7534C", "r": "0x61F949D772E22EA9EFBB36442BC229767B28BE2A8061FA7339AFDDC8", "msg": "0x318198301A06092A864886F70D010903310D060B2A864886F70D0109100104301C06092A864886F70D010905310F170D3230303432343131333135375A302B060B2A864886F70D010910020C311C301A30183016041470AD64D33E65A855E6C332AA52736F71D58E7527302F06092A864886F70D0109043122042090C90BFC7A8C459EDB5AF58A8878EE826B6FD02A20E2BAAF2C73984FA380FDD2", "x": "0x1F8768EB57E1F4F129A6C8CA03C8DB491D8E2B81BD7292640C1CD66D", "id": "335451053151725", "k_len": "223" }
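    For reference, a pickone.py-style extractor could look like the sketch below (this is not necessarily the authors' script): it streams the large JSON array with ijson, which is listed in the prerequisites, and prints the entry at the requested index.

    import json
    import sys

    import ijson

    path, index = sys.argv[1], int(sys.argv[2])

    with open(path, "rb") as f:
        for i, entry in enumerate(ijson.items(f, "item")):  # iterate the top-level array
            if i == index:
                json.dump(entry, sys.stdout, indent=2, default=str)
                break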

    Example: Dump message to binary file

    Extract the msg field from the target JSON and dump it as binary.

    $ sed -n 's/^ *"msg": "0x\(.*\)",$/\1/p' 2.json | xxd -r -p > 2.msg
    $ xxd -g1 2.msg
    00000000: 31 81 98 30 1a 06 09 2a 86 48 86 f7 0d 01 09 03  1..0...*.H......
    00000010: 31 0d 06 0b 2a 86 48 86 f7 0d 01 09 10 01 04 30  1...*.H........0
    00000020: 1c 06 09 2a 86 48 86 f7 0d 01 09 05 31 0f 17 0d  ...*.H......1...
    00000030: 32 30 30 34 32 34 31 31 33 31 35 37 5a 30 2b 06

  15. Wiener Linien generator

    • data.europa.eu
    png
    Updated Aug 29, 2025
    Cite
    (2025). Wiener Linien generator [Dataset]. https://data.europa.eu/data/datasets/a7d8dd6a-f42b-478c-b296-effabefccec6?locale=da
    Explore at:
    Available download formats: png
    Dataset updated
    Aug 29, 2025
    Description

    This open-source app is aimed at developers who want to work with the Wiener Linien real-time departure data server. It takes the 3 CSV files from Wiener Linien and converts them into a JSON array file with one dictionary per station. Redundant and irregular data, such as RBL numbers separated by ":", are corrected.
    By combining the 3 files into one JSON file, developers can more easily and quickly write apps that communicate with the real-time departure data server.

    The app is used for the iOS app "When" (http://subzero.eu/whenn). Since the "Wiener Linien Generator" is open source, it can easily be adapted if a developer needs a different structure.

    If a developer does not have a Mac but would like the JSON file, it can also be downloaded directly via https://gist.github.com/hactar/6793144.
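    A hedged sketch of the kind of CSV-to-JSON merge the app performs is shown below; the input file name, delimiter, and column names are placeholders, not the real Wiener Linien export format.

    import csv
    import json

    stations = {}

    with open("stops.csv", newline="", encoding="utf-8") as f:  # placeholder file name
        for row in csv.DictReader(f, delimiter=";"):
            # Normalize irregular fields such as RBL numbers joined with ":".
            rbl_numbers = [r for r in row.get("RBL", "").split(":") if r]
            stations[row["StationID"]] = {"name": row["Name"], "rbl": rbl_numbers}

    # The other two CSV files would be merged into the same per-station dictionaries here.

    with open("stations.json", "w", encoding="utf-8") as f:
        json.dump(list(stations.values()), f, ensure_ascii=False, indent=2)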

  16. SPIDER - Synthetic Person Information Dataset for Entity Resolution

    • figshare.com
    Updated Jul 24, 2025
    Cite
    Praveen Chinnappa; Rose Mary Arokiya Dass; yash mathur (2025). SPIDER - Synthetic Person Information Dataset for Entity Resolution [Dataset]. http://doi.org/10.6084/m9.figshare.29595599.v2
    Explore at:
    Available download formats: text/x-script.python
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    figshare
    Authors
    Praveen Chinnappa; Rose Mary Arokiya Dass; yash mathur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPIDER - Synthetic Person Information Dataset for Entity Resolution offers researchers ready-to-use data for benchmarking duplicate detection and entity resolution algorithms. The dataset is aimed at person-level fields that are typical in customer data. Because real-world person-level data is hard to source due to Personally Identifiable Information (PII) concerns, very few synthetic datasets are publicly available, and the existing ones are limited by small volume and missing core person-level fields. SPIDER addresses these challenges by focusing on core person-level attributes: first/last name, email, phone, address, and date of birth. Using the Python Faker library, 40,000 unique synthetic person records are created. An additional 10,000 duplicate records are generated from the base records using 7 real-world transformation rules. The duplicate records are labelled with the original base record and the duplication rule used for record generation, through the is_duplicate_of and duplication_rule fields.

    Duplicate rules:
    • Duplicate record with a variation in email address
    • Duplicate record with a variation in email address
    • Duplicate record with last name variation
    • Duplicate record with first name variation
    • Duplicate record with a nickname
    • Duplicate record with near exact spelling
    • Duplicate record with only same email and name

    Output format: The dataset is presented in both JSON and CSV formats for use in data processing and machine learning tools.

    Data regeneration: The project includes the Python script used for generating the 50,000 person records. The Python script can be expanded to include additional duplicate rules, fuzzy name and geographical name variations, and volume adjustments.

    Files included:
    • spider_dataset_20250714_035016.csv
    • spider_dataset_20250714_035016.json
    • spider_readme.md
    • DataDescriptions
    • pythoncodeV1.py
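    As an illustration of the generation approach described above, the sketch below produces SPIDER-like base records with the Python Faker library; the field names and record count are illustrative, not the dataset's exact schema.

    import json

    from faker import Faker

    fake = Faker()
    records = []
    for i in range(1000):  # the real dataset uses 40,000 base records
        records.append({
            "record_id": i,
            "first_name": fake.first_name(),
            "last_name": fake.last_name(),
            "email": fake.email(),
            "phone": fake.phone_number(),
            "address": fake.address().replace("\n", ", "),
            "dob": fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat(),
            "is_duplicate_of": None,
            "duplication_rule": None,
        })

    with open("spider_sample.json", "w") as f:
        json.dump(records, f, indent=2)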

  17. ONE DATA Data Sience Workflows

    • zenodo.org
    json
    Updated Sep 17, 2021
    Cite
    Lorenz Wendlinger; Emanuel Berndl; Michael Granitzer; Lorenz Wendlinger; Emanuel Berndl; Michael Granitzer (2021). ONE DATA Data Sience Workflows [Dataset]. http://doi.org/10.5281/zenodo.4633704
    Explore at:
    Available download formats: json
    Dataset updated
    Sep 17, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorenz Wendlinger; Emanuel Berndl; Michael Granitzer; Lorenz Wendlinger; Emanuel Berndl; Michael Granitzer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ONE DATA data science workflow dataset ODDS-full comprises 815 unique workflows in temporally ordered versions.
    A version of a workflow describes its evolution over time, so whenever a workflow is altered meaningfully, a new version of this respective workflow is persisted.
    Overall, 16035 versions are available.

    The ODDS-full workflows represent machine learning workflows expressed as node-heterogeneous DAGs with 156 different node types.
    These node types represent various kinds of processing steps of a general machine learning workflow and are grouped into 5 categories, which are listed below.

    • Load Processors for loading or generating data (e.g. via a random number generator).
    • Save Processors for persisting data (possible in various data formats, via external connections or as a contained result within the ONE DATA platform) or for providing data to other places as a service.
    • Transformation Processors for altering and adapting data. This includes e.g. database-like operations such as renaming columns or joining tables as well as fully fledged dataset queries.
    • Quantitative Methods Various aggregation or correlation analysis, bucketing, and simple forecasting.
    • Advanced Methods Advanced machine learning algorithms such as BNN or Linear Regression. Also includes special meta processors that for example allow the execution of external workflows within the original workflow.

    Any metadata beyond the structure and node types of a workflow has been removed for anonymization purposes.

    ODDS, a filtered variant, which enforces weak connectedness and only contains workflows with at least 5 different versions and 5 nodes, is available as the default version for supervised and unsupervised learning.

    Workflows are served as JSON node-link graphs via networkx.

    They can be loaded into python as follows:

    import pandas as pd
    import networkx as nx
    import json
    
    with open('ODDS.json', 'r') as f:
      graphs = pd.Series(list(map(nx.node_link_graph, json.load(f)['graphs'])))

  18. CVE-2019-1547: research data and tooling

    • data.niaid.nih.gov
    Updated Jun 5, 2020
    Cite
    Pereida García, Cesar; ul Hassan, Sohaib; Tuveri, Nicola; Gridin, Iaroslav; Aldaya, Alejandro Cabrera; Brumley, Billy Bob (2020). CVE-2019-1547: research data and tooling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3736311
    Explore at:
    Dataset updated
    Jun 5, 2020
    Dataset provided by
    Tampere University
    Authors
    Pereida García, Cesar; ul Hassan, Sohaib; Tuveri, Nicola; Gridin, Iaroslav; Aldaya, Alejandro Cabrera; Brumley, Billy Bob
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset and software tool are for reproducing the research results related to CVE-2019-1547, resulting from the manuscript "Certified Side Channels". The data was used to produce Figure 4 in the paper and is part of the remote timing attack data in Section 4.1.

    Data description

    The file timings.json contains a single JSON array. Each entry is a dictionary representation of one digital signature. A description of the dictionary fields follows.

    hash_function: string denoting the hash function for the digital signature.

    hash: the output of said hash function, i.e. hash of the message digitally signed.

    order: the order of the generator.

    private_key: the ECDSA private key.

    public_key: the corresponding public key.

    sig_r: the r component of the ECDSA signature.

    sig_s: the s component of the ECDSA signature.

    sig_nonce: the ground truth nonce generated during ECDSA signing.

    nonce_bits: the ground truth number of bits in said nonce.

    latency: the measured wall clock time (CPU clock cycles) to produce the digital signature.

    Prerequisites

    OpenSSL 1.1.1a, 1.1.1b, or 1.1.1c.

    sudo apt install python-ijson jq

    Data setup

    Extract the JSON:

    tar xf timings.tar.xz

    Key setup

    Generate the public key (public.pem here) from the provided private key (private.pem here):

    $ openssl pkey -in private.pem -pubout -out public.pem

    Examine the keys if you want.

    $ openssl pkey -in private.pem -text -noout
    $ openssl pkey -in public.pem -text -noout -pubin

    Example: Verify key material

    $ grep --max-count=1 'private_key' timings.json "private_key":"0x6b76cc816dce9a8ebc6ff190bcf0555310d1fb0824047f703f627f338bcf5435", $ grep --max-count=1 'public_key' timings.json "public_key":"0x04396d7ae480016df31f84f80439e320b0638e024014a5d8e14923eea76948afb25a321ccadabd8a4295a1e8823879b9b65369bd49d337086850b3c799c7352828", $ openssl pkey -in private.pem -text -noout Private-Key: (256 bit) priv: 6b:76:cc:81:6d:ce:9a:8e:bc:6f:f1:90:bc:f0:55: 53:10:d1:fb:08:24:04:7f:70:3f:62:7f:33:8b:cf: 54:35 pub: 04:39:6d:7a:e4:80:01:6d:f3:1f:84:f8:04:39:e3: 20:b0:63:8e:02:40:14:a5:d8:e1:49:23:ee:a7:69: 48:af:b2:5a:32:1c:ca:da:bd:8a:42:95:a1:e8:82: 38:79:b9:b6:53:69:bd:49:d3:37:08:68:50:b3:c7: 99:c7:35:28:28 Field Type: prime-field Prime: 00:ff:ff:ff:ff:00:00:00:01:00:00:00:00:00:00: 00:00:00:00:00:00:ff:ff:ff:ff:ff:ff:ff:ff:ff: ff:ff:ff A:
    00:ff:ff:ff:ff:00:00:00:01:00:00:00:00:00:00: 00:00:00:00:00:00:ff:ff:ff:ff:ff:ff:ff:ff:ff: ff:ff:fc B:
    5a:c6:35:d8:aa:3a:93:e7:b3:eb:bd:55:76:98:86: bc:65:1d:06:b0:cc:53:b0:f6:3b:ce:3c:3e:27:d2: 60:4b Generator (uncompressed): 04:6b:17:d1:f2:e1:2c:42:47:f8:bc:e6:e5:63:a4: 40:f2:77:03:7d:81:2d:eb:33:a0:f4:a1:39:45:d8: 98:c2:96:4f:e3:42:e2:fe:1a:7f:9b:8e:e7:eb:4a: 7c:0f:9e:16:2b:ce:33:57:6b:31:5e:ce:cb:b6:40: 68:37:bf:51:f5 Order: 00:ff:ff:ff:ff:00:00:00:00:ff:ff:ff:ff:ff:ff: ff:ff:bc:e6:fa:ad:a7:17:9e:84:f3:b9:ca:c2:fc: 63:25:51 Cofactor: 0 Seed: c4:9d:36:08:86:e7:04:93:6a:66:78:e1:13:9d:26: b7:81:9f:7e:90

    Three things to note in the output:

    The private key bytes match (private_key and priv byte strings are equal)

    The public key bytes match (public_key and pub byte strings are equal)

    This is an explicit parameters key, with the Cofactor parameter missing or zero, as described in the manuscript.

    Example: Extract a single entry

    Here we use the python script pickone.py to extract the entry at index 2 (starting from 0).

    $ python2 pickone.py timings.json 2 | jq . > 2.json $ cat 2.json { "public_key": "0x04396d7ae480016df31f84f80439e320b0638e024014a5d8e14923eea76948afb25a321ccadabd8a4295a1e8823879b9b65369bd49d337086850b3c799c7352828", "private_key": "0x6b76cc816dce9a8ebc6ff190bcf0555310d1fb0824047f703f627f338bcf5435", "hash": "0xf36d0481e14869fc558b39ae4c747bc6c089a0271b23cfd92bc0b8aa7ed2c3aa", "latency": 21565213, "nonce_bits": 253, "sig_nonce": "0x1b88c7802ea000ccb21116575c38004579b55f1f9c4f81ed321896b1e1034237", "hash_function": "sha256", "sig_s": "0x8c83417891547224006723169de9745a81fa8de7176428e1cd8e6110408f45da", "sig_r": "0xf922d9ba4f65d207300cc7eaaa15564e60a2b1f208d1389057ff1a1ec52dc653", "order": "0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551" }

    Example: Dump hash to binary file

    Extract the hash field from the target JSON and dump it as binary.

    $ sed -n 's/^ *"hash": "0x\(.*\)",$/\1/p' 2.json | xxd -r -p > 2.hash
    $ xxd -g1 2.hash
    00000000: f3 6d 04 81 e1 48 69 fc 55 8b 39 ae 4c 74 7b c6  .m...Hi.U.9.Lt{.
    00000010: c0 89 a0 27 1b 23 cf d9 2b c0 b8 aa 7e d2 c3 aa  ...'.#..+...~...

    Note the xxd output matches the hash byte string from the target JSON.

    Example: Dump signature to DER

    The hex2der.sh script takes as an argument the target JSON filename, and outputs the DER-encoded ECDSA signature to stdout by extracting the sig_r and sig_s fields from the target JSON.

    $ ./hex2der.sh 2.json > 2.der
    $ openssl asn1parse -in 2.der -inform DER
    0:d=0 hl=2 l= 70 cons: SEQUENCE
    2:d=1 hl=2 l= 33 prim: INTEGER :F922D9BA4F65D207300CC7EAAA15564E60A2B1F208D1389057FF1A1EC52DC653
    37:d=1 hl=2 l= 33 prim: INTEGER :8C83417891547224006723169DE9745A81FA8DE7176428E1CD8E6110408F45DA

    Note the asn1parse output contains a sequence with two integers, matching the sig_r and sig_s fields from the target JSON.

    Example: Verify the signature

    We use pkeyutl here to verify the raw hash directly, in contrast to dgst that will only verify by recomputing the hash itself.

    $ openssl pkeyutl -in 2.hash -inkey public.pem -pubin -verify -sigfile 2.der
    Signature Verified Successfully

    Note that verification fails for any other hash (message), a fundamental security property of digital signatures:

    $ dd if=/dev/urandom of=bad.hash bs=1 count=32
    32+0 records in
    32+0 records out
    32 bytes copied, 0.00129336 s, 24.7 kB/s
    $ openssl pkeyutl -in bad.hash -inkey public.pem -pubin -verify -sigfile 2.der
    Signature Verification Failure
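    To see the verification equation itself rather than rely on OpenSSL, the check can be reproduced with a self-contained Python 3 sketch. This is illustrative code only, not part of the dataset's tooling; the P-256 domain parameters are transcribed from the openssl output above, and no side-channel hardening is attempted:

    import json

    # NIST P-256 domain parameters (matching the explicit parameters above).
    p  = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
    a  = p - 3
    Gx = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
    Gy = 0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5

    def inv(x, m):
        return pow(x, -1, m)  # modular inverse, Python 3.8+

    def add(P, Q):
        # Affine point addition; None represents the point at infinity.
        if P is None: return Q
        if Q is None: return P
        (x1, y1), (x2, y2) = P, Q
        if x1 == x2 and (y1 + y2) % p == 0:
            return None
        if P == Q:
            lam = (3 * x1 * x1 + a) * inv(2 * y1, p) % p
        else:
            lam = (y2 - y1) * inv(x2 - x1, p) % p
        x3 = (lam * lam - x1 - x2) % p
        return (x3, (lam * (x1 - x3) - y1) % p)

    def mul(k, P):
        # Double-and-add scalar multiplication (not constant time).
        R = None
        while k:
            if k & 1:
                R = add(R, P)
            P = add(P, P)
            k >>= 1
        return R

    with open("2.json") as f:
        entry = json.load(f)

    n = int(entry["order"], 16)
    e = int(entry["hash"], 16)
    r = int(entry["sig_r"], 16)
    s = int(entry["sig_s"], 16)
    pub = bytes.fromhex(entry["public_key"][2:])   # 0x04 || X || Y
    Q = (int.from_bytes(pub[1:33], "big"), int.from_bytes(pub[33:], "big"))

    w = inv(s, n)
    R = add(mul(e * w % n, (Gx, Gy)), mul(r * w % n, Q))
    print("Signature Verified Successfully" if R and R[0] % n == r
          else "Signature Verification Failure")

    Run against 2.json this should reproduce the success case above, and corrupting the hash should reproduce the failure case.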

    Example: Statistics

    The stats.py script shows how to extract the desired fields from the JSON. It computes the median latency for each nonce bit length.

    $ python2 stats.py timings.json
    Len  Median
    238  20592060
    239  20251286
    240  20706144
    241  20658896
    242  20820100
    243  20762304
    244  20907332
    245  20973536
    246  20972244
    247  21057788
    248  21115419
    249  21157888
    250  21210560
    251  21266378
    252  21322146
    253  21370608
    254  21425454
    255  21479105
    256  21532532

    You can verify these medians are consistent with Figure 4 in the paper.

    The stats.py script can easily be modified for more advanced analysis; a hedged sketch of its core computation follows.
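    A minimal Python 3 sketch, assuming timings.json parses as a single JSON array of entries (the bundled Python 2 script remains the authoritative version):

    import json
    import sys
    from collections import defaultdict
    from statistics import median

    with open(sys.argv[1]) as f:
        entries = json.load(f)

    # Group signing latencies by the bit length of the (secret) nonce.
    latencies = defaultdict(list)
    for entry in entries:
        latencies[entry["nonce_bits"]].append(entry["latency"])

    print("Len\tMedian")
    for bits in sorted(latencies):
        print("%d\t%d" % (bits, median(latencies[bits])))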

    Credits

    Authors

    Cesar Pereida García (Tampere University, Tampere, Finland)

    Sohaib ul Hassan (Tampere University, Tampere, Finland)

    Iaroslav Gridin (Tampere University, Tampere, Finland)

    Nicola Tuveri (Tampere University, Tampere, Finland)

    Alejandro Cabrera Aldaya (Tampere University, Tampere, Finland)

    Billy Bob Brumley (Tampere University, Tampere, Finland)

    Funding

    This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 804476).

    License

    This project is distributed under the MIT license.

  19. Aventa AV-7 ETH Zurich Research Wind Turbine SCADA and SHM

    • zenodo.org
    jpeg, pdf, png, zip
    Updated Jul 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Eleni Chatzi; Imad Abdallah; Martin Hofsäß; Oliver Bischoff; Sarah Barber; Yuriy Marykovskiy; Eleni Chatzi; Imad Abdallah; Martin Hofsäß; Oliver Bischoff; Sarah Barber; Yuriy Marykovskiy (2024). Aventa AV-7 ETH Zurich Research Wind Turbine SCADA and SHM [Dataset]. http://doi.org/10.5281/zenodo.8223010
    Explore at:
    png, zip, pdf, jpegAvailable download formats
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Eleni Chatzi; Imad Abdallah; Martin Hofsäß; Oliver Bischoff; Sarah Barber; Yuriy Marykovskiy; Eleni Chatzi; Imad Abdallah; Martin Hofsäß; Oliver Bischoff; Sarah Barber; Yuriy Marykovskiy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Zürich
    Description

    General description of wind turbine: The ETH-owned wind turbine is an Aventa AV-7, manufactured by Aventa AG in Switzerland and commissioned in December 2002. The turbine is operated via a belt-driven generator and a frequency converter with a variable-speed drive. The rated power of the Aventa AV-7 is 7 kW; it begins producing power at a wind speed of 2 m/s and has a cut-off speed of 14 m/s. The rotor diameter is 12.8 m with 3 rotor blades, and the hub height is 18 m. The maximum rotational speed of the turbine is 63 rpm. The tower is a tubular steel-reinforced concrete structure supported on a concrete foundation, while the blades are made of glass fiber with a tubular steel main spar. The turbine is regulated via a variable-speed and variable-pitch control system.

    Location of site: The wind turbine is located in Taggenberg, about 5 km from the city centre of Winterthur, Switzerland. The site is easily accessible by public transport and on foot, with direct road access right next to the turbine. This prime location reduces the cost of site visits and allows for frequent personal monitoring of the site when test equipment is installed. The coordinates of the site are 47°31'12.2"N 8°40'55.7"E.

    Control and measurement systems and signals: The turbine is regulated via a variable-speed and collective variable pitch control system.

    SHM Motivation: Designed and commissioned in 2002, the Aventa wind turbine in Winterthur is approaching the end of its design lifetime. To assess various techniques for predicting the remaining useful lifetime, ETH Zurich implemented a Structural Health Monitoring (SHM) campaign. The monitoring campaign started in 2020 and is still ongoing. In addition, the setup is used as a research platform on topics such as system identification, operational modal analysis, and fault/damage detection and classification. We analyze the influence of operational and environmental conditions on the modal parameters and further infer Performance Indicators (PIs) for assessing structural behavior in terms of deterioration processes.

    Data Description: The tower and nacelle have been instrumented with 11 accelerometers distributed along the length of the tower, the nacelle main frame, the main bearing, and the generator. Two full-bridge strain gauges are installed at the concrete tower base, measuring fore-aft and side-side strain (which can be converted to bending moments); all acceleration and strain signals are sampled at 200 Hz. Temperature and humidity are measured at the tower base at 1 Hz. In addition, operational performance data (SCADA) are collected, namely wind speed, nacelle yaw orientation, rotor RPM, power output, and turbine status; SCADA signals are sampled at 10 Hz. See the appendix for further details of the sensor layout.

    The measurement/instrumentation setup, sensor types, and layout are described in the PDF files.

    The data: the data are provided in zip files corresponding to the following four use cases:

    • Normal operation data for system identification
    • Aerodynamic imbalance on one blade
    • Rotor icing event
    • Failure of the flexible coupling of the linear drive of the collective pitch system

    The data for each of the four use cases are organized in separate zip files. The content of each zip file is as follows:

    • Time-series data in HDF5 format
    • Metadata:
      • Turbine specification (Aventa-AV-7.json and Aventa-AV-7.yaml)
      • Sensor specification (Aventa_sensors.json)
      • Unstructured description of the Aventa Turbine and the installed sensors (Aventa_Sensors_Specs.xlsx)
    • Semantic artifacts:
      • WindIO Wind Turbine YAML schema describing turbine specifications (IEAontology_schema.yaml)
      • Sensor specification JSON schema (sensors_schema.json)
    • Media: Pictures of leading edge roughness and a clip of wind turbine operation
    • Code: Jupyter notebook containing example code to load metadata from JSON and data from HDF5 files (example.ipynb); a minimal loading sketch is also given below
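    The sketch below is hedged: the metadata file names are as listed above, while the HDF5 file name and the dataset paths inside it are placeholders that may differ per use case; h5py is a third-party package and the packaged example.ipynb remains the reference.

    import json
    import h5py   # third-party: pip install h5py

    # Turbine and sensor metadata (file names as listed above).
    with open("Aventa-AV-7.json") as f:
        turbine_spec = json.load(f)
    with open("Aventa_sensors.json") as f:
        sensor_spec = json.load(f)

    # Time-series data: inspect the layout before reading anything.
    with h5py.File("timeseries.h5", "r") as h5:      # placeholder file name
        h5.visit(print)                              # prints every group/dataset path
        # Once a dataset path is known, read it into memory, e.g.:
        # accel = h5["some/group/acceleration"][:]   # placeholder path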

    Additional data are available upon request; please contact:

    • Prof. Dr. Eleni Chatzi (chatzi@ibk.baug.ethz.ch)
    • Dr. Imad Abdallah (ai@rtdt.ai , abdallah@ibk.baug.ethz.ch)

    For further details or questions, please contact:

    Prof. Dr. Eleni Chatzi
    Chair of Structural Mechanics & Monitoring

    ETH Zürich
    http://www.chatzi.ibk.ethz.ch/

  20. QA Types for LLM Routing

    • kaggle.com
    zip
    Updated Jul 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ayhantasyurt (2025). QA Types for LLM Routing [Dataset]. https://www.kaggle.com/datasets/ayhantasyurt/qa-types-for-llm-routing
    Explore at:
    zip(2366282 bytes)Available download formats
    Dataset updated
    Jul 4, 2025
    Authors
    ayhantasyurt
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Generated Datasets

    This directory contains the question-answer datasets generated by the autonomous agent in this project. The data is formatted in JSON and is designed for use in training and evaluating Retrieval-Augmented Generation (RAG) models.

    File Naming Convention

    The files are named according to the type of question they contain:

    • generated_dataset_A.json: Contains Type A (Factual) questions.
    • generated_dataset_B.json: Contains Type B (No-Context) questions.
    • generated_dataset_C.json: Contains Type C (Comparative) questions.

    Data Schema

    Each JSON file is a list of objects, where each object represents a single generated example and follows this schema:

    [
     {
      "topic": "String theory",
      "url": "https://en.wikipedia.org/wiki/String_theory",
      "question": "What are the fundamental objects in string theory, according to the text?",
      "answer": "According to the text, the fundamental objects in string theory are not point-like particles but one-dimensional 'strings'.",
      "question_type": "A",
      "source_chunk": "In physics, string theory is a theoretical framework in which the point-like particles of particle physics are replaced by one-dimensional objects called strings. It describes how these strings propagate through space and interact with each other...",
      "source_chunk_2": null
     },
     {
      "topic": "History of Rome",
      "url": "https://en.wikipedia.org/wiki/History_of_Rome",
      "question": "Compare the military strategies of the Roman Republic with the Byzantine Empire.",
      "answer": "The Roman Republic's military focused on legions and aggressive expansion through disciplined infantry formations. The Byzantine Empire, while inheriting Roman traditions, adapted to defensive warfare, relying more on fortifications, naval power, and diplomacy to protect its borders.",
      "question_type": "C",
      "source_chunk": "The military of ancient Rome, according to Titus Livius, was a key element in the rise of Rome over other civilizations...",
      "source_chunk_2": "The Byzantine army was the primary military body of the Byzantine armed forces, serving alongside the Byzantine navy..."
     }
    ]
    

    Field Descriptions

    • topic: The high-level topic used to find the source article (e.g., "Renewable Energy").
    • url: The direct URL of the Wikipedia article used as the source.
    • question: The generated question.
    • answer: The generated answer.
    • question_type: The classification label assigned by the generator (A, B, or C).
    • source_chunk: The primary piece of text from the article that was used to generate the question and answer.
    • source_chunk_2: For Type C (Comparative) questions, this field contains the second piece of text used for comparison. For Type A and B, this is null.
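    A hedged loading sketch in plain Python (file and field names as documented above; the per-type checks are sanity checks based on the documented schema, not guarantees):

    import json
    from collections import Counter

    datasets = {}
    for label in ("A", "B", "C"):
        with open(f"generated_dataset_{label}.json", encoding="utf-8") as f:
            datasets[label] = json.load(f)

    for label, entries in datasets.items():
        # Every entry should carry the expected question_type, and only
        # Type C entries should have a non-null source_chunk_2.
        assert all(e["question_type"] == label for e in entries)
        if label != "C":
            assert all(e["source_chunk_2"] is None for e in entries)
        print(label, len(entries),
              Counter(e["topic"] for e in entries).most_common(3))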

    Usage

    coming soon
