100+ datasets found
  1. Film (60%/20%/20% random splits) Dataset

    • library.toponeai.link
    • paperswithcode.com
    Cite
    Film (60%/20%/20% random splits) Dataset [Dataset]. https://library.toponeai.link/dataset/film-60-20-20-random-splits
    Explore at:
    Description

    Node classification on Film with 60%/20%/20% random splits for training/validation/test.
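
    A 60%/20%/20% random node split of this kind can be generated in a few lines; the node count and seed below are placeholders, not the benchmark's actual settings.

     import numpy as np

     num_nodes = 7600            # placeholder node count; the real value comes from the graph
     rng = np.random.default_rng(seed=0)

     perm = rng.permutation(num_nodes)
     n_train = int(0.6 * num_nodes)
     n_val = int(0.2 * num_nodes)

     train_idx = perm[:n_train]                    # 60% training nodes
     val_idx = perm[n_train:n_train + n_val]       # 20% validation nodes
     test_idx = perm[n_train + n_val:]             # remaining 20% test nodes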

  2. Rome Weather Data Train-Test Split

    • kaggle.com
    zip
    Updated Apr 18, 2024
    Cite
    Hakan İrek (2024). Rome Weather Data Train-Test Split [Dataset]. https://www.kaggle.com/datasets/hakanrek/rome-weather-data-train-test-split/code
    Explore at:
    Available download formats: zip (2457014 bytes)
    Dataset updated
    Apr 18, 2024
    Authors
    Hakan İrek
    Area covered
    Rome
    Description

    Dataset

    This dataset was created by Hakan İrek


  3. Dataset, splits, models, and scripts for the QM descriptors prediction

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Apr 4, 2024
    Cite
    Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green (2024). Dataset, splits, models, and scripts for the QM descriptors prediction [Dataset]. http://doi.org/10.5281/zenodo.10668491
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Apr 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset, splits, models, and scripts from the manuscript "When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?" are provided. The curated dataset includes 37 QM descriptors for 64,921 unique molecules across six levels of theory: wB97XD, B3LYP, M06-2X, PBE0, TPSS, and BP86. This dataset is stored in the data.tar.gz file, which also contains a file for multitask constraints applied to various atomic and bond properties. The data splits (training, validation, and test splits) for both random and scaffold-based divisions are saved as separate index files in splits.tar.gz. The trained D-MPNN models for predicting QM descriptors are saved in the models.tar.gz file. The scripts.tar.gz file contains ready-to-use scripts for training machine learning models to predict QM descriptors, as well as scripts for predicting QM descriptors using our trained models on unseen molecules and for applying radial basis function (RBF) expansion to QM atom and bond features.

    Below are descriptions of the available scripts:

    1. atom_bond_descriptors.sh: Trains atom/bond targets.
    2. atom_bond_descriptors_predict.sh: Predicts atom/bond targets from pre-trained model.
    3. dipole_quadrupole_moments.sh: Trains dipole and quadrupole moments.
    4. dipole_quadrupole_moments_predict.sh: Predicts dipole and quadrupole moments from pre-trained model.
    5. energy_gaps_IP_EA.sh: Trains energy gaps, ionization potential (IP), and electron affinity (EA).
    6. energy_gaps_IP_EA_predict.sh: Predicts energy gaps, IP, and EA from pre-trained model.
    7. get_constraints.py: Generates constraints file for testing dataset. This generated file needs to be provided before using our trained models to predict the atom/bond QM descriptors of your testing data.
    8. csv2pkl.py: Converts QM atom and bond features to .pkl files using RBF expansion for use with Chemprop software.
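
    As a rough illustration of the radial basis function (RBF) expansion mentioned for csv2pkl.py (item 8 above), the sketch below expands a scalar atom descriptor onto Gaussian basis centers; the centers, width, and example values are placeholders rather than the script's actual settings.

     import numpy as np

     def rbf_expand(values, centers, gamma=10.0):
         """Expand each scalar value onto Gaussian radial basis functions."""
         values = np.asarray(values, dtype=float).reshape(-1, 1)   # shape (n, 1)
         centers = np.asarray(centers, dtype=float).reshape(1, -1) # shape (1, k)
         return np.exp(-gamma * (values - centers) ** 2)           # shape (n, k)

     # Example: expand a partial-charge-like descriptor onto 8 centers in [-1, 1].
     charges = [-0.42, 0.17, 0.05]
     centers = np.linspace(-1.0, 1.0, 8)
     features = rbf_expand(charges, centers)
     print(features.shape)  # (3, 8)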

    Below is the procedure for running the ml-QM-GNN on your own dataset:

    1. Use get_constraints.py to generate a constraint file required for predicting atom/bond QM descriptors with the trained ML models.
    2. Execute atom_bond_descriptors_predict.sh to predict atom and bond properties. Run dipole_quadrupole_moments_predict.sh and energy_gaps_IP_EA_predict.sh to calculate molecular QM descriptors.
    3. Utilize csv2pkl.py to convert the data from predicted atom/bond descriptors .csv file into separate atom and bond feature files (which are saved as .pkl files here).
    4. Run Chemprop to train your models using the additional predicted features supported here.
  4. ASAG train-test split

    • kaggle.com
    zip
    Updated May 21, 2024
    Cite
    Amrr Sheta (2024). ASAG train-test split [Dataset]. https://www.kaggle.com/amrrsheta/asag-train-test-split
    Explore at:
    Available download formats: zip (192882 bytes)
    Dataset updated
    May 21, 2024
    Authors
    Amrr Sheta
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Amrr Sheta

    Released under Apache 2.0


  5. shopee-train-test-split

    • kaggle.com
    Updated May 2, 2021
    Cite
    XUE LINGJUN (2021). shopee-train-test-split [Dataset]. https://www.kaggle.com/datasets/xuelingjun/shopeetraintestsplit/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 2, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    XUE LINGJUN
    Description

    Dataset

    This dataset was created by XUE LINGJUN


  6. Data Cleaning, Translation & Split of the Dataset for the Automatic...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 8, 2022
    Cite
    Köhler, Juliane (2022). Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6957841
    Explore at:
    Dataset updated
    Aug 8, 2022
    Dataset authored and provided by
    Köhler, Juliane
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.

    Data_Cleaning.ipynb – The Jupyter Notebook with python code for the analysis and cleaning of the original dataset.

    ger_train.csv – The German training set as CSV file.

    ger_validation.csv – The German validation set as CSV file.

    en_test.csv – The English test set as CSV file.

    en_train.csv – The English training set as CSV file.

    en_validation.csv – The English validation set as CSV file.

    splitting.py – The python code for splitting a dataset into train, test and validation set.

    DataSetTrans_de.csv – The final German dataset as a CSV file.

    DataSetTrans_en.csv – The final English dataset as a CSV file.

    translation.py – The python code for translating the cleaned dataset.
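
    splitting.py itself is not reproduced in this listing; a minimal sketch of such a train/validation/test split, assuming an 80%/10%/10% ratio and generic output file names, could look like this:

     import pandas as pd
     from sklearn.model_selection import train_test_split

     df = pd.read_csv("Cleaned_Dataset.csv")  # file name taken from the listing above

     # First hold out 20% of the rows, then split that portion evenly into validation and test.
     train, rest = train_test_split(df, test_size=0.2, random_state=42)
     validation, test = train_test_split(rest, test_size=0.5, random_state=42)

     train.to_csv("train.csv", index=False)
     validation.to_csv("validation.csv", index=False)
     test.to_csv("test.csv", index=False)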

  7. Industrial Machine Tool Element Surface Defect Dataset - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 24, 2023
    + more versions
    Cite
    (2023). Industrial Machine Tool Element Surface Defect Dataset - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/63d20a5e-3584-5096-a34d-d3f93fcc8857
    Explore at:
    Dataset updated
    Oct 24, 2023
    Description

    Using Machine Learning techniques in general, and Deep Learning techniques in particular, requires a certain amount of data that is often not available in large quantities in some technical domains. The manual inspection of machine tool components, as well as the manual end-of-line check of products, are labour-intensive tasks in industrial applications that companies often want to automate. To automate the classification processes and to develop reliable and robust Machine Learning based classification and wear-prognostics models, real-world datasets are needed to train and test models on.

    The dataset contains 1104 three-channel images with 394 image annotations for the surface damage type "pitting". The annotations, made with the annotation tool labelme, are available in JSON format and hence convertible to VOC and COCO format. All images come from two BSD types. The dataset available for download is divided into three folders: data with all images as JPEG, label with all annotations, and saved_model with a baseline model.

    The authors also provide a python script to divide the data and labels into three different split types: train_test_split, which splits images into the same train and test data-split the authors used for the baseline model; wear_dev_split, which creates all 27 wear developments; and type_split, which splits the data into the occurring BSD types. One of the two BSD types is represented with 69 images and 55 different image sizes; all images of this type come either in a clean or a soiled condition. The other BSD type is shown in 325 images with two image sizes; since these images were taken continuously over time, the degree of soiling evolves across them. As mentioned above, the dataset also contains 27 pitting development sequences of 69 images each.

    Instructions for splitting the dataset

    The authors of this dataset provide 3 different types of dataset splits. To get a data split you have to run the python script split_dataset.py.

    Script inputs:
    • split-type (mandatory)
    • output directory (mandatory)

    Different split-types:
    • train_test_split: splits the dataset into train and test data (80%/20%)
    • wear_dev_split: splits the dataset into 27 wear developments
    • type_split: splits the dataset into the different BSD types

    Example: C:\Users\Desktop>python split_dataset.py --split_type=train_test_split --output_dir=BSD_split_folder
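
    The script itself ships with the dataset download; purely to illustrate the command-line interface described above, a hypothetical split_dataset.py skeleton might look like the following (the actual copy logic is stubbed out):

     import argparse

     def main():
         parser = argparse.ArgumentParser(description="Split the BSD image dataset.")
         parser.add_argument("--split_type", required=True,
                             choices=["train_test_split", "wear_dev_split", "type_split"])
         parser.add_argument("--output_dir", required=True)
         args = parser.parse_args()

         if args.split_type == "train_test_split":
             pass  # copy images/labels into an 80%/20% train/test layout under output_dir
         elif args.split_type == "wear_dev_split":
             pass  # one folder per wear-development sequence (27 in total)
         else:
             pass  # one folder per BSD type

     if __name__ == "__main__":
         main()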

  8. Hard Hat Workers Dataset

    • universe.roboflow.com
    zip
    Updated Sep 30, 2022
    + more versions
    Cite
    Joseph Nelson (2022). Hard Hat Workers Dataset [Dataset]. https://universe.roboflow.com/joseph-nelson/hard-hat-workers/model/13
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 30, 2022
    Dataset authored and provided by
    Joseph Nelson
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Workers Bounding Boxes
    Description

    Overview

    The Hard Hat dataset is an object detection dataset of workers in workplace settings that require a hard hat. Annotations also include examples of just "person" and "head," for when an individual may be present without a hard hat.

    The original dataset has a 75/25 train-test split.

    Example image: https://i.imgur.com/7spoIJT.png

    Use Cases

    One could use this dataset to, for example, build a classifier of workers that are abiding safety code within a workplace versus those that may not be. It is also a good general dataset for practice.

    Using this Dataset

    Use the Fork or Download this Dataset buttons to copy this dataset into your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or with additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

    Dataset Versions:

    Image Preprocessing | Image Augmentation | Modify Classes

    • v1 (resize-416x416-reflect): generated with the original 75/25 train-test split | No augmentations
    • v2 (raw_75-25_trainTestSplit): generated with the original 75/25 train-test split | These are the raw, original images
    • v3 (v3): generated with the original 75/25 train-test split | Modify Classes used to drop person class | Preprocessing and Augmentation applied
    • v5 (raw_HeadHelmetClasses): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class
    • v8 (raw_HelmetClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and person classes
    • v9 (raw_PersonClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and helmet classes
    • v10 (raw_AllClasses): generated with a 70/20/10 train/valid/test split | These are the raw, original images
    • v11 (augmented3x-AllClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied | 3x image generation | Trained with Roboflow's Fast Model
    • v12 (augmented3x-HeadHelmetClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Fast Model
    • v13 (augmented3x-HeadHelmetClasses-AccurateModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Accurate Model
    • v14 (raw_HeadClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class, and remap/relabel helmet class to head

    Choosing Between Computer Vision Model Sizes | Roboflow Train

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce their code by roughly 50% when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


  9. Data from: Pothole Dataset

    • universe.roboflow.com
    zip
    Updated Nov 1, 2020
    + more versions
    Cite
    Brad Dwyer (2020). Pothole Dataset [Dataset]. https://universe.roboflow.com/brad-dwyer/pothole-voxrl/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 1, 2020
    Dataset authored and provided by
    Brad Dwyer
    License

    Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Variables measured
    Potholes Bounding Boxes
    Description

    Pothole Dataset

    Example image: https://i.imgur.com/7Xz8d5M.gif

    This is a collection of 665 images of roads with the potholes labeled. The dataset was created and shared by Atikur Rahman Chitholian as part of his undergraduate thesis and was originally shared on Kaggle.

    Note: The original dataset did not contain a validation set; we have re-shuffled the images into a 70/20/10 train-valid-test split.

    Usage

    This dataset could be used for automatically finding and categorizing potholes in city streets so the worst ones can be fixed faster.

    The dataset is provided in a wide variety of formats for various common machine learning models.

  10. dataset-muenzen-training-test-split-02

    • kaggle.com
    zip
    Updated Dec 13, 2024
    Cite
    pascalammeter (2024). dataset-muenzen-training-test-split-02 [Dataset]. https://www.kaggle.com/datasets/pascalammeter/dataset-muenzen-training-test-split-02/suggestions
    Explore at:
    Available download formats: zip (12574 bytes)
    Dataset updated
    Dec 13, 2024
    Authors
    pascalammeter
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by pascalammeter

    Released under CC BY-NC-SA 4.0


  11. Data from: Benchmark Datasets Incorporating Diverse Tasks, Sample Sizes,...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 6, 2021
    Cite
    Benchmark Datasets Incorporating Diverse Tasks, Sample Sizes, Material Systems, and Data Heterogeneity for Materials Informatics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4903957
    Explore at:
    Dataset updated
    Jun 6, 2021
    Dataset provided by
    Sparks, Taylor D.
    Kauwe, Steven K.
    Henderson, Ashley N.
    Description

    This benchmark comprises 50 different datasets for materials properties obtained from 16 previous publications. It contains both experimental and computational data, data suited for regression as well as classification, sample sizes ranging from 12 to 6354, and materials systems spanning the diversity of materials research. In addition to cleaning the data where necessary, each dataset was split into train, validation, and test splits.

    For datasets with more than 100 values, train-val-test splits were created with either a 5-fold or a 10-fold cross-validation method, depending on what each respective paper did in its study. Datasets with fewer than 100 values had train-test splits created using the Leave-One-Out cross-validation method.
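
    In scikit-learn terms, the two strategies described above correspond roughly to the sketch below; the feature matrix, targets, and fold count are placeholders, not the benchmark's actual data or settings.

     import numpy as np
     from sklearn.model_selection import KFold, LeaveOneOut

     X = np.random.rand(120, 5)   # placeholder feature matrix
     y = np.random.rand(120)      # placeholder targets

     # More than 100 samples: k-fold cross-validation (5- or 10-fold); otherwise Leave-One-Out.
     splitter = KFold(n_splits=5, shuffle=True, random_state=0) if len(X) > 100 else LeaveOneOut()

     for train_idx, test_idx in splitter.split(X):
         X_train, X_test = X[train_idx], X[test_idx]
         y_train, y_test = y[train_idx], y[test_idx]
         # fit and evaluate a model on this fold here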

    For further information, as well as directions on how to access the data, please go to the corresponding GitHub repository: https://github.com/anhender/mse_ML_datasets/tree/v1.0

  12. ConvFinQA

    • huggingface.co
    Updated Apr 3, 2024
    + more versions
    Cite
    AdaptLLM (2024). ConvFinQA [Dataset]. https://huggingface.co/datasets/AdaptLLM/ConvFinQA
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2024
    Authors
    AdaptLLM
    Description

    Adapting Large Language Models to Domains via Continual Pre-Training

    This repo contains the ConvFinQA dataset used in our ICLR 2024 paper Adapting Large Language Models via Reading Comprehension. We explore continued pre-training on domain-specific corpora for large language models. While this approach enriches LLMs with domain knowledge, it significantly hurts their prompting ability for question answering. Inspired by human learning via reading comprehension, we propose a… See the full description on the dataset page: https://huggingface.co/datasets/AdaptLLM/ConvFinQA.

  13. Data from: Fashion Mnist Dataset

    • universe.roboflow.com
    • opendatalab.com
    • +4 more
    zip
    Updated Aug 10, 2022
    Cite
    Popular Benchmarks (2022). Fashion Mnist Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/fashion-mnist-ztryt/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 10, 2022
    Dataset authored and provided by
    Popular Benchmarks
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Clothing
    Description

    Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

    Authors: Han Xiao, Kashif Rasul, Roland Vollgraf

    Dataset Obtained From: https://github.com/zalandoresearch/fashion-mnist

    All images were sized 28x28 in the original dataset

    Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. (Source: https://github.com/zalandoresearch/fashion-mnist)

    Here's an example of how the data looks (each class takes three rows): https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png

    Version 1 (original-images_Original-FashionMNIST-Splits):

    • Original images, with the original splits for Fashion-MNIST: train (86% of images - 60,000 images) set and test (14% of images - 10,000 images) set only.
    • This version was not trained

    Version 3 (original-images_trainSetSplitBy80_20):

    • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
    • Train/valid/test split rebalancing: https://blog.roboflow.com/train-test-split/ (illustration: https://i.imgur.com/angfheJ.png)

    Citation:

    @online{xiao2017/online,
     author    = {Han Xiao and Kashif Rasul and Roland Vollgraf},
     title    = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},
     date     = {2017-08-28},
     year     = {2017},
     eprintclass = {cs.LG},
     eprinttype  = {arXiv},
     eprint    = {cs.LG/1708.07747},
    }
    
  14. Train_Test_Split_Data

    • kaggle.com
    zip
    Updated Jul 18, 2024
    Cite
    Arya B (2024). Train_Test_Split_Data [Dataset]. https://www.kaggle.com/datasets/aryaadithyan/train-test-split-data
    Explore at:
    Available download formats: zip (1920111933 bytes)
    Dataset updated
    Jul 18, 2024
    Authors
    Arya B
    Description

    Dataset

    This dataset was created by Arya B


  15. Dollar street 10 - 64x64x3

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 14, 2024
    + more versions
    Cite
    Sven van der burg (2024). Dollar street 10 - 64x64x3 [Dataset]. http://doi.org/10.5281/zenodo.10970014
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sven van der burg
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MLCommons Dollar Street Dataset is a collection of images of everyday household items from homes around the world that visually captures socioeconomic diversity of traditionally underrepresented populations. It consists of public domain data, licensed for academic, commercial and non-commercial usage, under CC-BY and CC-BY-SA 4.0. The dataset was developed because similar datasets lack socioeconomic metadata and are not representative of global diversity.

    This is a subset of the original dataset that can be used for multiclass classification with 10 categories. It is designed to be used in teaching, similar to the widely used, but unlicensed CIFAR-10 dataset.

    These are the preprocessing steps that were performed:

    1. Only take examples with one imagenet_synonym label
    2. Use only examples with the 10 most frequently occurring labels
    3. Downscale images to 64 x 64 pixels
    4. Split data in train and test
    5. Store as numpy array
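
    A rough sketch of steps 3-5 above, assuming the filtered images are already loaded as PIL images with integer labels (the 80/20 split ratio and output file name are assumptions, not necessarily what the preparation notebook uses):

     import numpy as np
     from PIL import Image
     from sklearn.model_selection import train_test_split

     def preprocess(images, labels, size=(64, 64)):
         """Downscale images, split into train/test, and store as numpy arrays."""
         arrays = np.stack([np.asarray(img.resize(size)) for img in images])
         x_train, x_test, y_train, y_test = train_test_split(
             arrays, np.asarray(labels), test_size=0.2, random_state=0)
         np.savez("dollar_street_10_64x64.npz",
                  x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test)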

    This is the label mapping:

    Category          Label
    day bed           0
    dishrag           1
    plate             2
    running shoe      3
    soap dispenser    4
    street sign       5
    table lamp        6
    tile roof         7
    toilet seat       8
    washing machine   9

    Check out this notebook to see how the subset was created.

    The original dataset was downloaded from https://www.kaggle.com/datasets/mlcommons/the-dollar-street-dataset. See https://mlcommons.org/datasets/dollar-street/ for more information.

  16. Test / Train Splits

    • dbrepo1.ec.tuwien.ac.at
    Updated Mar 20, 2025
    Cite
    Mahler, Lukas (2025). Test / Train Splits [Dataset]. http://doi.org/10.82556/vj0w-ka45
    Explore at:
    Dataset updated
    Mar 20, 2025
    Authors
    Mahler, Lukas
    Time period covered
    2024
    Description

    Splits of aggregated data into testing and training subsets.

  17. Dataset for Cost-effective Simulation-based Test Selection in Self-driving...

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 17, 2024
    Cite
    Christian Birchler; Nicolas Ganz; Sajad Khatiri; Alessio Gambi; Sebastiano Panichella (2024). Dataset for Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor [Dataset]. http://doi.org/10.5281/zenodo.5914130
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Birchler; Nicolas Ganz; Sajad Khatiri; Alessio Gambi; Sebastiano Panichella
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SDC-Scissor tool for Cost-effective Simulation-based Test Selection in Self-driving Cars Software

    This dataset provides test cases for self-driving cars with the BeamNG simulator. Check out the repository and demo video to get started.

    GitHub: github.com/ChristianBirchler/sdc-scissor

    This project extends the tool competition platform from the Cyber-Physical Systems Testing Competition, which was part of the SBST Workshop in 2021.

    Usage

    Demo

    YouTube Link

    Installation

    The tool can either be run with Docker or locally using Poetry.

    When running the simulations a working installation of BeamNG.research is required. Additionally, this simulation cannot be run in a Docker container but must run locally.

    To install the application use one of the following approaches:

    • Docker: docker build --tag sdc-scissor .
    • Poetry: poetry install

    Using the Tool

    The tool can be used with the following two commands:

    • Docker: docker run --volume "$(pwd)/results:/out" --rm sdc-scissor [COMMAND] [OPTIONS] (this will write all files written to /out to the local folder results)
    • Poetry: poetry run python sdc-scissor.py [COMMAND] [OPTIONS]

    There are multiple commands to use. To keep the documentation simple, only the commands and their options are described.

    • Generation of tests:
      • generate-tests --out-path /path/to/store/tests
    • Automated labeling of Tests:
      • label-tests --road-scenarios /path/to/tests --result-folder /path/to/store/labeled/tests
      • Note: This only works locally with BeamNG.research installed
    • Model evaluation:
      • evaluate-models --dataset /path/to/train/set --save
    • Split train and test data:
      • split-train-test-data --scenarios /path/to/scenarios --train-dir /path/for/train/data --test-dir /path/for/test/data --train-ratio 0.8
    • Test outcome prediction:
      • predict-tests --scenarios /path/to/scenarios --classifier /path/to/model.joblib
    • Evaluation based on random strategy:
      • evaluate --scenarios /path/to/test/scenarios --classifier /path/to/model.joblib

    The possible parameters are always documented with --help.
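
    For orientation only, a file-level split in the spirit of split-train-test-data might look like the sketch below; the scenario file extension, shuffling, and copy behaviour are assumptions rather than the tool's actual implementation.

     import random
     import shutil
     from pathlib import Path

     def split_scenarios(scenarios_dir, train_dir, test_dir, train_ratio=0.8, seed=0):
         """Randomly copy scenario files into train/test directories by ratio."""
         scenarios = sorted(Path(scenarios_dir).glob("*.json"))  # assumed file extension
         random.Random(seed).shuffle(scenarios)
         n_train = int(train_ratio * len(scenarios))
         for target, subset in ((train_dir, scenarios[:n_train]),
                                (test_dir, scenarios[n_train:])):
             Path(target).mkdir(parents=True, exist_ok=True)
             for scenario in subset:
                 shutil.copy(scenario, target)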

    Linting

    The tool is verified with the linters flake8 and pylint. These are automatically enabled in Visual Studio Code and can be run manually with the following commands:

    poetry run flake8 .
    poetry run pylint **/*.py

    License

    The software we developed is distributed under the GNU GPL license. See the LICENSE.md file.

    Contacts

    Christian Birchler - Zurich University of Applied Science (ZHAW), Switzerland - birc@zhaw.ch

    Nicolas Ganz - Zurich University of Applied Science (ZHAW), Switzerland - gann@zhaw.ch

    Sajad Khatiri - Zurich University of Applied Science (ZHAW), Switzerland - mazr@zhaw.ch

    Dr. Alessio Gambi - Passau University, Germany - alessio.gambi@uni-passau.de

    Dr. Sebastiano Panichella - Zurich University of Applied Science (ZHAW), Switzerland - panc@zhaw.ch

    References

    • Christian Birchler, Nicolas Ganz, Sajad Khatiri, Alessio Gambi, and Sebastiano Panichella. 2022. Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor. In 2022 IEEE 29th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE.

    If you use this tool in your research, please cite the following papers:

    @INPROCEEDINGS{Birchler2022,
     author={Birchler, Christian and Ganz, Nicolas and Khatiri, Sajad and Gambi, Alessio and Panichella, Sebastiano},
     booktitle={2022 IEEE 29th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
     title={Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor},
     year={2022},
    }

  18. train_test_split_hortikultur_small-2

    • kaggle.com
    Updated Dec 31, 2024
    + more versions
    Cite
    De Chef (2024). train_test_split_hortikultur_small-2 [Dataset]. https://www.kaggle.com/datasets/dechef/train-test-split-hortikultur-small-2/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    De Chef
    Description

    Dataset

    This dataset was created by De Chef


  19. Runtime of implementations on Pfam seed and full.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Samantha Petti; Sean R. Eddy (2023). Runtime of implementations on Pfam seed and full. [Dataset]. http://doi.org/10.1371/journal.pcbi.1009492.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Samantha Petti; Sean R. Eddy
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The runtime benchmarks were obtained by running each algorithm on the seed and full multi-MSAs Pfam-A.seed and Pfam-A.full on 2 cores with 8 GB RAM for the seed alignments and on 3 cores with 12 GB RAM for the full alignments. We did not compute the maximum runtime of the Blue algorithm; the algorithm failed to terminate within 6 days for 34 families.

  20. Cifar 100 Dataset

    • universe.roboflow.com
    • opendatalab.com
    • +4 more
    zip
    Updated Aug 11, 2022
    + more versions
    Cite
    Popular Benchmarks (2022). Cifar 100 Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/cifar100
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 11, 2022
    Dataset authored and provided by
    Popular Benchmarks
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Animals People CommonObjects
    Description

    CIFAR-100

    The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

    • More info on CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html
    • TensorFlow listing of the dataset: https://www.tensorflow.org/datasets/catalog/cifar100
    • GitHub repo for converting CIFAR-100 tarball files to png format: https://github.com/knjcode/cifar2png

    All images were sized 32x32 in the original dataset

    The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images [in the original dataset].

    This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). However, this project does not contain the superclasses.

    • Superclasses version: https://universe.roboflow.com/popular-benchmarks/cifar100-with-superclasses/

    More background on the dataset (classes and superclasses): https://i.imgur.com/5w8A0Vm.png

    Version 1 (original-images_Original-CIFAR100-Splits):

    • Original images, with the original splits for CIFAR-100: train (83.33% of images - 50,000 images) set and test (16.67% of images - 10,000 images) set only.
    • This version was not trained

    Version 2 (original-images_trainSetSplitBy80_20):

    • Original, raw images, with the train set split to provide 80% of its images to the training set (approximately 40,000 images) and 20% of its images to the validation set (approximately 10,000 images)
    • Trained from Roboflow Classification Model's ImageNet training checkpoint
    • Train/valid/test split rebalancing: https://blog.roboflow.com/train-test-split/ (illustration: https://i.imgur.com/kSPeKGn.png)

    Citation:

    @TECHREPORT{Krizhevsky09learningmultiple,
      author = {Alex Krizhevsky},
      title = {Learning multiple layers of features from tiny images},
      institution = {},
      year = {2009}
    }
    