MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A 768 MB dataset for garbage classification, divided into train, test, and val folders.
Note: we use the split files in the splits folder to get the following splits, as there are too many chunks after preprocessing: "train_grid1.0cm_chunk6x6_stride3x3_filtered", "val_grid1.0cm_chunk6x6_stride3x3_filtered", "test_grid1.0cm_chunk6x6_stride3x3_filtered".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for the publication "Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models". Five surrogate models for flood inundation are used to emulate the results of high-resolution hydrodynamic models. The surrogate models are compared on accuracy and computational speed for three distinct case studies, namely Carlisle (United Kingdom), the Chowilla floodplain (Australia), and the Burnett River (Australia).
The dataset is structured in 5 files: "Carlisle", "Chowilla", "BurnettRV", "Comparison_results", and "Python_data". As a minimum, the "Python_data" file and one of "Carlisle", "Chowilla", or "BurnettRV" are needed to run the models. We suggest using the "Carlisle" case study for initial testing given its small size and small data requirement.
"Carlisle", "Chowilla", and "BurnettRV" files
These files contain hydrodynamic modelling data for training and validation for each individual case study, as well as specific Python scripts for training and running the surrogate models in each case study. There are only small differences between each folder, depending on the hydrodynamic model being emulated and the input boundary conditions (input features). Each case study file has the following folders:
Geometry_data: DEM files, .npz files containing the high-fidelity model's grid (XYZ coordinates) and areas (the same data is available for the low-fidelity model used in the LSG model), and .shp files indicating the location of boundaries and main flow paths (mainly used in the LSTM-SRR model).
XXX_modeldata: Folder to store trained model data for each XXX surrogate model. For example, GP_EOF_modeldata contains the files used to store the trained GP-EOF model.
HD_model_data: High-fidelity (and low-fidelity) simulation results for all flood events of that case study. This folder also contains all boundary input conditions.
HF_EOF_analysis: Stores data used in the EOF analysis. EOF analysis is applied for the LSG, GP-EOF, and LSTM-EOF surrogate models.
Results_data: Stores the results of running the evaluation of the surrogate models.
Train_test_split_data: The train-test-validation data split is the same for all surrogate models. The specific split for each cross-validation fold is stored in this folder.
And Python files:
YYY_event_summary, YYY_Extrap_event_summary: Files containing an overview of all events, and which events are connected between the low- and high-fidelity models for each YYY case study.
EOF_analysis_HFdata_preprocessing, EOF_analysis_HFdata: Preprocessing before EOF analysis and the EOF analysis of the high-fidelity data. This is used for the LSG, GP-EOF, and LSTM-EOF surrogate models.
Evaluation, Evaluation_extrap: Scripts for evaluating the surrogate models for that case study and saving the results for each cross-validation fold.
train_test_split: Script for splitting the flood datasets for each cross-validation fold, so all surrogate models train on the same data.
XXX_training: Script for training each XXX surrogate model.
XXX_preprocessing: Some surrogate models rely on information that needs to be generated before training. This is done using these scripts.
"Comparison_results" file
Files used for comparing the surrogate models and generating the figures in the paper "Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models". The figures are also included.
"Python_data" fileFolder containing Python script with utility functions for setting up, training, and running the surrogate models, as well as for evaluating the surrogate models. This folder also contains a python_environment.yml file with all Python package versions and dependencies.This folder also contains two sub-folders:LSG_mods_and_func: Python scripts for using the LSG model. Some of these scripts are also utilized when working with the other surrogate models. SRR_method_master_Zhou2021: Scripts obtained from https://github.com/yuerongz/SRR-method. Small edits have for speed and use in this study.
mizuno-group/patent-time-split-supplementary-folder dataset hosted on Hugging Face and contributed by the HF Datasets community
TradingView Ideas Dataset
This dataset contains trading ideas and analysis sourced from TradingView, split into training and testing datasets for machine learning purposes. It includes both image data (chart screenshots) and associated textual descriptions.
Dataset Structure
Root Folder Contents
train.zip: Compressed folder containing training data (images and JSON split).
test.zip: Compressed folder containing testing data (images and JSON split).
… See the full description on the dataset page: https://huggingface.co/datasets/DiljitSingh14/tradingIdeas.
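A minimal sketch of unpacking the training archive and reading its JSON metadata is shown below; the exact JSON file name and its structure are assumptions, so adjust them to whatever train.zip actually contains.

```python
import json
import zipfile
from pathlib import Path

# Unpack the training archive.
with zipfile.ZipFile("train.zip") as zf:
    zf.extractall("train")

# Assumption: the extracted folder holds chart screenshots plus a JSON file
# describing them; inspect the folder and adapt the keys accordingly.
json_path = next(Path("train").rglob("*.json"))
metadata = json.loads(json_path.read_text())
print(json_path.name, type(metadata))
```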
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source codes and dataset of the research "Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation". Our work employed PyTorch, a framework for training Deep Learning models with GPU support and automatic back-propagation, to load the MViTv2_s models with Kinetics-400 weights. To simplify the code implementation, eliminating the need for an explicit training loop and automating some hyperparameters, we use the PyTorch Lightning module. The inputs were batches of 10 samples with 16 sequenced images in 3 channels, resized to 224 × 224 pixels and normalized from 0 to 1.
Most of the papers in our literature survey split the original dataset chronologically. Some authors also apply k-fold cross-validation to emphasize the evaluation of model stability. However, we adopt a hybrid split, taking the first 50,000 samples to apply 5-fold cross-validation between the training and validation sets (known data), with 40,000 samples for training and 10,000 for validation. Thus, we can evaluate performance and stability by analyzing the mean and standard deviation of all trained models on the test set, composed of the last 9,834 samples, preserving the chronological order (simulating unknown data).
We developed three distinct models to evaluate the impact of oversampling magnetogram sequences throughout the dataset. The first model, Solar Flare MViT (SF MViT), was trained only with the original data from our base dataset, without oversampling. In the second model, Solar Flare MViT over Train (SF MViT oT), we only apply oversampling on the training data, maintaining the original validation dataset. In the third model, Solar Flare MViT over Train and Validation (SF MViT oTV), we apply oversampling in both the training and validation sets. We also trained a model oversampling the entire dataset, called "SF_MViT_oTV Test", to verify how resampling or adopting a test set with unreal data may bias the results positively.
GitHub version
The .zip hosted here contains all files from the project, including the checkpoint and the output files generated by the codes. We have a clean version hosted on GitHub (https://github.com/lfgrim/SFF_MagSeq_MViTs), without the magnetogram_jpg folder (which can be downloaded directly from https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip) and without the output and checkpoint files. Most code files hosted here also contain comments in Portuguese, which are being updated to English in the GitHub version.
Folders Structure
In the root directory of the project, we have two folders:
magnetogram_jpg: holds the source images provided by the Space Environment Artificial Intelligence Early Warning Innovation Workshop through the link https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip. It comprises 73,810 samples of high-quality magnetograms captured by HMI/SDO from 2010 May 4 to 2019 January 26. The HMI instrument provides these data (stored in the hmi.sharp_720s dataset), making new samples available every 12 minutes; however, the images in this dataset were collected every 96 minutes. Each image has an associated magnetogram comprising a ready-made snippet of one or more solar ARs. It is essential to notice that the magnetograms cropped by SHARP can contain one or more solar ARs classified by the National Oceanic and Atmospheric Administration (NOAA).
Seq_Magnetogram: contains the references for the source images with the corresponding labels in the next 24 h and 48 h, in the respective M24 and M48 sub-folders.
M24/M48: both present the following sub-folder structure:
Seqs16; SF_MViT; SF_MViT_oT; SF_MViT_oTV; SF_MViT_oTV_Test. There are also two files in the root:
inst_packages.sh: installs the packages and dependencies needed to run the models.
download_MViTS.py: downloads the pre-trained MViTv2_S from PyTorch and stores it in the cache.
The M24 and M48 folders hold reference text files (flare_Mclass...) linking the images in the magnetogram_jpg folder, or the sequences (Seq16_flare_Mclass...) in the Seqs16 folders, with their respective labels. They also hold "cria_seqs.py", which was responsible for creating the sequences, and "test_pandas.py", used to verify head info and check the number of samples per label in the text files. All the text files with the prefix "Seq16" inside the Seqs16 folder were created by the "cria_seqs.py" code based on the corresponding "flare_Mclass"-prefixed text files. The Seqs16 folder holds reference text files in which each file contains a sequence of images pointing to the magnetogram_jpg folder. All SF_MViT... folders hold the model training code itself (SF_MViT...py) and the corresponding job submission (jobMViT...), temporary input (Seq16_flare...), output (saida_MVIT... and MViT_S...), error (err_MViT...), and checkpoint files (sample-FLARE...ckpt). Executed model training codes generate the output, error, and checkpoint files. There is also a folder called "lightning_logs" that stores logs of the trained models.
Naming pattern for the files:
magnetogram_jpg: follows the format "hmi.sharp_720s...magnetogram.fits.jpg" and Seqs16: follows the format "hmi.sharp_720s...to.", where:
is the date-time when the sequence ends, and follow the same format of . Reference text files in M24 and M48 or inside SF_MViT... folders follows the format "flare_Mclass_.txt", where:
is Seq16 if refers to a sequence, or void if refers direct to images.
"24h" or "48h".
is "TrainVal" or "Test". The refers to the split of Train/Val.
void or "_over" after the extension (...txt_over): means temporary input reference that was over-sampled by a training model. All SF_MViT...folders:
void or "oT" (over Train) or "oTV" (over Train and Val) or "oTV_Test" (over Train, Val and Test);
"24h" or "48h";
"oneSplit" for a specific split or "allSplits" if run all splits.
void is default to run on 1 GPU or "2gpu" to run on 2-GPU systems; Job submission files: "jobMViT_", where:
points to the queue in the Lovelace environment hosted at CENAPAD-SP (https://www.cenapad.unicamp.br/parque/jobsLovelace). Temporary inputs: "Seq16_flare_Mclass_.txt", where:
train or val;
void or "_over" after the extension (...txt_over): means temporary input reference that was over-sampled by a training model. Outputs: "saida_MViT_Adam_10-7", where:
k0 to k4, means the correlated split of the output, or void if the output is from all splits. Error files: "err_MViT_Adam_10-7", where:
k0 to k4, means the correlated split of the error log file, or void if the error file is from all splits. Checkpoint files: "sample-FLARE_MViT_S_10-7-epoch=-valid_loss=-Wloss_k=.ckpt", where:
epoch number of the checkpoint;
corresponding valid loss;
0 to 4.
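For orientation, the following sketch loads the MViTv2_S backbone with Kinetics-400 weights from torchvision and runs a dummy batch with the shape described above (10 samples, 3 channels, 16 frames, 224 × 224). This is not the project's training code (see the SF_MViT...py scripts for that), just a minimal sketch under stated assumptions.

```python
import torch
from torchvision.models.video import mvit_v2_s, MViT_V2_S_Weights

# Pre-trained MViTv2_S with Kinetics-400 weights, used here as the starting point.
model = mvit_v2_s(weights=MViT_V2_S_Weights.KINETICS400_V1)
model.eval()

# Dummy batch: 10 samples, 3 channels, 16 sequenced frames, 224x224, values in [0, 1].
x = torch.rand(10, 3, 16, 224, 224)
with torch.no_grad():
    logits = model(x)   # (10, 400) Kinetics-400 logits from the pretrained head
print(logits.shape)

# For flare forecasting, the classification head would presumably be replaced with a
# task-specific one (e.g. M-class flare / no-flare), as done in the SF_MViT scripts.
```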
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"Towards Robotic Mapping of a Honeybee Comb" Dataset This dataset supports the analyses and experiments of the paper: J. Janota et al., "Towards Robotic Mapping of a Honeybee Comb," 2024 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), Delft, Netherlands, 2024, doi: 10.1109/MARSS61851.2024.10612712. Link to Paper | Link to Code Repository Cell Detection The celldet_2023 dataset contains a total of 260 images of the honeycomb (at resolution 67 µm per pixel), with masks from the ViT-H Segment Anything Model (SAM) and annotations for these masks. The structure of the dataset is following:celldet_2023├── {image_name}.png├── ...├── masksH (folder with masks for each image)├────{image_name}.json├────...├── annotations├────annotated_masksH (folder with annotations for training images)├──────{image_name in training part}.csv├──────...├────annotated_masksH_val (folder with annotations for validation images)├──────{image_name in validation part}.csv}├──────...├────annotated_masksH_test (folder with annotations for test images)├──────{image_name in test part}.csv}├──────... Masks For each image there is a .json file that contains all the masks produced by the SAM for the particular image, the masks are in COCO Run-Length Encoding (RLE) format. Annotations The annotation files are split into folders based on whether they were used for training, validation or testing. For each image (and thus also for each .json file with masks), there is a .csv file with two columns: Column id Description 0 order id of the mask in the corresponding .json file 1 mask label: 1 if fully visible cell, 2 if partially occluded cell, 0 otherwise Loading the Dataset For an example of loading the data, see the data loader in the paper repository: python cell_datasetV2.py --img_dir --mask_dir
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gold Standard annotations for the SMM4H-Spanish shared task and unannotated test and background files. SMM4H 2021 was accepted at NAACL (scheduled in Mexico City in June): https://2021.naacl.org/.
Introduction: The entire corpus contains 10,000 annotated tweets. It has been split into training, validation and test (60-20-20). The current version contains the training and development sets of the shared task with Gold Standard annotations. In addition, it contains the unannotated test and background sets. Participants must submit predictions for the files under the directory "test-background-txt-files".
For subtask-1 (classification), annotations are distributed in a tab-separated file (TSV). The TSV format follows the format employed in SMM4H 2019 Task 2: tweet_id class
For subtask-2 (Named Entity Recognition, profession detection), annotations are distributed in 2 formats: Brat standoff and TSV. See the Brat webpage for more information about the Brat standoff format (https://brat.nlplab.org/standoff.html). The TSV format follows the format employed in SMM4H 2019 Task 2: tweet_id begin end type extraction
In addition, we provide a tokenized version of the dataset, for participants' convenience. It follows the BIO format (similar to CoNLL). The files were generated with the brat_to_conll.py script (included), which employs the es_core_news_sm-2.3.1 spaCy model for tokenization.
Zip structure:
subtask-1: files of the tweet classification subtask. Content: one TSV file per corpus split (train and valid).
train-valid-txt-files: folder with training and validation text files. One text file per tweet. One sub-directory per corpus split (train and valid).
train-valid-txt-files-english: folder with training and validation text files machine translated to English.
test-background-txt-files: folder with the test and background text files. You must make your predictions for these files and upload them to CodaLab.
subtask-2: files of the Named Entity Recognition subtask. Content:
brat: folder with annotations in Brat format. One sub-directory per corpus split (train and valid).
TSV: folder with annotations in TSV. One file per corpus split (train and valid).
BIO: folder with the corpus in BIO tagging. One file per corpus split (train and valid).
train-valid-txt-files: folder with training and validation text files. One text file per tweet. One sub-directory per corpus split (train and valid).
train-valid-txt-files-english: folder with training and validation text files machine translated to English.
test-background-txt-files: folder with the test and background text files. You must make your predictions for these files and upload them to CodaLab.
Annotation quality: We have performed a consistency analysis of the corpus. 10% of the documents have been annotated by an internal annotator as well as by the linguist experts following the same annotation guidelines. The preliminary Inter-Annotator Agreement (pairwise agreement) is 0.919.
Important shared task information: SYSTEM PREDICTIONS MUST FOLLOW THE TSV FORMAT. Systems will only be evaluated on the PROFESION and SITUACION_LABORAL predictions (even though the Gold Standard contains 2 extra entity classes). For more information about the evaluation scenario, see the CodaLab link or the evaluation webpage. For further information, please visit https://temu.bsc.es/smm4h-spanish/ or email us at encargo-pln-life@bsc.es
Resources:
Web
Annotation guidelines (in Spanish)
Annotation guidelines (in English)
FastText COVID-19 Twitter embeddings
Occupations gazetteer
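As a small sketch of consuming the TSV annotations described above with pandas, under the assumption that the files are header-less and tab-separated (the file names used here are illustrative):

```python
import pandas as pd

# Subtask 1 (classification): tweet_id <TAB> class
cls = pd.read_csv("subtask-1/train.tsv", sep="\t",
                  names=["tweet_id", "class"], dtype={"tweet_id": str})

# Subtask 2 (NER): tweet_id <TAB> begin <TAB> end <TAB> type <TAB> extraction
ner = pd.read_csv("subtask-2/TSV/train.tsv", sep="\t",
                  names=["tweet_id", "begin", "end", "type", "extraction"],
                  dtype={"tweet_id": str})

# Keep only the entity classes that are scored in the shared task.
scored = ner[ner["type"].isin(["PROFESION", "SITUACION_LABORAL"])]
print(len(cls), len(scored))
```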
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Overview
This dataset contains input-output data of a coupled mass-spring-damper system with a nonlinear force profile. The data was generated with statesim [1], a Python package for simulating linear and nonlinear ODEs, for the system coupled-msd. The configuration .json files for the corresponding datasets (in-distribution and out-of-distribution) can be found in the respective folders. After creating the dataset, the files are stored in the raw folder. Then, they are split into subsets for… See the full description on the dataset page: https://huggingface.co/datasets/dany-l-23/coupled-msd.
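One hedged way to fetch the files locally is the huggingface_hub snapshot API, which mirrors the whole dataset repository regardless of its internal layout:

```python
from huggingface_hub import snapshot_download

# Downloads the raw, in-distribution and out-of-distribution folders as stored on the Hub.
local_dir = snapshot_download(repo_id="dany-l-23/coupled-msd", repo_type="dataset")
print(local_dir)
```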
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prerequisites for running the experiments:
Steps to run the experiments:
Download and extract the experiment archive you want to run (single_parameter_experiment.zip or multi_parameter_experiment.zip).
Change into the extracted directory (cd single_parameter_experiment).
Run poetry install --only main to install all dependencies.
Run the experiment with poetry run python run_experiment.py.
The results are stored in data/.
The modules used for the experiment are defined in the file experiment_modules.py and to see the experiment configuration, look in run_experiment.py.
Warning: the experiments take several weeks to run on a single machine, therefore it is advisable to split the experiments based on modules and run them in parallel.
Prerequisites for running the analysis:
Steps to run the analysis:
Download and extract the analysis archive (analysis-adaptive-parameter-control.zip).
Change into the extracted directory (cd analysis-adaptive-parameter-control).
Create the conda environment with conda env create -f environment.yml.
Download Trace data single.zip and Trace data multi.zip.
Place the .nc files in the corresponding folder: analysis-adaptive-parameter-control/single_parameter/ or analysis-adaptive-parameter-control/multi_parameter/.
The coverage_rate_model_single_parameter.nc file goes in the single_parameter folder, while coverage_rate_model_multi_parameter.nc goes in the multi_parameter folder.
Open the notebooks folder (Notebooks/).
Run the notebook of interest (coverage_rate_multi_parameter.ipynb, coverage_rate_single_parameter.ipynb, final_coverage_multi_parameter.ipynb, final_coverage_single_parameter.ipynb, overhead_model_multi_parameter.ipynb, or overhead_model_single_parameter.ipynb).
Warning: this will take a long time; if you don't have the time, use the following alternative instead
The data from when we ran the experiments is available in the Single data.zip and Multi data.zip files.
The structure of these is the following:
statistics.csv: file containing some information about each run and their branch coverage timelines.
Prerequisites for running the parameter assignment analysis:
Steps to run the parameter assignment analysis:
Download and extract the archive (parameter-assignment.zip).
Change into the extracted directory (cd parameter-assignment).
Create the conda environment with conda env create -f environment.yml.
Open the notebooks folder (Notebooks/).
Run parameter_assignment_analysis.ipynb.
Dataset is from https://www.robots.ox.ac.uk/~vgg/data/flowers/17/. It contains images of flowers from 17 different species, with 80 images per class/species, for a total of 1360 images. A test split of 25% has been applied, resulting in 20 images per class for the test set and 60 images per class for the train & validation set. The data was then restructured into a folder-per-class format, with a separate folder for the test versus the training & validation split.
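Because the data is arranged as one folder per class with separate test and train/validation folders, it can be read with a generic image-folder loader. The sketch below uses torchvision; the top-level folder names are assumptions and should be adjusted to the actual ones.

```python
import torch
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumed folder names for the two top-level splits.
train_val = datasets.ImageFolder("flowers17/train_val", transform=tfm)
test = datasets.ImageFolder("flowers17/test", transform=tfm)

train_loader = torch.utils.data.DataLoader(train_val, batch_size=32, shuffle=True)
print(len(train_val), len(test), train_val.classes[:5])
```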
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Load classification dataset
This algorithm loads a classification dataset from a given folder. It can also split the dataset into train and validation folders....
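The algorithm itself is not reproduced here; as a rough sketch of what a folder-based train/validation split can look like (all paths and the split ratio are assumptions):

```python
import random
import shutil
from pathlib import Path

def split_dataset(src="dataset", dst="dataset_split", val_ratio=0.2, seed=0):
    """Copy a folder-per-class dataset into train/ and val/ sub-folders."""
    random.seed(seed)
    for class_dir in Path(src).iterdir():
        if not class_dir.is_dir():
            continue
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        random.shuffle(images)
        n_val = int(len(images) * val_ratio)
        for i, img in enumerate(images):
            split = "val" if i < n_val else "train"
            target = Path(dst) / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(img, target / img.name)

split_dataset()
```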
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
ESC50
This is an audio classification dataset for Environmental Sound Classification. Classes = 50, Split = Train-Test
Structure
audios folder contains audio files. csv_files folder contains CSV files for five-fold cross-validation. To perform cross-validation on fold 1, train_1.csv will be used for the training split and test_1.csv for the testing split, with the same pattern followed for the other folds. To perform training and testing without cross-validation… See the full description on the dataset page: https://huggingface.co/datasets/MahiA/ESC50.
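A minimal sketch of using the fold-1 CSVs with pandas and torchaudio is shown below; the column names (and the presence of a filename column pointing into audios/) are assumptions to be checked against the actual CSV headers.

```python
import pandas as pd
import torchaudio

fold = 1
train_df = pd.read_csv(f"csv_files/train_{fold}.csv")
test_df = pd.read_csv(f"csv_files/test_{fold}.csv")

# Assumption: each row has a "filename" column (a file inside audios/) and a "label" column.
first = train_df.iloc[0]
waveform, sample_rate = torchaudio.load(f"audios/{first['filename']}")
print(waveform.shape, sample_rate, first.get("label"))
```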
File Restoration and Extraction Guide
File Structure
Root directory: Contains Part 1 split files
part2/ directory: Contains Part 2 split files
Instructions
Step 1: File Restoration
Due to size limitations, the original file has been split. To restore the complete file: cat images_1024.part_* > images_1024.tar
Step 2: Extraction
To extract the contents: tar -xvf images_1024.tar
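If a shell is not available (for example on Windows), a Python equivalent of the two steps above could look like this sketch:

```python
import glob
import shutil
import tarfile

# Step 1: concatenate the split parts, in lexical order, into the original tar file.
with open("images_1024.tar", "wb") as out:
    for part in sorted(glob.glob("images_1024.part_*")):
        with open(part, "rb") as f:
            shutil.copyfileobj(f, out)

# Step 2: extract the tar archive into the current directory.
with tarfile.open("images_1024.tar") as tar:
    tar.extractall()
```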
Important Notes
For Part 1 images: Execute… See the full description on the dataset page: https://huggingface.co/datasets/OpenMOSS-Team/AnyInstruct-resolution-1024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was generated for the purpose of training convolutional neural network (CNN) models for permeability prediction of 2D structures. The whole dataset is part of a study on predicting permeability using CNNs, while addressing discussions that are largely absent from the current literature, such as the effect of data diversity on accuracy, input pre-processing, error estimation, architecture comparisons, and sources of error. A link to the publication, which includes much more detail about the dataset and CNN models, will be added once it is published.
The data included in this dataset is split into three different folders. The data under the "Training Data" folder includes 4,500 images, divided in 15 sub-folders. Each sub-folder contains 901 files, which are 300 images, a pressure-velocity map for each of the 300 structures, convergence data for each individual structure, and one comma-separated file (csv) summarizing all simulation results in the folder. The pressure and velocity maps together with the convergence information are direct results of the CFD algorithm used, but the important information for training the CNN models are the images and the permeability data in the csv files.
The "Trained CNNs" folder contains all of the trained CNN models as described in the linked publication for predicting permeability. That includes the ensemble of VGG19 networks. The "External Test Set" includes the same type of data as the "Training Data" folder, but this section of data was only used to test the CNN models. In other words, the trained CNN models never saw any of this data in training, only in testing. The "External Test Set" folder also includes data for phase-size distributions and surface area for all data in this repository. For more details on those, refer to the publication.
The CFD code and the image generation code can be found in the following GitHub, along with more extensive documentation: https://github.com/adama-wzr/PixelBasedPermeability/
Each Videos ZIP Segment file is a segment of a split ZIP. To access the videos in the split ZIP, download all Videos ZIP Segment files that are part of the split ZIP to a single folder (including the Videos ZIP Target file that has the full “.zip” extension), rename all of the downloaded files to have the same root filename (e.g., rename “pone.0308790.s001.z01” to “Videos.z01”, rename “pone.0308790.s002.z02” to “Videos.z02”, rename “pone.0308790.s003.z03” to “Videos.z03”, etc.), then open the file with the “.zip” extension (e.g., “Videos.zip”), navigate the folders within the ZIP and select a video to open. Alternatively, after downloading all ZIP and ZXX files into a folder, unzip all files by opening the .ZIP file with a compressing software such as WinRAR, WinZip, or 7-Zip. This will give access to the 177 GorillaFACS video examples organized in their respective folders for each AU and AD. https://doi.org/10.1371/journal.pone.0308790.s001 (Z01)
This dataset was created by Ramsi Kalia
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Context
The Caltech-256 dataset is a foundational benchmark for object recognition, containing 30,607 images across 257 categories (256 object categories + 1 clutter category).
The original dataset is typically provided as a collection of directories, one for each category. This version streamlines the machine learning workflow by providing:
A clean, pre-defined 80/20 train-test split.
Manifest files (train.csv, test.csv) that map image paths directly to their labels, allowing for easy use with data generators in frameworks like PyTorch and TensorFlow.
A flat directory structure (train/, test/) for simplified file access.
File Content
The dataset is organized into a single top-level folder and two CSV files:
train.csv: A CSV file containing two columns: image_path and label. This file lists all images designated for the training set.
test.csv: A CSV file with the same structure as train.csv, listing all images designated for the testing set.
Caltech-256_Train_Test/: The primary data folder.
train/: This directory contains 80% of the images from all 257 categories, intended for model training.
test/: This directory contains the remaining 20% of the images from all categories, reserved for model evaluation.
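A small sketch of a PyTorch dataset that consumes the manifest files described above is given below; whether the label column stores category names or integer ids is an assumption, so adapt the mapping accordingly.

```python
import pandas as pd
import torch
from PIL import Image
from torchvision import transforms

class Caltech256Manifest(torch.utils.data.Dataset):
    """Loads images listed in train.csv / test.csv (columns: image_path, label)."""

    def __init__(self, csv_path, root=".", transform=None):
        self.df = pd.read_csv(csv_path)
        self.root = root
        self.transform = transform or transforms.ToTensor()
        # Assumption: labels are category names; map them to integer indices.
        self.classes = sorted(self.df["label"].unique())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(f"{self.root}/{row['image_path']}").convert("RGB")
        return self.transform(image), self.class_to_idx[row["label"]]

train_ds = Caltech256Manifest("train.csv")
```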
Data Split
The dataset has been partitioned into a standard 80% training and 20% testing split. This split is (or should be assumed to be) stratified, meaning that each of the 257 object categories is represented in roughly an 80/20 proportion in the respective sets.
Acknowledgements & Original Source This dataset is a derivative work created for convenience. The original data and images belong to the authors of the Caltech-256 dataset.
Original Dataset Link: https://www.kaggle.com/datasets/jessicali9530/caltech256/data
Citation: Griffin, G., Holub, A.D., & Perona, P. (2007). Caltech-256 Object Category Dataset. California Institute of Technology.