38 datasets found
  1. Split-Dataset-12-cat-v2

    • kaggle.com
    zip
    Updated Mar 22, 2024
    Cite
    Hùng Nguyễn Việt (2024). Split-Dataset-12-cat-v2 [Dataset]. https://www.kaggle.com/datasets/hungapcs20/split-dataset-12-cat-v2/code
    Explore at:
zip (788932534 bytes). Available download formats
    Dataset updated
    Mar 22, 2024
    Authors
    Hùng Nguyễn Việt
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

A 768 MB dataset for garbage classification, divided into train, test, and val folders.

  2. matterport3d_region_mcmc_3dgs_preprocessed

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    GaussianWorld (2025). matterport3d_region_mcmc_3dgs_preprocessed [Dataset]. https://huggingface.co/datasets/GaussianWorld/matterport3d_region_mcmc_3dgs_preprocessed
    Explore at:
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    GaussianWorld
    Description

Note: we use the split files in the splits folder to obtain the following splits, since there are too many chunks after preprocessing: "train_grid1.0cm_chunk6x6_stride3x3_filtered", "val_grid1.0cm_chunk6x6_stride3x3_filtered", "test_grid1.0cm_chunk6x6_stride3x3_filtered".

  3. Surrogate flood model comparison - Datasets and python code

    • figshare.unimelb.edu.au
    bin
    Updated Jan 19, 2024
    Cite
    Niels Fraehr (2024). Surrogate flood model comparison - Datasets and python code [Dataset]. http://doi.org/10.26188/24312658.v1
    Explore at:
bin. Available download formats
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    The University of Melbourne
    Authors
    Niels Fraehr
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Data used for the publication "Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models". Five surrogate models for flood inundation are used to emulate the results of high-resolution hydrodynamic models. The surrogate models are compared based on accuracy and computational speed for three distinct case studies: Carlisle (United Kingdom), the Chowilla floodplain (Australia), and the Burnett River (Australia).

    The dataset is structured in 5 files: "Carlisle", "Chowilla", "BurnettRV", "Comparison_results", and "Python_data". As a minimum, running the models requires the "Python_data" file and one of "Carlisle", "Chowilla", or "BurnettRV". We suggest using the "Carlisle" case study for initial testing given its small size and small data requirement.

    "Carlisle", "Chowilla", and "BurnettRV" files: These files contain hydrodynamic modelling data for training and validation for each individual case study, as well as case-study-specific Python scripts for training and running the surrogate models. There are only small differences between the folders, depending on the hydrodynamic model being emulated and the input boundary conditions (input features). Each case study file has the following folders:

    • Geometry_data: DEM files, .npz files containing the high-fidelity model grid (XYZ coordinates) and areas (the same data is available for the low-fidelity model used in the LSG model), and .shp files indicating the location of boundaries and main flow paths (mainly used in the LSTM-SRR model).
    • XXX_modeldata: Folder to store trained model data for each XXX surrogate model. For example, GP_EOF_modeldata contains files used to store the trained GP-EOF model.
    • HD_model_data: High-fidelity (and low-fidelity) simulation results for all flood events of that case study. This folder also contains all boundary input conditions.
    • HF_EOF_analysis: Storage of data used in the EOF analysis. EOF analysis is applied for the LSG, GP-EOF, and LSTM-EOF surrogate models.
    • Results_data: Storage of results from evaluating the surrogate models.
    • Train_test_split_data: The train-test-validation data split is the same for all surrogate models. The specific split for each cross-validation fold is stored in this folder.

    And the following Python files:

    • YYY_event_summary, YYY_Extrap_event_summary: Files containing an overview of all events, and of which events are connected between the low- and high-fidelity models, for each YYY case study.
    • EOF_analysis_HFdata_preprocessing, EOF_analysis_HFdata: Preprocessing before the EOF analysis and the EOF analysis of the high-fidelity data. This is used for the LSG, GP-EOF, and LSTM-EOF surrogate models.
    • Evaluation, Evaluation_extrap: Scripts for evaluating the surrogate models for that case study and saving the results for each cross-validation fold.
    • train_test_split: Script for splitting the flood datasets for each cross-validation fold, so all surrogate models train on the same data.
    • XXX_training: Script for training each XXX surrogate model.
    • XXX_preprocessing: Some surrogate models rely on information that needs to be generated before training; this is done with these scripts.

    "Comparison_results" file: Files used for comparing the surrogate models and generating the figures in the paper "Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models". The figures are also included.

    "Python_data" file: Folder containing Python scripts with utility functions for setting up, training, running, and evaluating the surrogate models. This folder also contains a python_environment.yml file with all Python package versions and dependencies, plus two sub-folders:

    • LSG_mods_and_func: Python scripts for using the LSG model. Some of these scripts are also utilized when working with the other surrogate models.
    • SRR_method_master_Zhou2021: Scripts obtained from https://github.com/yuerongz/SRR-method. Small edits have been made for speed and use in this study.

  4. happy-whale-and-dolphin-512-folder-split

    • kaggle.com
    zip
    Updated Apr 9, 2022
    Cite
    Tim Chang1997 (2022). happy-whale-and-dolphin-512-folder-split [Dataset]. https://www.kaggle.com/datasets/timchang1997/happy-whale-and-dolphin-512-folder-split
    Explore at:
zip (1178308725 bytes). Available download formats
    Dataset updated
    Apr 9, 2022
    Authors
    Tim Chang1997
    Description

    Happywhale - Whale and Dolphin Identification Competition dataset

    • Image size: 512×512 pixels
    • Images are split into one folder per label
  5. patent-time-split-supplementary-folder

    • huggingface.co
    Cite
    mizuno-group, patent-time-split-supplementary-folder [Dataset]. https://huggingface.co/datasets/mizuno-group/patent-time-split-supplementary-folder
    Explore at:
    Dataset provided by
Mizuno (http://mizuno.com/)
    Authors
    mizuno-group
    Description

    mizuno-group/patent-time-split-supplementary-folder dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. tradingIdeas

    • huggingface.co
    Updated Jan 2, 2025
    Cite
    Singh (2025). tradingIdeas [Dataset]. https://huggingface.co/datasets/DiljitSingh14/tradingIdeas
    Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 2, 2025
    Authors
    Singh
    Description

    TradingView Ideas Dataset

    This dataset contains trading ideas and analysis sourced from TradingView, split into training and testing datasets for machine learning purposes. It includes both image data (chart screenshots) and associated textual descriptions.

      Dataset Structure
    
    
    
    
    
      Root Folder Contents
    

    train.zip: Compressed folder containing training data (images and JSON split).
    test.zip: Compressed folder containing testing data (images and JSON split).
    … See the full description on the dataset page: https://huggingface.co/datasets/DiljitSingh14/tradingIdeas.

  7. Data from: imageseg: An R package for deep learning-based image segmentation...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Aug 6, 2022
    Cite
    Jürgen Niedballa; Jan Axtner; Timm Döbert; Andrew Tilker; An Nguyen; Seth Wong; Christian Fiderer; Marco Heurich; Andreas Wilting (2022). imageseg: An R package for deep learning-based image segmentation [Dataset]. http://doi.org/10.5061/dryad.x0k6djhnj
    Explore at:
zip. Available download formats
    Dataset updated
    Aug 6, 2022
    Dataset provided by
    Dryad
    Authors
    Jürgen Niedballa; Jan Axtner; Timm Döbert; Andrew Tilker; An Nguyen; Seth Wong; Christian Fiderer; Marco Heurich; Andreas Wilting
    Time period covered
    Jul 19, 2022
    Description
    1. Convolutional neural networks (CNNs) and deep learning are powerful and robust tools for ecological applications, and are particularly suited for image data. Image segmentation (the classification of all pixels in images) is one such application and can for example be used to assess forest structural metrics. While CNN-based image segmentation methods for such applications have been suggested, widespread adoption in ecological research has been slow, likely due to technical difficulties in implementation of CNNs and lack of toolboxes for ecologists.
    2. Here, we present R package imageseg which implements a CNN-based workflow for general-purpose image segmentation using the U-Net and U-Net++ architectures in R. The workflow covers data (pre)processing, model training, and predictions. We illustrate the utility of the package with image recognition models for two forest structural metrics: tree canopy density and understory vegetation density. We trained the models using large and dive...
  8. Data from: Solar flare forecasting based on magnetogram sequences learning...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Dec 4, 2023
    Cite
    Grim, Luís Fernando Lopes; Sampaio Gradvohl, André Leon (2023). Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10246576
    Explore at:
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Universidade Estadual de Campinas (UNICAMP)
    Universidade Estadual de Campinas
    Authors
    Grim, Luís Fernando Lopes; Sampaio Gradvohl, André Leon
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Source code and dataset of the research "Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation". Our work employed PyTorch, a framework for training deep learning models with GPU support and automatic back-propagation, to load the MViTv2-S models with Kinetics-400 weights. To simplify the code implementation, eliminating the need for an explicit training loop and automating some hyperparameters, we use the PyTorch Lightning module. The inputs were batches of 10 samples, each a sequence of 16 3-channel images resized to 224 × 224 pixels and normalized from 0 to 1.

    Most of the papers in our literature survey split the original dataset chronologically. Some authors also apply k-fold cross-validation to emphasize the evaluation of model stability. However, we adopt a hybrid split: the first 50,000 samples are used for 5-fold cross-validation between the training and validation sets (known data), with 40,000 samples for training and 10,000 for validation. We can thus evaluate performance and stability by analyzing the mean and standard deviation of all trained models on the test set, composed of the last 9,834 samples, preserving the chronological order (simulating unknown data).

    We developed three distinct models to evaluate the impact of oversampling magnetogram sequences throughout the dataset. The first model, Solar Flare MViT (SF MViT), was trained only with the original data from our base dataset, without oversampling. In the second model, Solar Flare MViT over Train (SF MViT oT), we apply oversampling only on the training data, maintaining the original validation dataset. In the third model, Solar Flare MViT over Train and Validation (SF MViT oTV), we apply oversampling to both training and validation sets. We also trained a model oversampling the entire dataset, called "SF_MViT_oTV Test", to verify how resampling or adopting a test set with unreal data may bias the results positively.

    GitHub version: The .zip hosted here contains all files from the project, including the checkpoint and output files generated by the codes. We have a clean version hosted on GitHub (https://github.com/lfgrim/SFF_MagSeq_MViTs), without the magnetogram_jpg folder (which can be downloaded directly from https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip) and without the output and checkpoint files. Most code files hosted here also contain comments in Portuguese, which are being updated to English in the GitHub version.

    Folders structure: In the root directory of the project, we have two folders:

    • magnetogram_jpg: holds the source images provided by the Space Environment Artificial Intelligence Early Warning Innovation Workshop through the link https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip. It comprises 73,810 samples of high-quality magnetograms captured by HMI/SDO from 2010 May 4 to 2019 January 26. The HMI instrument provides these data (stored in the hmi.sharp_720s dataset), making new samples available every 12 minutes; however, the images in this dataset were collected every 96 minutes. Each image has an associated magnetogram comprising a ready-made snippet of one or more solar ARs. It is essential to notice that the magnetograms cropped by SHARP can contain one or more solar ARs classified by the National Oceanic and Atmospheric Administration (NOAA).
    • Seq_Magnetogram: contains the references for source images with the corresponding labels in the next 24 h and 48 h, in the M24 and M48 sub-folders respectively.

    M24/M48: both have the following sub-folders: Seqs16, SF_MViT, SF_MViT_oT, SF_MViT_oTV, and SF_MViT_oTV_Test. There are also two files in the root:

    • inst_packages.sh: installs the packages and dependencies needed to run the models.
    • download_MViTS.py: downloads the pre-trained MViTv2_S from PyTorch and stores it in the cache.

    The M24 and M48 folders hold reference text files (flare_Mclass...) linking the images in the magnetogram_jpg folder, or the sequences (Seq16_flare_Mclass...) in the Seqs16 folders, with their respective labels. They also hold "cria_seqs.py", which was responsible for creating the sequences, and "test_pandas.py", used to verify the head info and check the number of samples per label in the text files. All the text files with the prefix "Seq16" inside the Seqs16 folder were created by the "cria_seqs.py" code based on the corresponding "flare_Mclass"-prefixed text files. The Seqs16 folder holds reference text files, in which each file contains a sequence of images pointing into the magnetogram_jpg folder. All SF_MViT... folders hold the model training code itself (SF_MViT...py) and the corresponding job submission (jobMViT...), temporary input (Seq16_flare...), output (saida_MVIT... and MViT_S...), error (err_MViT...), and checkpoint files (sample-FLARE...ckpt). Executed model training codes generate output, error, and checkpoint files. There is also a folder called "lightning_logs" that stores logs of trained models. Naming pattern for the files:

    magnetogram_jpg: follows the format "hmi.sharp_720s...magnetogram.fits.jpg" and Seqs16: follows the format "hmi.sharp_720s...to.", where:

    hmi: is the instrument that captured the image
    sharp_720s: is the database source of SDO/HMI.
    is the identification of SHARP region, and can contain one or more solar ARs classified by the (NOAA).
    is the date-time the instrument captured the image in the format yyyymmdd_hhnnss_TAI (y:year, m:month, d:day, h:hours, n:minutes, s:seconds).
    is the date-time when the sequence starts, and follow the same format of .

    is the date-time when the sequence ends, and follow the same format of . Reference text files in M24 and M48 or inside SF_MViT... folders follows the format "flare_Mclass_.txt", where:

    is Seq16 if refers to a sequence, or void if refers direct to images.

    "24h" or "48h".

    is "TrainVal" or "Test". The refers to the split of Train/Val.

    void or "_over" after the extension (...txt_over): means temporary input reference that was over-sampled by a training model. All SF_MViT...folders:

    Model training codes: "SF_MViT_M+_", where:

    void or "oT" (over Train) or "oTV" (over Train and Val) or "oTV_Test" (over Train, Val and Test);

    "24h" or "48h";

    "oneSplit" for a specific split or "allSplits" if run all splits.

    void is default to run 1 GPU or "2gpu" to run into 2 gpus systems; Job submission files: "jobMViT_", where:

    point the queue in Lovelace environment hosted on CENAPAD-SP (https://www.cenapad.unicamp.br/parque/jobsLovelace) Temporary inputs: "Seq16_flare_Mclass_.txt:

    train or val;

    void or "_over" after the extension (...txt_over): means temporary input reference that was over-sampled by a training model. Outputs: "saida_MViT_Adam_10-7", where:

    k0 to k4, means the correlated split of the output, or void if the output is from all splits. Error files: "err_MViT_Adam_10-7", where:

    k0 to k4, means the correlated split of the error log file, or void if the error file is from all splits. Checkpoint files: "sample-FLARE_MViT_S_10-7-epoch=-valid_loss=-Wloss_k=.ckpt", where:

    epoch number of the checkpoint;

    corresponding valid loss;

    0 to 4.
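    As a rough illustration of the input pipeline described in this entry (batches of 16-image, 3-channel sequences resized to 224 × 224, normalized to [0, 1], fed to MViTv2-S with Kinetics-400 weights), a minimal, hypothetical PyTorch sketch could look like this. It is not the project's training code, and the comment about adapting the classification head is an assumption:

        import torch
        from torchvision.models.video import MViT_V2_S_Weights, mvit_v2_s

        # Load MViTv2-S with Kinetics-400 weights, as described above.
        model = mvit_v2_s(weights=MViT_V2_S_Weights.KINETICS400_V1)
        model.eval()

        # Dummy batch shaped like the described inputs:
        # 10 samples x 3 channels x 16 sequenced frames x 224 x 224 pixels, values in [0, 1].
        batch = torch.rand(10, 3, 16, 224, 224)
        with torch.no_grad():
            logits = model(batch)  # Kinetics-400 logits; a flare classifier would adapt the head
        print(logits.shape)  # torch.Size([10, 400])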

  9. [Dataset] Towards Robotic Mapping of a Honeybee Comb

    • data.europa.eu
    unknown
    Cite
    Zenodo, [Dataset] Towards Robotic Mapping of a Honeybee Comb [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-15042164?locale=hu
    Explore at:
unknown (4855). Available download formats
    Dataset authored and provided by
Zenodo (http://zenodo.org/)
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    "Towards Robotic Mapping of a Honeybee Comb" Dataset This dataset supports the analyses and experiments of the paper: J. Janota et al., "Towards Robotic Mapping of a Honeybee Comb," 2024 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), Delft, Netherlands, 2024, doi: 10.1109/MARSS61851.2024.10612712. Link to Paper | Link to Code Repository Cell Detection The celldet_2023 dataset contains a total of 260 images of the honeycomb (at resolution 67 µm per pixel), with masks from the ViT-H Segment Anything Model (SAM) and annotations for these masks. The structure of the dataset is following:celldet_2023├── {image_name}.png├── ...├── masksH (folder with masks for each image)├────{image_name}.json├────...├── annotations├────annotated_masksH (folder with annotations for training images)├──────{image_name in training part}.csv├──────...├────annotated_masksH_val (folder with annotations for validation images)├──────{image_name in validation part}.csv}├──────...├────annotated_masksH_test (folder with annotations for test images)├──────{image_name in test part}.csv}├──────... Masks For each image there is a .json file that contains all the masks produced by the SAM for the particular image, the masks are in COCO Run-Length Encoding (RLE) format. Annotations The annotation files are split into folders based on whether they were used for training, validation or testing. For each image (and thus also for each .json file with masks), there is a .csv file with two columns: Column id Description 0 order id of the mask in the corresponding .json file 1 mask label: 1 if fully visible cell, 2 if partially occluded cell, 0 otherwise Loading the Dataset For an example of loading the data, see the data loader in the paper repository: python cell_datasetV2.py --img_dir --mask_dir

  10. ProfNER corpus: gold standard annotations for profession detection in...

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    txt
    Updated Nov 29, 2022
    Cite
    (2022). ProfNER corpus: gold standard annotations for profession detection in Spanish COVID-19 tweets [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7872
    Explore at:
txt. Available download formats
    Dataset updated
    Nov 29, 2022
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Gold Standard annotations for the SMM4H-Spanish shared task, plus unannotated test and background files. SMM4H 2021 was accepted at NAACL (scheduled in Mexico City in June): https://2021.naacl.org/.

    Introduction: The entire corpus contains 10,000 annotated tweets. It has been split into training, validation and test (60-20-20). The current version contains the training and development sets of the shared task with Gold Standard annotations. In addition, it contains the unannotated test and background sets. Participants must submit predictions for the files under the directory "test-background-txt-files".

    For subtask-1 (classification), annotations are distributed in a tab-separated file (TSV). The TSV format follows the format employed in SMM4H 2019 Task 2: tweet_id class

    For subtask-2 (Named Entity Recognition, profession detection), annotations are distributed in 2 formats: Brat standoff and TSV. See the Brat webpage for more information about the Brat standoff format (https://brat.nlplab.org/standoff.html). The TSV format follows the format employed in SMM4H 2019 Task 2: tweet_id begin end type extraction. In addition, we provide a tokenized version of the dataset for participants' convenience. It follows the BIO format (similar to CoNLL). The files were generated with the brat_to_conll.py script (included), which employs the es_core_news_sm-2.3.1 spaCy model for tokenization.

    Zip structure:

    • subtask-1: files of the tweet classification subtask. Content: one TSV file per corpus split (train and valid).
      • train-valid-txt-files: folder with training and validation text files. One text file per tweet. One sub-directory per corpus split (train and valid).
      • train-valid-txt-files-english: folder with training and validation text files machine-translated to English.
      • test-background-txt-files: folder with the test and background text files. You must make your predictions for these files and upload them to CodaLab.
    • subtask-2: files of the Named Entity Recognition subtask. Content:
      • brat: folder with annotations in Brat format. One sub-directory per corpus split (train and valid).
      • TSV: folder with annotations in TSV. One file per corpus split (train and valid).
      • BIO: folder with the corpus in BIO tagging. One file per corpus split (train and valid).
      • train-valid-txt-files: folder with training and validation text files. One text file per tweet. One sub-directory per corpus split (train and valid).
      • train-valid-txt-files-english: folder with training and validation text files machine-translated to English.
      • test-background-txt-files: folder with the test and background text files. You must make your predictions for these files and upload them to CodaLab.

    Annotation quality: We have performed a consistency analysis of the corpus. 10% of the documents have been annotated by an internal annotator as well as by the linguist experts, following the same annotation guidelines. The preliminary Inter-Annotator Agreement (pairwise agreement) is 0.919.

    Important shared task information: SYSTEM PREDICTIONS MUST FOLLOW THE TSV FORMAT, and systems will only be evaluated on the PROFESION and SITUACION_LABORAL predictions (even though the Gold Standard contains 2 extra entity classes). For more information about the evaluation scenario, see the CodaLab link or the evaluation webpage. For further information, please visit https://temu.bsc.es/smm4h-spanish/ or email us at encargo-pln-life@bsc.es

    Resources: Web | Annotation guidelines (in Spanish) | Annotation guidelines (in English) | FastText COVID-19 Twitter embeddings | Occupations gazetteer
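    As a small, hedged sketch of reading the subtask-2 TSV annotations in the "tweet_id begin end type extraction" format described above (the concrete file path inside the zip is a placeholder, and a header row, if present, is skipped):

        import csv

        def read_profner_tsv(path):
            annotations = []
            with open(path, encoding="utf-8") as f:
                for row in csv.reader(f, delimiter="\t"):
                    if row[0] == "tweet_id":  # skip a header row if one is present
                        continue
                    tweet_id, begin, end, ent_type, extraction = row
                    annotations.append({
                        "tweet_id": tweet_id,
                        "begin": int(begin),
                        "end": int(end),
                        "type": ent_type,        # e.g. PROFESION or SITUACION_LABORAL
                        "extraction": extraction,
                    })
            return annotations

        # Hypothetical usage; the actual path inside the zip may differ:
        # rows = read_profner_tsv("subtask-2/TSV/profner_train.tsv")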

  11. coupled-msd

    • huggingface.co
    Updated May 28, 2025
    + more versions
    Cite
    daniel frank (2025). coupled-msd [Dataset]. https://huggingface.co/datasets/dany-l-23/coupled-msd
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    daniel frank
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Overview This dataset contains input-output data of a coupled mass-spring-damper system with a nonlinear force profile. The data was generated with statesim [1], a python package for simulating linear and nonlinear ODEs, for the system coupled-msd. The configuration .json files for the corresponding datasets (in-distribution and out-of-distribution) can be found in the respective folders. After creating the dataset, the files are stored in the raw folder. Then, they are split into subsets for… See the full description on the dataset page: https://huggingface.co/datasets/dany-l-23/coupled-msd.

  12. Adaptive Parameter Control for Search-Based Unit Test Generation —...

    • zenodo.org
    zip
    Updated Apr 24, 2025
    Cite
    Henrik Johansson; Erik Blomberg; Henrik Johansson; Erik Blomberg (2025). Adaptive Parameter Control for Search-Based Unit Test Generation — Replication Package [Dataset]. http://doi.org/10.5281/zenodo.12187544
    Explore at:
zip. Available download formats
    Dataset updated
    Apr 24, 2025
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
    Henrik Johansson; Erik Blomberg; Henrik Johansson; Erik Blomberg
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Running the experiments

    Prerequisites for running the experiments:

    Steps to run the experiments:

    1. Download the experiment zip file you wish to run (single_parameter_experiment.zip or multi_parameter_experiment.zip).
    2. Un-zip the file.
    3. Open a terminal and navigate to the unzipped folder (e.g. cd single_parameter_experiment).
    4. Run poetry install --only main to install all dependencies.
    5. To run the experiment, run poetry run python run_experiment.py.
    6. All results can be found in the folder data/.

    The modules used for the experiment are defined in the file experiment_modules.py and to see the experiment configuration, look in run_experiment.py.

    Warning: the experiments take several weeks to run on a single machine, therefore it is advisable to split the experiments based on modules and run them in parallel.

    Running the analysis

    Prerequisites for running the analysis:

    Steps to run the analysis:

    1. Download the analysis zip file (analysis-adaptive-parameter-control.zip).
    2. Un-zip the file.
    3. Open a terminal and navigate to the unzipped folder (e.g. cd analysis-adaptive-parameter-control).
    4. Run the following command to install the conda environment and all dependencies: conda env create -f environment.yml
    5. If you want to re-run the Bayesian models locally on your machine, follow the optional step below, otherwise download and unzip the trace data from the replication package, i.e., Trace data single.zip and Trace data multi.zip.
    6. Place the .nc files in the corresponding folder: analysis-adaptive-parameter-control/single_parameter/ or analysis-adaptive-parameter-control/multi_parameter/.
      1. E.g. the coverage_rate_model_single_parameter.nc goes in the single_parameter folder, while the coverage_rate_model_multi_parameter.nc goes in the multi_parameter folder.
    7. Navigate to the notebooks folder (Notebooks/).
    8. Open a notebook of choice (coverage_rate_multi_parameter.ipynb, coverage_rate_single_parameter.ipynb, final_coverage_multi_parameter.ipynb, final_coverage_single_parameter.ipynb, overhead_model_multi_parameter.ipynb, or overhead_model_single_parameter.ipynb).
    9. Navigate to the section called "Data analysis" and run all cells in order.

    (Optional) Running the Bayesian models locally before the analysis.

    1. Navigate to the notebooks folder (Notebooks/).
    2. Open a notebook of choice (coverage_rate_multi_parameter.ipynb, coverage_rate_single_parameter.ipynb, final_coverage_multi_parameter.ipynb, final_coverage_single_parameter.ipynb, overhead_model_multi_parameter.ipynb, or overhead_model_single_parameter.ipynb).
    3. Navigate to the section called "Model specification" and run the three notebook cells.

    Warning: this will take a long time; if you don't have the time, download the pre-computed trace data from the replication package instead (see step 5 above).

    Data

    The data from when we ran the experiments is available in the Single data.zip and Multi data.zip files.

    The structure of these are the following:

    • There are folders for each module the experiment was run on, further divided into each unique run. All these folders include:
      • Coverage reports.
      • Complete logs for the unique run.
      • A timeline over controlled parameter values during the test generation process.
      • The complete Pynguin configuration for the run.
      • The generated test suite.
    • There is one statistics.csv file containing some information about each run and their branch coverage timelines.

    Running the parameter assignment analysis

    Prerequisites for running the parameter assignment analysis:

    Steps to run the parameter assignment analysis:

    1. Download the analysis zip file (parameter-assignment.zip).
    2. Un-zip the file.
    3. Open a terminal and navigate to the unzipped folder (e.g. cd parameter-assignment).
    4. Run the following command to install the conda environment and all dependencies: conda env create -f environment.yml
    5. Navigate to the notebooks folder (Notebooks/).
    6. Open the notebook parameter_assignment_analysis.ipynb.
    7. Run all cells in order.
  13. Oxford Flowers-17 Restructured

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    Aris Ilias Goutis (2025). Oxford Flowers-17 Restructured [Dataset]. https://www.kaggle.com/datasets/arisiliasgoutis/oxford-flowers-17-restructured
    Explore at:
zip (60547590 bytes). Available download formats
    Dataset updated
    Jun 13, 2025
    Authors
    Aris Ilias Goutis
    Description

The dataset is from https://www.robots.ox.ac.uk/~vgg/data/flowers/17/. It contains images of flowers from 17 different species, with 80 images per class/species, for a total of 1360 images. A 25% test split has been applied, resulting in 20 images per class for the test set and 60 images per class for the train and validation set. The data was then restructured into a folder-per-class format, with separate folders for the test split and the training & validation split.
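    Because the data is restructured into a folder-per-class layout, a generic folder-based loader should work. A minimal sketch using torchvision follows; the directory names are assumptions about the restructured layout, not documented paths:

        from torchvision import datasets, transforms

        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

        # Directory names are guesses; point them at the actual unzipped folders.
        train_val_set = datasets.ImageFolder("flowers17/train_val", transform=transform)
        test_set = datasets.ImageFolder("flowers17/test", transform=transform)

        print(len(train_val_set.classes), len(train_val_set), len(test_set))  # expect 17, 1020, 340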

  14. dataset_classification

    • app.ikomia.ai
    Updated Dec 20, 2023
    Cite
    Ikomia (2023). dataset_classification [Dataset]. https://app.ikomia.ai/hub/algorithms/dataset_classification/
    Explore at:
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    Ikomia
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

Load classification dataset: This algorithm loads a classification dataset from a given folder. It can also split the dataset into train and validation folders....
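    The algorithm itself is driven through the Ikomia platform; as a library-agnostic illustration of the kind of train/validation folder split it describes, a hedged Python sketch (the paths and the 80/20 ratio are assumptions, and this is not the Ikomia API) could look like this:

        import random
        import shutil
        from pathlib import Path

        def split_folder_dataset(src, dst, val_ratio=0.2, seed=0):
            # src contains one sub-folder per class; dst receives train/ and val/ copies.
            random.seed(seed)
            for class_dir in Path(src).iterdir():
                if not class_dir.is_dir():
                    continue
                images = sorted(class_dir.glob("*"))
                random.shuffle(images)
                n_val = int(len(images) * val_ratio)
                for split, files in (("val", images[:n_val]), ("train", images[n_val:])):
                    out_dir = Path(dst) / split / class_dir.name
                    out_dir.mkdir(parents=True, exist_ok=True)
                    for img in files:
                        shutil.copy(img, out_dir / img.name)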

  15. ESC50

    • huggingface.co
    Updated Sep 28, 2024
    + more versions
    Cite
    Maha Tufail Agro (2024). ESC50 [Dataset]. https://huggingface.co/datasets/MahiA/ESC50
    Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 28, 2024
    Authors
    Maha Tufail Agro
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ESC50

    This is an audio classification dataset for Environmental Sound Classification. Classes = 50, Split = Train-Test

      Structure
    

    The audios folder contains the audio files. The csv_files folder contains CSV files for five-fold cross-validation. To perform cross-validation on fold 1, train_1.csv is used for the training split and test_1.csv for the testing split, with the same pattern followed for the other folds. To perform training and testing without cross-validation… See the full description on the dataset page: https://huggingface.co/datasets/MahiA/ESC50.
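    A small sketch of iterating over the five folds using the CSV files described above; the column layout inside the CSVs (e.g. a filename and a label column) and the csv_files path are assumptions to verify against the actual files:

        import pandas as pd

        def load_fold(csv_dir, fold):
            # train_{fold}.csv and test_{fold}.csv as described above.
            train_df = pd.read_csv(f"{csv_dir}/train_{fold}.csv")
            test_df = pd.read_csv(f"{csv_dir}/test_{fold}.csv")
            return train_df, test_df

        # Five-fold cross-validation loop:
        for fold in range(1, 6):
            train_df, test_df = load_fold("ESC50/csv_files", fold)
            print(fold, len(train_df), len(test_df))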

  16. AnyInstruct-resolution-1024

    • huggingface.co
    Updated Aug 11, 2024
    Cite
    OpenMOSS (2024). AnyInstruct-resolution-1024 [Dataset]. https://huggingface.co/datasets/OpenMOSS-Team/AnyInstruct-resolution-1024
    Explore at:
    Dataset updated
    Aug 11, 2024
    Dataset authored and provided by
    OpenMOSS
    Description

    File Restoration and Extraction Guide

      File Structure
    

    Root directory: contains the Part 1 split files.
    part2/ directory: contains the Part 2 split files.

      Instructions
    
    
    
    
    
      Step 1: File Restoration
    

    Due to size limitations, the original file has been split. To restore the complete file: cat images_1024.part_* > images_1024.tar

      Step 2: Extraction
    

    To extract the contents: tar -xvf images_1024.tar

      Important Notes
    

    For Part 1 images: Execute… See the full description on the dataset page: https://huggingface.co/datasets/OpenMOSS-Team/AnyInstruct-resolution-1024.
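    If a shell is not available, the two commands above can be reproduced in Python; a minimal sketch using only the standard library:

        import glob
        import shutil
        import tarfile

        # Equivalent of: cat images_1024.part_* > images_1024.tar
        with open("images_1024.tar", "wb") as out:
            for part in sorted(glob.glob("images_1024.part_*")):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)  # stream each part instead of loading it into memory

        # Equivalent of: tar -xvf images_1024.tar
        with tarfile.open("images_1024.tar") as archive:
            archive.extractall()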

  17. Permeability Prediction in 2D: Dataset and Trained Convolutional Neural...

    • data.mendeley.com
    Updated Sep 23, 2025
    Cite
    Andre Adam (2025). Permeability Prediction in 2D: Dataset and Trained Convolutional Neural Networks [Dataset]. http://doi.org/10.17632/576dvrrsdx.2
    Explore at:
    Dataset updated
    Sep 23, 2025
    Authors
    Andre Adam
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was generated for the purpose of training convolutional neural network (CNN) models for permeability prediction of 2D structures. The whole dataset is part of a study on predicting permeability using CNNs while addressing discussions that are largely absent from the current literature, such as the effect of data diversity on accuracy, input pre-processing, error estimation, architecture comparisons, and sources of error. A link to the publication, which includes much more detail about the dataset and the CNN models, will be added once it is published.

    The data included in this dataset is split into three different folders. The data under the "Training Data" folder includes 4,500 images, divided into 15 sub-folders. Each sub-folder contains 901 files: 300 images, a pressure-velocity map for each of the 300 structures, convergence data for each individual structure, and one comma-separated file (csv) summarizing all simulation results in the folder. The pressure and velocity maps, together with the convergence information, are direct results of the CFD algorithm used, but the important information for training the CNN models is the images and the permeability data in the csv files.

    The "Trained CNNs" folder contains all of the trained CNN models as described in the linked publication for predicting permeability. That includes the ensemble of VGG19 networks. The "External Test Set" includes the same type of data as the "Training Data" folder, but this section of data was only used to test the CNN models. In other words, the trained CNN models never saw any of this data in training, only in testing. The "External Test Set" folder also includes data for phase-size distributions and surface area for all data in this repository. For more details on those, refer to the publication.

    The CFD code and the image generation code can be found in the following GitHub, along with more extensive documentation: https://github.com/adama-wzr/PixelBasedPermeability/

  18. S1 Videos zip segment -

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 28, 2025
    Cite
    Liebal, Katja; Burrows, Anne; Miyabe-Nishiwaki, Takako; Richardson, Jack L.; Waller, Bridget; Hayashi, Misato; Correia-Caeiro, Catia; Costa, Raquel; Robbins, Martha M.; Pater, Jordan (2025). S1 Videos zip segment - [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001498557
    Explore at:
    Dataset updated
    Jan 28, 2025
    Authors
    Liebal, Katja; Burrows, Anne; Miyabe-Nishiwaki, Takako; Richardson, Jack L.; Waller, Bridget; Hayashi, Misato; Correia-Caeiro, Catia; Costa, Raquel; Robbins, Martha M.; Pater, Jordan
    Description

    Each Videos ZIP Segment file is a segment of a split ZIP. To access the videos in the split ZIP, download all Videos ZIP Segment files that are part of the split ZIP to a single folder (including the Videos ZIP Target file that has the full “.zip” extension), rename all of the downloaded files to have the same root filename (e.g., rename “pone.0308790.s001.z01” to “Videos.z01”, rename “pone.0308790.s002.z02” to “Videos.z02”, rename “pone.0308790.s003.z03” to “Videos.z03”, etc.), then open the file with the “.zip” extension (e.g., “Videos.zip”), navigate the folders within the ZIP and select a video to open. Alternatively, after downloading all ZIP and ZXX files into a folder, unzip all files by opening the .ZIP file with a compressing software such as WinRAR, WinZip, or 7-Zip. This will give access to the 177 GorillaFACS video examples organized in their respective folders for each AU and AD. https://doi.org/10.1371/journal.pone.0308790.s001 (Z01)
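    The renaming step described above can also be scripted. A hedged Python sketch follows; the segment numbering pattern beyond the examples given in the description is an assumption:

        import re
        from pathlib import Path

        download_dir = Path(".")  # folder holding all downloaded segment files

        # Rename e.g. "pone.0308790.s001.z01" -> "Videos.z01"; the ".zip" target file
        # follows the same pattern and keeps its ".zip" extension.
        for segment in download_dir.glob("pone.0308790.s*"):
            match = re.search(r"\.(z\d+|zip)$", segment.name)
            if match:
                segment.rename(download_dir / f"Videos.{match.group(1)}")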

  19. COCO annotated Dataset Car Damage Detection

    • kaggle.com
    zip
    Updated Nov 22, 2021
    Cite
    Ramsi Kalia (2021). COCO annotated Dataset Car Damage Detection [Dataset]. https://www.kaggle.com/ramsikalia/coco-annotated-dataset-car-damage-detection
    Explore at:
zip (134878631 bytes). Available download formats
    Dataset updated
    Nov 22, 2021
    Authors
    Ramsi Kalia
    Description

    Dataset

    This dataset was created by Ramsi Kalia

    Contents

  20. Caltech-256: Pre-Processed 80/20 Train-Test Split

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    KUSHAGRA MATHUR (2025). Caltech-256: Pre-Processed 80/20 Train-Test Split [Dataset]. https://www.kaggle.com/datasets/kushubhai/caltech-256-train-test
    Explore at:
zip (1138799273 bytes). Available download formats
    Dataset updated
    Nov 12, 2025
    Authors
    KUSHAGRA MATHUR
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Context: The Caltech-256 dataset is a foundational benchmark for object recognition, containing 30,607 images across 257 categories (256 object categories + 1 clutter category).

    The original dataset is typically provided as a collection of directories, one for each category. This version streamlines the machine learning workflow by providing:

    A clean, pre-defined 80/20 train-test split.

    Manifest files (train.csv, test.csv) that map image paths directly to their labels, allowing for easy use with data generators in frameworks like PyTorch and TensorFlow.

    A flat directory structure (train/, test/) for simplified file access.

    File Content: The dataset is organized into a single top-level folder and two CSV files:

    train.csv: A CSV file containing two columns: image_path and label. This file lists all images designated for the training set.

    test.csv: A CSV file with the same structure as train.csv, listing all images designated for the testing set.

    Caltech-256_Train_Test/: The primary data folder.

    train/: This directory contains 80% of the images from all 257 categories, intended for model training.

    test/: This directory contains the remaining 20% of the images from all categories, reserved for model evaluation.

    Data Split: The dataset has been partitioned into a standard 80% training and 20% testing split. This split is (or should be assumed to be) stratified, meaning that each of the 257 object categories is represented in roughly an 80/20 proportion in the respective sets.
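    Given the manifest files described above (train.csv and test.csv with image_path and label columns), a minimal, hypothetical PyTorch dataset wrapper could look like the following; the root folder argument and the label-to-index handling are assumptions:

        import pandas as pd
        from PIL import Image
        from torch.utils.data import Dataset

        class Caltech256Manifest(Dataset):
            # Reads a manifest CSV (columns: image_path, label) as described above.
            def __init__(self, csv_file, root=".", transform=None):
                self.df = pd.read_csv(csv_file)
                self.root = root
                self.transform = transform
                # Labels may be category names; map them to integer ids for training.
                self.classes = sorted(self.df["label"].unique())
                self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

            def __len__(self):
                return len(self.df)

            def __getitem__(self, idx):
                row = self.df.iloc[idx]
                image = Image.open(f"{self.root}/{row['image_path']}").convert("RGB")
                if self.transform:
                    image = self.transform(image)
                return image, self.class_to_idx[row["label"]]

        # Hypothetical usage; the root path is an assumption:
        # train_set = Caltech256Manifest("train.csv", root="Caltech-256_Train_Test")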

    Acknowledgements & Original Source This dataset is a derivative work created for convenience. The original data and images belong to the authors of the Caltech-256 dataset.

    Original Dataset Link: https://www.kaggle.com/datasets/jessicali9530/caltech256/data

    Citation: Griffin, G. Holub, A.D. Perona, P. (2007). Caltech-256 Object Category Dataset. California Institute of Technology.
