This software repository contains Aegis (Active Evaluator Germane Interactive Selector), a Python package for evaluating a machine learning system's performance (according to a metric such as accuracy) by adaptively sampling trials to label from an unlabeled test set, minimizing the number of labels needed. The repository includes sample (public) data as well as a simulation script that tests different label-selection strategies on already-labelled test sets. The software is configured so that users can add their own data and system outputs to test evaluation.
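The snippet below is a minimal sketch of the kind of adaptive-sampling evaluation loop the package automates, not the Aegis API itself; the function names, the uncertainty-sampling strategy, and the toy data are illustrative assumptions.

import numpy as np

# Sketch (not the Aegis API): estimate a system's accuracy by iteratively
# choosing which unlabeled trials to send to an annotator. The selection
# strategy here is plain uncertainty sampling on the system's own
# confidence scores; the package's real strategies may differ.
rng = np.random.default_rng(0)

def simulate(confidences, correct, budget=100):
    """confidences: system confidence per trial; correct: hidden 0/1 labels
    (known here only because this is a simulation on a labelled test set)."""
    unlabeled = list(range(len(confidences)))
    outcomes = []
    estimate = float("nan")
    for _ in range(budget):
        # pick the trial the system is least confident about
        i = min(unlabeled, key=lambda j: confidences[j])
        unlabeled.remove(i)
        outcomes.append(correct[i])          # "ask the annotator"
        estimate = float(np.mean(outcomes))  # naive accuracy estimate
    return estimate

# toy data: 1,000 trials with confidences and hidden correctness
conf = rng.uniform(size=1000)
corr = (rng.uniform(size=1000) < conf).astype(int)
print("accuracy estimate after 100 labels:", simulate(conf, corr))

Note that the naive mean over an uncertainty-sampled subset is biased; a real active evaluator would correct for the sampling distribution (for example with importance weighting), which is exactly the kind of strategy the simulation script is meant to compare.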
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains the 16-bit manually annotated ground-truth labels for the nuclei that were used either in training (labelled as "Original") or in inference (labelled as "Biological" or "Technical") for the MRCNN and FPN2-WS networks.
This dataset was created by Aditi Pandey
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contrast-enhanced computed tomography scans (CECT) are routinely used in the evaluation of different clinical scenarios, including the detection and characterization of hepatocellular carcinoma (HCC). Quantitative medical image analysis has been an exponentially growing scientific field. A number of studies reported on the effects of variations in the contrast enhancement phase on the reproducibility of quantitative imaging features extracted from CT scans. The identification and labeling of phase enhancement is a time-consuming task, with a current need for an accurate automated labeling algorithm to identify the enhancement phase of CT scans. In this study, we investigated the ability of machine learning algorithms to label the phases in a dataset of 59 HCC patients scanned with a dynamic contrast-enhanced CT protocol. The ground truth labels were provided by expert radiologists. Regions of interest were defined within the aorta, the portal vein, and the liver. Mean density values were extracted from those regions of interest and used for machine learning modeling. Models were evaluated using accuracy, the area under the curve (AUC), and the Matthews correlation coefficient (MCC). We tested the algorithms on an external dataset (76 patients). Our results indicate that several supervised learning algorithms (logistic regression, random forest, etc.) performed similarly, and our developed algorithms can accurately classify the phase of contrast enhancement.
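The following is a hedged sketch of the modeling setup described above: three mean-density features (aorta, portal vein, liver) predicting the enhancement phase. The synthetic features, the number of phases, and the train/test split are illustrative assumptions, not the study's actual data or protocol.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 59 * 4                                    # e.g. 59 patients, 4 phases each (assumption)
X = rng.normal(loc=[150, 120, 80], scale=30, size=(n, 3))   # mean HU: aorta, portal vein, liver
y = rng.integers(0, 4, size=n)                # phase label (e.g. 0-3)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)
    print(type(model).__name__,
          "acc=%.2f" % accuracy_score(y_te, pred),
          "auc=%.2f" % roc_auc_score(y_te, proba, multi_class="ovr"),
          "mcc=%.2f" % matthews_corrcoef(y_te, pred))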
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This data is the ground truth for the "evaluation dataset" for the DCASE 2020 Challenge Task 2 "Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring" [task description].
In the task, three datasets have been released: "development dataset", "additional training dataset", and "evaluation dataset". The evaluation dataset was the last of the three released and includes around 400 samples for each Machine Type and Machine ID used in the evaluation dataset, none of which have any condition label (i.e., normal or anomaly). This ground truth data contains the condition labels.
Data format
The ground truth data is a CSV file like the following:
fan
id_01_00000000.wav,normal_id_01_00000098.wav,0
id_01_00000001.wav,anomaly_id_01_00000064.wav,1
...
id_05_00000456.wav,anomaly_id_05_00000033.wav,1
id_05_00000457.wav,normal_id_05_00000049.wav,0
pump
id_01_00000000.wav,anomaly_id_01_00000049.wav,1
id_01_00000001.wav,anomaly_id_01_00000039.wav,1
...
id_05_00000346.wav,anomaly_id_05_00000052.wav,1
id_05_00000347.wav,anomaly_id_05_00000080.wav,1
slider
id_01_00000000.wav,anomaly_id_01_00000035.wav,1
id_01_00000001.wav,anomaly_id_01_00000176.wav,1
...
"fan", "pump", "slider", etc. are "Machine Type" names. The lines following a Machine Type correspond to pairs of a wave file of that Machine Type and a condition label. The first column shows the name of a wave file. The second column shows the original name of the wave file and can be ignored by users. The third column shows the condition label (i.e., 0: normal or 1: anomaly).
How to use
A system for calculating AUC and pAUC scores for the "evaluation dataset" is available on the Github repository [URL]. The ground truth data is used by this system. For more information, please see the Github repository.
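As a companion to the official system, here is a minimal sketch (not the official evaluation code) of how the ground-truth CSV described above could be parsed in Python. The file name is hypothetical, and the parsing assumes that single-column lines are Machine Type headers.

import csv

def load_ground_truth(path):
    """Parse the ground-truth CSV into {machine_type: {file_name: 0 or 1}}."""
    labels, machine_type = {}, None
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            if len(row) == 1:                       # e.g. "fan", "pump", "slider"
                machine_type = row[0].strip()
                labels[machine_type] = {}
            else:
                fname, _original_name, condition = row
                labels[machine_type][fname] = int(condition)   # 0: normal, 1: anomaly
    return labels

# gt = load_ground_truth("eval_data_list.csv")      # hypothetical file name
# print(sum(gt["fan"].values()), "anomalous fan clips")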
Conditions of use
This dataset was created jointly by NTT Corporation and Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Publication
If you use this dataset, please cite all the following three papers:
Yuma Koizumi, Shoichiro Saito, Noboru Harada, Hisashi Uematsu, and Keisuke Imoto, "ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection," in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019. [pdf]
Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” in Proc. 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019. [pdf]
Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, and Noboru Harada, "Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring," in Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020. [pdf]
Feedback
If there is any problem, please contact us:
Yuma Koizumi, koizumi.yuma@ieee.org
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains four image annotation datasets (ESPGame, IAPR-TC12, ImageCLEF 2011, ImageCLEF 2012). Each dataset has sub-folders of training images, testing images, ground truth, and labels.
The labels are the limited set of labels the dataset could assign to an image, while the ground truth is the correct labeling for each image.
No license specified: https://academictorrents.com/nolicensespecified
The Inria Aerial Image Labeling dataset addresses a core topic in remote sensing: the automatic pixelwise labeling of aerial imagery. Dataset features: coverage of 810 km² (405 km² for training and 405 km² for testing); aerial orthorectified color imagery with a spatial resolution of 0.3 m; ground truth data for two semantic classes, building and not building (publicly disclosed only for the training subset). The images cover dissimilar urban settlements, ranging from densely populated areas (e.g., San Francisco's financial district) to alpine towns (e.g., Lienz in Austrian Tyrol). Instead of splitting adjacent portions of the same images into the training and test subsets, different cities are included in each of the subsets. For example, images over Chicago are included in the training set (and not in the test set), and images over San Francisco are included in the test set (and not in the training set). The ultimate goal of this dataset is to assess the generalization power of the techniques.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
This submission is a supplementary material to the article [Coban 2020b]. As part of the manuscript, we release three simulated parallel-beam tomographic datasets of 94 apples with internal defects, the ground truth reconstructions and two defect label files.
Description
This Zenodo upload contains the ground truth reconstructed slices for each apple. In total, there are 72192 reconstructed slices, which have been divided into 6 separate submissions:
The simulated parallel-beam datasets and defect label files are also available through this project, via a separate Zenodo upload: 10.5281/zenodo.4212301.
Apparatus
The dataset is acquired using the custom-built and highly flexible CT scanner, FleX-ray Laboratory, developed by TESCAN-XRE, located at CWI in Amsterdam. This apparatus consists of a cone-beam microfocus X-ray point source that projects polychromatic X-rays onto a 1944-by-1536 pixels, 14-bit, flat detector panel. Full details can be found in [Coban 2020a].
Ground Truth Generation
We reconstructed the raw tomographic data, which was captured at a sample resolution of 54.2 µm over 360 degrees in a circular, continuous motion in a cone-beam setup. A total of 1200 projections were collected, distributed evenly over the full circle. The raw tomographic data is available upon request.
The ground truth reconstructed slices were generated based on Conjugate Gradient Least Squares (CGLS) reconstruction of each apple. The voxel grid in the reconstruction was 972px x 972px x 768px. The resolution in the ground truth reconstructions remained unchanged.
All ground truth reconstructed slices are in .tif format. Each file is named "appleNo_sliceNo.tif".
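As a small illustration, the sketch below assembles one apple's ground-truth volume from the "appleNo_sliceNo.tif" slices described above. It uses the tifffile package; the directory name, the example apple number, and the exact zero-padding of the file names are assumptions.

from pathlib import Path
import numpy as np
import tifffile

def load_apple(slice_dir, apple_no):
    # collect all slices belonging to one apple and stack them into a volume
    files = sorted(Path(slice_dir).glob(f"{apple_no}_*.tif"))
    return np.stack([tifffile.imread(f) for f in files])

# vol = load_apple("ground_truth_slices", "31101")   # hypothetical folder and apple number
# print(vol.shape, vol.dtype)                        # e.g. (768, 972, 972) for a full apple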
List of Contents
The contents of the submission are given below.
Additional Links
These datasets are produced by the Computational Imaging group at Centrum Wiskunde & Informatica (CI-CWI). For any relevant Python/MATLAB scripts for the FleX-ray datasets, we refer the reader to our group's GitHub page.
Contact Details
For more information or guidance in using these datasets, please get in touch with
Acknowledgments
We acknowledge GREEFA for supplying the apples and further discussions.
Description:
Dataset of photos downloaded from Flickr (241,582 photos) and Twitter-X (1,035,488 photos), labeled by different artificial intelligence models and validated against labels assigned by human experts.
The entire dataset was labeled using different AI models. First, we applied a large language model (GPT-4.1 from OpenAI) and Llava 1.6 (on a subset of the data) to extract semantic labels from the image content, based on prompts refined through prompt engineering.
In parallel, we used the base version of DINO (a self-supervised vision transformer model), fine-tuned with a subset of human expert-labeled images from our own dataset, to generate inferences for the entire image collection.
We also incorporated labels derived from expert vision models pre-trained on established datasets such as ImageNet, COCO, Places365, and Nature, which provided complementary classification information.
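To make the LLM-labeling step concrete, here is a hedged sketch (not the released GPT_Label_local_files.py script) that sends one photo to an OpenAI vision-capable model with a single-category classification prompt. The prompt wording and the single-call structure are placeholders; the dataset's seven tuned prompts are provided separately in the prompt files listed below.

import base64
from openai import OpenAI

STOTEN = ["Cultural", "Fauna/Flora", "Gastronomy", "Nature & Landscape", "Not relevant",
          "Recreational", "Religious", "Rural tourism", "Sports", "Sun and beach", "Urban"]

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def label_photo(path, model="gpt-4.1"):
    # encode the image and ask the model for exactly one Stoten category
    b64 = base64.b64encode(open(path, "rb").read()).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Classify this photo into exactly one of these categories: " + ", ".join(STOTEN)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

# print(label_photo("example.jpg"))   # hypothetical file name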
The labels used correspond to two categories (Table 1):
Table 1. Categories used in social media photo tagging. Stoten is based on the scientific framework proposed by Moreno-Llorca et al. (2020); Level 3 is a hierarchical tagging system developed by our team to provide greater thematic detail, especially suited to the identification of Cultural Ecosystem Services (CES).
Stoten: Cultural; Fauna/Flora; Gastronomy; Nature & Landscape; Not relevant; Recreational; Religious; Rural tourism; Sports; Sun and beach; Urban
Level3: Accommodation; Air activities; Animals; Breakwater; Bridge; Commerce facilities; Cities; Clouds; Dam; Dock; Fungus; Heritage and culture; Knowledge; Landscapes; Lighthouse; Not relevant; Other abiotic features; Plants; Roads; Shelter; Skies; Spiritual, symbolic and related connotations; Terrestrial activities; Towns and villages; Tracks and trails; Vegetation and habitats; Vehicle; Water activities; Wind farm; Winter activities
Table 2. Contents of the dataset (folder / file, format, description)

AI models
- DINO
  - model (.pt and .pth): Model fine-tuned with a subset of expert-labeled images
- Expert models
  - CES_label_tree (.csv): Equivalence table used to assign labels generated by expert models to our categories of interest (Stoten and Level3)
- LLMs GPT and Llava prompts
  - GPT_Label_local_files (.py): Python script used for labeling photos using OpenAI models (in our case, the GPT-4.1 model)
  - Level3_GPT_LLava_7_prompts_used (.txt): Seven prompts used for photo tagging using GPT-4.1 and Llava 1.6
  - Stoten_GPT_LLava_7_prompts_used (.txt): Seven prompts used for photo tagging with Stoten using GPT-4.1 and Llava 1.6
  - Stoten_Level3_categories (.csv): Seven prompts used for photo tagging with Level 3 using GPT-4.1 and Llava 1.6

Flickr
- AI based labels
  - DINO
    - Flickr_DINO_all (.csv): Inferences for all Flickr photos from the DINO model trained with the ground truth
  - Expert models
    - Flickr_expert_models_all (.csv): Labels generated by expert models for the entire database
  - GPT
    - Flickr_GPT_all (.csv): Database of Flickr photos tagged with CES using OpenAI's GPT-4.1 model
    - Flickr_GPT_7_prompts_8192 (.csv): Subset of the Flickr photo database with CES-related tags assigned by the GPT-4.1 model, where 7 prompts are tested for Stoten and Level 3
  - Llava 1.6
    - Flickr_Llava_1-6 (.csv): Subset of the Flickr photo database with CES-related tags assigned by the Llava 1.6 model, where 7 prompts are tested for Stoten and Level 3
- Ground truth
  - Ground Truth labels
    - Flickr_Database_Labeled_1082 (.csv): Labels assigned by human experts, after rounds of review and consensus, for both Stoten and Level 3, for 1082 Flickr photos
    - Flickr_Database_Labeled_7110 (.csv): Labels assigned by human experts, after rounds of review and consensus, for both Stoten and Level 3, for 7110 Flickr photos
    - Flickr_Database_Labeled_8192 (.csv): Union of the two labeled databases above
  - Ground Truth photos
    - 1082 (.jpg/.png): Photos labeled by human experts; selected to be representative of different parks, with different levels of protection, and representative of different CES
    - 7110 (.jpg/.png): Photos labeled by human experts; selected to be representative of different parks, with different levels of protection, and representative of different CES
  - Human labels
    - Flickr_DataBase_Labeled_1082_expert1_AS (.csv): Tags assigned by expert 1 for both Stoten and Level 3, for 1082 Flickr photos
    - Flickr_DataBase_Labeled_1082_expert2_FG (.csv): Tags assigned by expert 2 for both Stoten and Level 3, for 1082 Flickr photos
    - Flickr_DataBase_Labeled_7110_expert1_CN (.csv): Tags assigned by expert 1 for both Stoten and Level 3, for 7110 Flickr photos

Twitter
- AI based labels
  - DINO
    - Twitter_DINO_all (.csv): Inferences for all Twitter photos from the DINO model trained with the ground truth
  - Expert models
    - Twitter_expert_models_all (.csv): Labels generated by expert models for the entire database
  - GPT
    - Twitter_GPT_all (.csv): Database of Twitter photos tagged with CES using OpenAI's GPT-4.1 model
    - Twitter_GPT_7_prompts_150 (.csv): Subset of the Twitter photo database with CES-related tags assigned by the GPT-4.1 model, where 7 prompts are tested for Stoten and Level 3
  - Llava 1.6
    - Twitter_Llava_1-6 (.csv): Subset of the Twitter photo database with CES-related tags assigned by the Llava 1.6 model, where 7 prompts are tested for Stoten and Level 3
- Ground truth
  - Ground Truth labels
    - Twitter_Database_Labeled_150 (.csv): Labels assigned by human experts, after rounds of review and consensus, for both Stoten and Level 3, for 150 Twitter photos
    - Twitter_Database_Labeled_6804 (.csv): Labels assigned by human experts, after rounds of review and consensus, for both Stoten and Level 3, for 6804 Twitter photos
  - Ground Truth photos
    - 150 (.jpg/.png): Photos labeled by human experts; selected to be representative of different parks, with different levels of protection, and representative of different CES
    - 6804 (.jpg/.png): Photos labeled by human experts; selected to be representative of different parks, with different levels of protection, and representative of different CES
  - Human labels
    - Flickr_DataBase_Labeled_150_7experts (.csv): Tags assigned by 7 experts for both Stoten and Level 3, for 150 Twitter photos
    - Flickr_DataBase_Labeled_6804_expert1_FG (.csv): Tags assigned by expert 2 for both Stoten and Level 3, for 6804 Twitter photos
References:
Moreno-Llorca, R., Méndez, P. F., Ros-Candeira, A., Alcaraz-Segura, D., Santamaría, L., Ramos-Ridao, Á. F., ... & Vaz, A. S. (2020). Evaluating tourist profiles and nature-based experiences in Biosphere Reserves using Flickr: Matches and mismatches between online social surveys and photo content analysis. Science of the Total Environment, 737, 140067. https://doi.org/10.1016/j.scitotenv.2020.140067
GNU General Public License v3.0: http://www.gnu.org/licenses/gpl-3.0.en.html
This file contains the MATLAB source code for developing Ground Truth Dataset, Semantic Segmentation, and Evaluation for Lumbar Spine MRI Dataset. It has the file structure necessary for the execution of the code. Please download the MRI Dataset and the Ground Truth label Image dataset separately and unzip them inside the LJMU Lumbar Spine MRI Dataset and Software\99 Workspace\ folder.
The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations.

Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.
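For intuition, here is a hedged sketch of one possible log-likelihood evaluation against consensus labels with uncertainty; it is not necessarily the paper's exact formulation. Each sample has an expert-consensus mean and standard deviation, and a model's point prediction is scored by its Gaussian log-likelihood under that consensus distribution.

import numpy as np
from scipy.stats import norm

def consensus_log_likelihood(predictions, expert_labels):
    """predictions: (n,) model outputs; expert_labels: (n, n_experts) expert labels."""
    mu = expert_labels.mean(axis=1)
    sigma = expert_labels.std(axis=1, ddof=1) + 1e-6   # avoid zero variance
    return float(np.mean(norm.logpdf(predictions, loc=mu, scale=sigma)))

# Toy example: 3 samples, 4 experts each (illustrative numbers)
experts = np.array([[0.20, 0.25, 0.30, 0.22],
                    [0.50, 0.55, 0.48, 0.60],
                    [0.90, 0.85, 0.95, 0.88]])
model_a = np.array([0.24, 0.53, 0.90])
model_b = np.array([0.10, 0.70, 0.60])
print(consensus_log_likelihood(model_a, experts))   # higher = closer to the consensus
print(consensus_log_likelihood(model_b, experts))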
Dataset Card for predicted_labels
These photos are used in the FiftyOne getting started webinar. The images have a prediction label that was generated by self-supervised classification with an OpenCLIP model: https://github.com/thesteve0/fiftyone-getting-started/blob/main/5_generating_labels.py. They were then manually cleaned to produce the ground truth label: https://github.com/thesteve0/fiftyone-getting-started/blob/main/6_clean_labels.md. They are 300 public domain photos… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/getting-started-labeled-photos.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This data is the ground truth for the "evaluation dataset" for the DCASE 2021 Challenge Task 2 "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions".
In the task, three datasets have been released: "development dataset", "additional training dataset", and "evaluation dataset". The evaluation dataset was the last of the three released and includes around 200 samples for each machine type, section index, and domain, none of which have a condition label (i.e., normal or anomaly). This ground truth dataset contains the condition labels.
Data format
The CSV file for each machine type, section index, and domain includes the ground truth data like the following:
section_03_source_test_0000.wav,1
section_03_source_test_0001.wav,1
...
section_03_source_test_0198.wav,0
section_03_source_test_0199.wav,1
The first column shows the name of a wave file. The second column shows the condition label (i.e., 0: normal or 1: anomaly).
How to use
A script for calculating the AUC, pAUC, precision, recall, and F1 scores for the "evaluation dataset" is available on the Github repository [URL]. The ground truth data are used by this system. For more information, please see the Github repository.
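The following is a hedged sketch (not the official DCASE scripts) of computing AUC and a pAUC-style partial AUC from one of the per-file ground-truth CSVs described above, given anomaly scores produced by a system. The file name, the random stand-in scores, and the low-FPR range of 0.1 are assumptions made for illustration.

import csv
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

def load_labels(path):
    """Read "file_name,condition" rows into {file_name: 0 or 1}."""
    with open(path, newline="") as f:
        return {fname: int(label) for fname, label in csv.reader(f)}

def partial_auc(y_true, scores, max_fpr=0.1):
    """Area under the ROC curve restricted to FPR <= max_fpr, normalized to [0, 1]."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    keep = fpr <= max_fpr
    fpr_p = np.append(fpr[keep], max_fpr)
    tpr_p = np.append(tpr[keep], np.interp(max_fpr, fpr, tpr))
    return auc(fpr_p, tpr_p) / max_fpr

labels = load_labels("ground_truth_section_03_source_test.csv")   # hypothetical file name
scores = {fname: np.random.rand() for fname in labels}            # stand-in system scores
y = np.array([labels[f] for f in labels])
s = np.array([scores[f] for f in labels])
print("AUC :", roc_auc_score(y, s))
print("pAUC:", partial_auc(y, s))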
Conditions of use
This dataset was created jointly by Hitachi, Ltd. and NTT Corporation and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Publication
If you use this dataset, please cite all the following three papers:
Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, and Takashi Endo, "Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions," in arXiv e-prints: 2106.04492, 2021. [URL]
Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito, "ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions," in arXiv e-prints: 2106.02369, 2021. [URL]
Ryo Tanabe, Harsh Purohit, Kota Dohi, Takashi Endo, Yuki Nikaido, Toshiki Nakamura, and Yohei Kawaguchi, "MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts due to Changes in Operational and Environmental Conditions," in arXiv e-prints: 2105.02702, 2021. [URL]
Feedback
If there is any problem, please contact us:
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp
Keisuke Imoto, keisuke.imoto@ieee.org
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 95% confidence intervals of accuracy and MCC of the supervised learning models for the main dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset constructed and used in the paper "1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis".
We release it with its ground truth labels to facilitate further research on binary similarity analysis under function inlining. We also welcome others to find mislabeled entries in this dataset and fix them. We hope our dataset can help researchers conduct their studies and improve binary similarity analysis techniques.
Binaries of dataset-I can also be accessed from https://github.com/SoftSec-KAIST/BinKit. It is named the "normal dataset".
Any mislabeling can be reported at https://github.com/island255/TOSEM2022.
If you use this dataset, please cite our paper, "1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis".
FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.
Citation
If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:
Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)
You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018:
Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017
Contact
You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.
About this dataset
Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 has been used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. This Task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra and from Google Research's Machine Perception Team.
The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.
All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed PCM 16-bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.
The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the Data labeling process section.
FSDKaggle2018 clips are unequally distributed in the following 41 categories of the AudioSet Ontology: "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".
Some other relevant characteristics of FSDKaggle2018:
The dataset is split into a train set and a test set.
The train set is meant to be for system development and includes ~9.5k samples unequally distributed among 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum is 300. The duration of the audio samples ranges from 300 ms to 30 s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18 h.
Out of the ~9.5k samples from the train set, ~3.7k have manually-verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. Check the Data labeling process section below for more information about this aspect. Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems.
The test set is composed of 1.6k samples with manually-verified annotations and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2 h.
All audio samples in this dataset have a single label (i.e., they are only annotated with one label). Check the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.
Data labeling process
The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Onto...
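As a small usage illustration, the sketch below loads train.csv and separates manually-verified from non-verified annotations. The column names ("fname", "label", "manually_verified") are assumed from the description above; check the downloaded file before relying on them.

import pandas as pd

train = pd.read_csv("train.csv")
verified = train[train["manually_verified"] == 1]
unverified = train[train["manually_verified"] == 0]

print(len(verified), "verified clips;", len(unverified), "non-verified clips")
print(train["label"].value_counts().head())   # per-category clip counts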
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data contains the corresponding labelled images of Capsicum annuum that are included in the "Unlabelled Weed Detection Images for Hot Peppers" data set on this site. This data set contains the labels 0, 1, and 2, which can be displayed by assigning a unique pixel value (e.g., recommended: 0, 60, 255) to each occurrence of the label. These images can be utilised as ground truth labels for machine learning and data exploration. The labels represent three categories, namely weed, crop, and background. The labels were assigned by a team of trained individuals from Trinidad and Tobago using the Image Labeller App in the Computer Vision Toolbox from MATLAB.
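The sketch below maps the label values {0, 1, 2} to the recommended display values {0, 60, 255} so the ground-truth masks become visible as greyscale images. Which number corresponds to weed, crop, or background is not specified above, and the file names are hypothetical.

import numpy as np
from PIL import Image

def display_mask(label_path, out_path, mapping={0: 0, 1: 60, 2: 255}):
    # replace each label value with its display grey level and save the result
    labels = np.array(Image.open(label_path))
    shown = np.zeros_like(labels, dtype=np.uint8)
    for value, grey in mapping.items():
        shown[labels == value] = grey
    Image.fromarray(shown).save(out_path)

# display_mask("label_0001.png", "label_0001_vis.png")   # hypothetical file names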
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
This submission is a supplementary material to the article [Coban 2020b]. As part of the manuscript, we release three simulated parallel-beam tomographic datasets of 94 apples with internal defects, the ground truth reconstructions and two defect label files.
Description
This Zenodo upload contains the three simulated datasets, Datasets A-C, and the two defect label files. The three versions are a noiseless simulation (Dataset A); simulation with added Gaussian noise (Dataset B), and with scattering noise (Dataset C). The datasets are based on real 3D X-ray CT data and their subsequent volume reconstructions.
The defect label files contain tables of pixel numbers assigned to each defect present in the apples. Each row in the table corresponds to a single apple, and each column is a defect across the 94 apples. The two defect label files are apple_defects_full.csv and apple_defects_partial.csv: the former contains defect information for a full apple, and the latter for a selection of slices in an apple.
The ground truth reconstructions are also available through this project, via 6 separate Zenodo uploads:
The datasets are prepared for development and testing of data-driven, learning-based image reconstruction, segmentation and post-processing methods.
Simulation Setup
Each projection in Datasets A-C contains 50 angles, taken over 180 degrees. The projections are evenly distributed over the half circle, with an angular increment of 3.6 degrees. The size of the projections is 50px x 1377px. In total we have 62792 projections for Datasets A and B, and 7520 for Dataset C. All projections are in .tif format.
We also include proj_angs.txt containing a list of projection angles in radians. These projection angles are the same for all three datasets.
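The sketch below reads proj_angs.txt and stacks one dataset's .tif projections into a single array. The directory name, glob pattern, and file ordering are assumptions; adapt them to the actual submission structure.

from pathlib import Path
import numpy as np
import tifffile

angles = np.loadtxt("proj_angs.txt")               # projection angles in radians
files = sorted(Path("Dataset_A").glob("*.tif"))    # hypothetical directory name
projections = np.stack([tifffile.imread(f) for f in files])
print(angles.shape, projections.shape)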
List of Contents
The contents of the submission are given below.
Additional Links
These datasets are produced by the Computational Imaging group at Centrum Wiskunde & Informatica (CI-CWI). For any relevant Python/MATLAB scripts for the FleX-ray datasets, we refer the reader to our group's GitHub page.
Contact Details
For more information or guidance in using these datasets, please get in touch with
Acknowledgments
We acknowledge GREEFA for supplying the apples and further discussions.
FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
Citation
If you use the FSD50K dataset, or part of it, please cite our TASLP paper (available from [arXiv] [TASLP]):
@article{fonseca2022FSD50K,
  title={{FSD50K}: an open dataset of human-labeled sound events},
  author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={829--852},
  year={2022},
  publisher={IEEE}
}
Paper update: This paper has been published in TASLP at the beginning of 2022. The accepted camera-ready version includes a number of improvements with respect to the initial submission. The main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. The TASLP-accepted camera-ready version is available from arXiv (in particular, it is v2 in arXiv, displayed by default).
Data curators
Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez
Contact
You are welcome to contact Eduardo Fonseca should you have any questions, at efonseca@google.com.
ABOUT FSD50K
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology [1]. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
What follows is a brief summary of FSD50K's most important characteristics. Please have a look at our paper (especially Section 4) to extend the basic information provided here with relevant details for its usage, as well as discussion, limitations, applications and more.
Basic characteristics:
FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio
The dataset encompasses 200 sound classes (144 leaf nodes and 56 intermediate nodes) hierarchically organized with a subset of the AudioSet Ontology.
The audio content is composed mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more. The vocabulary can be inspected in vocabulary.csv (see Files section below).
The acoustic material has been manually labeled by humans following a data labeling process using the Freesound Annotator platform [2].
Clips are of variable length from 0.3 to 30s, due to the diversity of the sound classes and the preferences of Freesound users when recording sounds.
All clips are provided as uncompressed PCM 16 bit 44.1 kHz mono audio files.
Ground truth labels are provided at the clip-level (i.e., weak labels).
The dataset poses mainly a large-vocabulary multi-label sound event classification problem, but also allows development and evaluation of a variety of machine listening approaches (see Sec. 4D in our paper).
In addition to audio clips and ground truth, additional metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), allowing a variety of analyses and sound event research tasks (see Files section below).
The audio clips are grouped into a development (dev) set and an evaluation (eval) set such that they do not have clips from the same Freesound uploader.
Dev set:
40,966 audio clips totalling 80.4 hours of audio
Avg duration/clip: 7.1s
114,271 smeared labels (i.e., labels propagated in the upwards direction to the root of the ontology)
Labels are correct but could be occasionally incomplete
A train/validation split is provided (Sec. 3H). If a different split is used, it should be specified for reproducibility and fair comparability of results (see Sec. 5C of our paper)
Eval set:
10,231 audio clips totalling 27.9 hours of audio
Avg duration/clip: 9.8s
38,596 smeared labels
Eval set is labeled exhaustively (labels are correct and complete for the considered vocabulary)
Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds.
LICENSE
All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. Specifically:
The development set consists of 40,966 clips with the following licenses:
CC0: 14,959
CC-BY: 20,017
CC-BY-NC: 4616
CC Sampling+: 1374
The evaluation set consists of 10,231 clips with the following licenses:
CC0: 4914
CC-BY: 3489
CC-BY-NC: 1425
CC Sampling+: 403
For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json.
In addition, FSD50K as a whole is the result of a curation process and it has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file. We note that the choice of one license for the dataset as a whole is not straightforward as it comprises items with different licenses (such as audio clips, annotations, or data split). The choice of a global license in these cases may warrant further investigation (e.g., by someone with a background in copyright law).
Usage of FSD50K for commercial purposes:
If you'd like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
Also, if you are interested in using FSD50K for machine learning competitions, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
FILES
FSD50K can be downloaded as a series of zip files with the following directory structure:
root
│
└───FSD50K.dev_audio/ Audio clips in the dev set
│
└───FSD50K.eval_audio/ Audio clips in the eval set
│
└───FSD50K.ground_truth/ Files for FSD50K's ground truth
│ │
│ └─── dev.csv Ground truth for the dev set
│ │
│ └─── eval.csv Ground truth for the eval set
│ │
│ └─── vocabulary.csv List of 200 sound classes in FSD50K
│
└───FSD50K.metadata/ Files for additional metadata
│ │
│ └─── class_info_FSD50K.json Metadata about the sound classes
│ │
│ └─── dev_clips_info_FSD50K.json Metadata about the dev clips
│ │
│ └─── eval_clips_info_FSD50K.json Metadata about the eval clips
│ │
│ └─── pp_pnp_ratings_FSD50K.json PP/PNP ratings
│ │
│ └─── collection/ Files for the sound collection format
│
└───FSD50K.doc/
│
└───README.md The dataset description file that you are reading
│
└───LICENSE-DATASET License of the FSD50K dataset as an entity
Each row (i.e. audio clip) of dev.csv contains the following information:
fname: the file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav on disk. This number is the Freesound id. We always use Freesound ids as filenames.
labels: the class labels (i.e., the ground truth). Note these class labels are smeared, i.e., the labels have been propagated in the upwards direction to the root of the ontology. More details about the label smearing process can be found in Appendix D of our paper.
mids: the Freebase identifiers corresponding to the class labels, as defined in the AudioSet Ontology specification
split: whether the clip belongs to train or val (see paper for details on the proposed split)
Rows in eval.csv follow the same format, except that there is no split column.
Note: We use a slightly different format than AudioSet for the naming of class labels in order to avoid potential problems with spaces, commas, etc. Example: we use Accelerating_and_revving_and_vroom instead of the original Accelerating, revving, vroom. You can go back to the original AudioSet naming using the information provided in vocabulary.csv (class label and mid for the 200 classes of FSD50K) and the AudioSet Ontology specification.
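As a usage illustration, the sketch below loads dev.csv, recovers the label list per clip, and applies the provided train/validation split. It assumes the labels field is a comma-separated string within each row, as the description of smeared labels above suggests; verify against the downloaded file.

import pandas as pd

dev = pd.read_csv("FSD50K.ground_truth/dev.csv")
dev["labels"] = dev["labels"].str.split(",")          # one list of class labels per clip
train = dev[dev["split"] == "train"]
val = dev[dev["split"] == "val"]

print(len(train), "train clips,", len(val), "val clips")
print(train[["fname", "labels"]].head())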
Files with additional metadata (FSD50K.metadata/)
To allow a variety of analysis and approaches with FSD50K, we provide the following metadata:
class_info_FSD50K.json: python dictionary where each entry corresponds to one sound class and contains: FAQs utilized during the annotation of the class, examples (representative audio clips), and verification_examples (audio clips presented to raters during annotation as a quality control mechanism). Audio clips are described by the Freesound id. Note: It may be that some of these examples are not included in the FSD50K release.
dev_clips_info_FSD50K.json: python dictionary where each entry corresponds to one dev clip and contains: title,