4 datasets found
  1. Action recognition performance.

    • plos.figshare.com
    xls
    Updated Apr 1, 2024
    Cite
    Michael Joannou; Pia Rotshtein; Uta Noppeney (2024). Action recognition performance. [Dataset]. http://doi.org/10.1371/journal.pone.0301098.t006
    Dataset provided by
    PLOS ONE
    Authors
    Michael Joannou; Pia Rotshtein; Uta Noppeney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task, 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time dataset (MIT). For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes the annotation of 57,177 audiovisual videos, each independently evaluated by 3 of 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual DNN research. We explored the advantages of AVMIT annotations and feature embeddings to improve performance on audiovisual event recognition. A series of 6 Recurrent Neural Networks (RNNs) were trained on either AVMIT-filtered audiovisual events or modality-agnostic events from MIT, and then tested on our audiovisual test set. In all RNNs, top-1 accuracy was increased by 2.71-5.94% by training exclusively on audiovisual events, even outweighing a three-fold increase in training data. Additionally, we introduce the Supervised Audiovisual Correspondence (SAVC) task whereby a classifier must discern whether audio and visual streams correspond to the same action label. We trained 6 RNNs on the SAVC task, with or without AVMIT-filtering, to explore whether AVMIT is helpful for cross-modal learning. In all RNNs, accuracy improved by 2.09-19.16% with AVMIT-filtered data. We anticipate that the newly annotated AVMIT dataset will serve as a valuable resource for research and comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
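
    The embeddings and the SAVC task described above lend themselves to short code sketches. First, the pre-computed feature embeddings can be approximated with off-the-shelf models; the sketch below assumes TensorFlow, TensorFlow Hub, the public YamNet checkpoint for audio and Keras EfficientNetB0 for vision, and is not the authors' exact extraction pipeline (frame rate, input size and pooling are illustrative assumptions).

    ```python
    # Sketch: per-clip audio/visual embeddings in the spirit of AVMIT's
    # pre-computed features (YamNet for audio, EfficientNetB0 for vision).
    # pip install tensorflow tensorflow_hub
    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    yamnet = hub.load("https://tfhub.dev/google/yamnet/1")        # audio model
    effnet = tf.keras.applications.EfficientNetB0(
        include_top=False, pooling="avg", weights="imagenet")     # visual model

    def audio_embedding(waveform: np.ndarray) -> np.ndarray:
        """waveform: mono float32 in [-1, 1] at 16 kHz (e.g. a 3-second clip)."""
        _, embeddings, _ = yamnet(waveform)
        return embeddings.numpy()                                 # [frames, 1024]

    def visual_embedding(frames_rgb: np.ndarray) -> np.ndarray:
        """frames_rgb: RGB frames of shape [num_frames, 224, 224, 3], values 0-255."""
        x = tf.keras.applications.efficientnet.preprocess_input(frames_rgb)
        return effnet.predict(x, verbose=0)                       # [num_frames, 1280]
    ```

    Second, the core data step of the SAVC task is pairing audio and visual streams and labelling whether they share an action class. A hypothetical pairing helper follows (reusing numpy from the sketch above); the negative-sampling scheme is an assumption, not the published protocol:

    ```python
    def make_savc_pairs(audio_embs, visual_embs, labels, seed=0):
        """Return (audio, visual, target) triples: target 1 when both streams
        come from the same action class, 0 when the audio is swapped in from a
        clip of a different class. Illustrative pairing logic only."""
        rng = np.random.default_rng(seed)
        triples = []
        for i, y in enumerate(labels):
            triples.append((audio_embs[i], visual_embs[i], 1))              # matched
            j = rng.choice([k for k, yk in enumerate(labels) if yk != y])   # other class
            triples.append((audio_embs[j], visual_embs[i], 0))              # mismatched
        return triples
    ```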

  2. Description of data in test_set.csv.

    • plos.figshare.com
    Updated Apr 1, 2024
    Cite
    The citation is currently not available for this dataset.
    Dataset provided by
    PLOS ONE
    Authors
    Michael Joannou; Pia Rotshtein; Uta Noppeney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    Same AVMIT dataset description as result 1 (Action recognition performance) above.

  3. Iot Project Dataset

    • universe.roboflow.com
    zip
    Updated Nov 17, 2023
    Cite
    HandGesturerecog (2023). Iot Project Dataset [Dataset]. https://universe.roboflow.com/handgesturerecog/iot-project-rpv5k/model/1
    Dataset authored and provided by
    HandGesturerecog
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically.

    Variables measured
    Hand Moment Bounding Boxes
    Description

    IoT Project

    ## Overview
    
    IoT Project is an object-detection dataset containing Hand Moment annotations for 320 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
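
    A minimal download sketch using the `roboflow` Python package; the workspace and project slugs are taken from the citation URL above, while the API key placeholder and the export format ("yolov8") are assumptions:

    ```python
    # pip install roboflow
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")   # placeholder; use your own Roboflow API key
    project = rf.workspace("handgesturerecog").project("iot-project-rpv5k")
    dataset = project.version(1).download("yolov8")  # export format is an assumption
    print(dataset.location)                 # local folder with images and annotations
    ```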
    
    ## License

    This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  4. Statistics of popular audiovisual action datasets.

    • plos.figshare.com
    xls
    Updated Apr 1, 2024
    Cite
    The citation is currently not available for this dataset.
    Dataset provided by
    PLOS ONE
    Authors
    Michael Joannou; Pia Rotshtein; Uta Noppeney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    Statistics of popular audiovisual action datasets.

