Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Associative Tag Recommendation Exploiting Multiple Textual Features
Fabiano Belem, Eder Martins, Jussara M. Almeida, Marcos Goncalves
In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'11), July 2011

Abstract
This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags preassigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend previous, highly effective and efficient, state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object's content. We also exploit two learning-to-rank techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Some further improvements can also be achieved, in some scenarios, with the new learning-to-rank based strategies, which have the additional advantage of being quite flexible and easily extensible to exploit other aspects of the tag recommendation problem.

BibTeX Citation
@inproceedings{belem@sigir11,
  author    = {Fabiano Bel\'em and Eder Martins and Jussara Almeida and Marcos Gon\c{c}alves},
  title     = {Associative Tag Recommendation Exploiting Multiple Textual Features},
  booktitle = {{Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'11)}},
  month     = {{July}},
  year      = {2011}
}
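As a rough illustration of the kind of heuristic described in the abstract (candidate terms drawn from tag co-occurrence and from multiple textual features, then ranked by a weighted combination of relevance metrics), a minimal sketch follows. The metrics and weights here are placeholders, not the paper's actual functions.

```python
# Illustrative sketch only: scores candidate tags by a weighted combination of
# relevance metrics, in the spirit of the heuristics described in the abstract.
from collections import Counter

def recommend_tags(preassigned_tags, textual_features, cooccurrence, weights, k=5):
    """Rank candidate terms for a target object.

    preassigned_tags: tags already attached to the object
    textual_features: dict feature_name -> list of terms (e.g. title, description)
    cooccurrence: dict tag -> Counter of co-occurring terms (from training data)
    weights: dict metric_name -> float (placeholder metric weights)
    """
    candidates = Counter()
    # (i) candidates from co-occurrence with preassigned tags
    for tag in preassigned_tags:
        candidates.update(cooccurrence.get(tag, Counter()))
    # (ii) candidates extracted from multiple textual features
    for terms in textual_features.values():
        candidates.update(terms)

    def score(term):
        cooc = sum(cooccurrence.get(t, Counter())[term] for t in preassigned_tags)
        # toy "descriptive power" metric: in how many textual features the term appears
        spread = sum(term in terms for terms in textual_features.values())
        return weights["cooc"] * cooc + weights["spread"] * spread

    ranked = sorted((t for t in candidates if t not in preassigned_tags),
                    key=score, reverse=True)
    return ranked[:k]
```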
The MLB-YouTube dataset is a new, large-scale dataset consisting of 20 baseball games from the 2017 MLB post-season available on YouTube with over 42 hours of video footage. The dataset consists of two components: segmented videos for activity recognition and continuous videos for activity classification. It is quite challenging as it is created from TV broadcast baseball games where multiple different activities share the camera angle. Further, the motion/appearance difference between the various activities is quite small.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for youtube-commons-asr-eval
Dataset Summary
This evaluation dataset is created from a subset of YouTube-Commons [PleIAs/YouTube-Commons] by selecting English YouTube videos and their corresponding English subtitles.
Supported Tasks and Leaderboards
This dataset will be primarily useful for automatic speech recognition evaluation tasks such as hf-audio/open_asr_leaderboard.
Languages
This subset is for English-language evaluations. See the full description on the dataset page: https://huggingface.co/datasets/mobiuslabsgmbh/youtube-commons-asr-eval.
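A minimal sketch of how this dataset might be used for ASR evaluation follows. The split name and the column names ("audio", "text") are assumptions, and Whisper-tiny is only a stand-in for whatever model is being evaluated; check the dataset card for the actual schema.

```python
# Hedged sketch of using this dataset for ASR evaluation. Split and column names
# ("audio", "text") are assumptions; adjust to the actual dataset schema.
from datasets import load_dataset
from jiwer import wer
from transformers import pipeline

ds = load_dataset("mobiuslabsgmbh/youtube-commons-asr-eval", split="test")
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

refs, hyps = [], []
for sample in ds.select(range(10)):                      # small subset for illustration
    refs.append(sample["text"])                          # assumed reference transcript column
    hyps.append(asr(sample["audio"], chunk_length_s=30)["text"])

print("WER:", wer(refs, hyps))
```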
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 1.0, March 2024
Lloyd May (1), Keita Ohshiro (2,3), Khang Dang (2,3), Sripathi Sridhar (2,3), Jhanvi Pai (2,3), Magdalena Fuentes (4), Sooyeon Lee (3), Mark Cartwright (2,3,4)
If using this data in an academic work, please reference the DOI and version, as well as cite the following paper, which presented the data collection procedure and the first version of the dataset:
May, L., Ohshiro, K., Dang, K., Sridhar, S., Pai, J., Fuentes, M., Lee, S., Cartwright, M. Unspoken Sound: Identifying Trends in Non-Speech Audio Captioning on YouTube. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), 2024.
The YouTube NSI Captioning Dataset was developed to analyze the contemporary and historical state of non-speech information (NSI) captioning on YouTube. NSI includes information about non-speech sounds such as environmental sounds, sound effects, incidental sounds, and music, as well as additional narrative information and extra-speech information (ESI), which gives context to spoken or signed language such as manner of speech (e.g. "[Whispering] Oh no") or speaker label (e.g., "[Juan] Oh no"). The dataset contains measures of estimated and annotated NSI in the captions of two different samples of videos: a popular video sample and a studio video sample. The aim of the popular sample is to understand the captioning practices in a broad spectrum of popular, impactful videos on YouTube. In contrast, the aim of the studio sample is to examine captioning practices among the top-tier production houses, often viewed as industry benchmarks due to their influence and vast resources available for accessibility. Using the YouTube API, we queried for videos in these two samples for each month from 2013 to 2022. We then estimated which captions contain NSI by searching for non-alphanumeric symbols that are indicative of NSI, e.g., "[" and "]" (see Section 3.2 of the paper for a full list). In addition, the research team manually annotated which captions have NSI from a subset of approximately 1800 videos from years 2013, 2018, and 2022. Please see the Section 3.3 of the paper for details of the annotation process.
The resulting YouTube NSI Captioning Dataset consists of NSI information from ~715k videos containing ~273M lines of captions, ~6M of which are estimated instances of NSI. These videos span 10 years and 21 topics. The annotated subset consists of 1799 videos with a total of ~36k annotated caption lines, ~114k of which are instances of NSI annotated with 7 different categories. These videos span 3 years (2013, 2018, and 2022) and 20 YouTube-assigned topics. Each video was annotated by two annotators, along with a consensus annotation. The dataset contains the links to the YouTube videos, video metadata from the YouTube API, and measures of both estimated and annotated NSI. Due to copyright concerns, we are only publicly releasing data consisting of summary NSI measures for each video. If you need access to the raw data used to create these summary NSI measures, contact Mark Cartwright at mark.cartwright@njit.edu.
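A minimal sketch of the symbol-based NSI estimation described above follows; the full symbol list is in Section 3.2 of the paper, so the pattern here only uses the bracket examples quoted in this description plus a couple of illustrative extras.

```python
# Minimal sketch of symbol-based NSI estimation. The full list of indicative
# symbols is in Section 3.2 of the paper; this pattern is only an example subset.
import re

NSI_PATTERN = re.compile(r"[\[\]\(\)♪♫]")  # illustrative subset of indicative symbols

def estimate_nsi_lines(caption_lines):
    """Return the caption lines estimated to contain NSI."""
    return [line for line in caption_lines if NSI_PATTERN.search(line)]

captions = ["[Whispering] Oh no", "Hello everyone", "♪ upbeat music ♪"]
print(estimate_nsi_lines(captions))  # -> ['[Whispering] Oh no', '♪ upbeat music ♪']
```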
estimated_full_set_aggregate.csv
: Data file containing the full set of video data with measures of estimated NSI.
annotated_subset_aggregate.csv
: Data file containing the smaller annotated subset of video data with measures of both annotated and estimated NSI.
The following columns are present in both data files.
video_id : The YouTube video ID
year : The year associated with the time period from which the video was sampled.
sample : The sample which the video is from (i.e., popular or studio)
sampling_period_start_date : The start date of the time period from which the video was sampled.
sampling_period_end_date : The end date of the time period from which the video was sampled.
caption_type : This can take one of three values: auto, which indicates the captions were provided by YouTube's automated caption system; manual, which indicates the captions were provided by the uploader; or none, which indicates that no captions are present for the video.
duration_minutes : The duration of the video in minutes.
channel_id : The ID that YouTube uses to uniquely identify the channel.
published_datetime : The date and time at which the video was published on YouTube.
youtube_topics : The YouTube-provided list of Wikipedia URLs that provide a description of the video's content.
category_id : The YouTube video category associated with the video.
view_count : The count of views on YouTube at the time of sampling (Spring 2023).
like_count : The count of likes on YouTube at the time of sampling (Spring 2023).
comment_count : The count of comments on YouTube at the time of sampling (Spring 2023).
high_level_topics : List of topics at a higher semantic level than youtube_topics that provide a description of the video's content. See paper for details on the mapping between youtube_topics and high_level_topics.
The remaining columns pair an NSI type with a measure and take the values listed below.
Values for the NSI type:
estimated_nsi : This NSI type is an estimation of NSI based on the presence of particular non-alphanumeric characters that are indicative of NSI as described in Section 3.2 of the paper.
general_nsi (only in annotated_subset_aggregate.csv) : The most general of the NSI types, inclusive of music_nsi, environmental_nsi, additionalnarrative_nsi, and quotedspeech_nsi. All of these NSI types are included in the calculation of measures associated with general_nsi. Note that misc_nsi and nonenglish_captions are not included, as those may or may not contain NSI, and thus we opt for precision over recall. Not present for the unlabeled full set.
music_nsi (only in annotated_subset_aggregate.csv) : Any genre of music, whether diegetic or not.
environmental_nsi (only in annotated_subset_aggregate.csv) : Environmental sounds, sound effects, and incidental sounds, i.e., non-music and non-speech sounds. This includes non-verbal vocalizations like laughter, grunts, and crying, provided they aren't used to modify speech.
extraspeech_nsi (only in annotated_subset_aggregate.csv) : Extra-speech information (ESI), i.e., text that gives added context to spoken or signed language.
additionalnarrative_nsi (only in annotated_subset_aggregate.csv) : Additional narrative information in the form of descriptive text that doesn't pertain directly to sounds.
quotedspeech_nsi (only in annotated_subset_aggregate.csv) : Quoted speech, i.e., captions containing internal quotation marks.
misc_nsi (only in annotated_subset_aggregate.csv) : Unsure, miscellaneous, or ambiguous, i.e., instances where the appropriate label is unclear or the caption doesn't fit current categories.
nonenglish_captions (only in annotated_subset_aggregate.csv) : Captions not written in English, which therefore have uncertain NSI status.
Values for the measure:
count : The number of captions identified as containing NSI of the specified type in the video.
presence : Indication of whether there is NSI of the specified type present in the video: 1 if present (i.e., count > 0), 0 if not present (i.e., count == 0).
count_per_minute : A measure of the density of NSI captions: count_per_minute = count / duration_minutes.
count_per_minute_if_present : If presence == 1, then count_per_minute; otherwise NaN. This is used for computing the aggregate CPMIP measure, which, as discussed in the paper, is intended to be a measure of the quality of NSI captions based on the assumption that more frequently captioned NSI within a video is an indicator of better NSI captioning. See Section 5 of the paper for details.
Dataset created by Lloyd May, Keita Ohshiro, Khang Dang, Sripathi Sridhar, Jhanvi Pai, Magdalena Fuentes, Sooyeon Lee, and Mark Cartwright
The YouTube NSI Captioning Dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/
Please help us improve YouTube NSI Captioning Dataset by sending your feedback to:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, intended to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators.
For more detailed information about the dataset, please refer to our paper: EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation.
File Description
midis/: MIDI clips transcribed using GiantMIDI.
Filename Q1_xxxxxxx_2.mp3: Q1 means this clip belongs to Q1 on the V-A (valence-arousal) space; xxxxxxx is the song ID on YouTube; and the 2 means this clip is the 2nd clip taken from the full song.
metadata/: metadata from YouTube, obtained during crawling.
songs_lists/: YouTube URLs of the songs.
tagging_lists/: raw tagging results for each sample.
label.csv: metadata that records filename, 4Q label, and annotator.
metadata_by_song.csv: lists all the clips by song. Can be used to create the train/val/test splits so that the same song does not appear in both train and test (see the sketch below).
scripts/prepare_split.ipynb: the script to create train/val/test splits and save them to CSV files.
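A minimal sketch of the song-level split idea follows (the repository already ships scripts/prepare_split.ipynb for the actual splits). It only relies on the filename convention described above (Q1_xxxxxxx_2); the split ratios and everything else are illustrative.

```python
# Illustrative song-level split so that clips of the same song never end up in
# both train and test. Ratios and seed are placeholders.
import random

def song_id(clip_name):
    stem = clip_name.rsplit(".", 1)[0]              # drop .mp3/.mid extension if present
    return stem.split("_", 1)[1].rsplit("_", 1)[0]  # middle part is the YouTube song ID

def split_by_song(clip_names, val_ratio=0.1, test_ratio=0.1, seed=0):
    song_of = {c: song_id(c) for c in clip_names}
    songs = sorted(set(song_of.values()))
    random.Random(seed).shuffle(songs)

    n_val = int(len(songs) * val_ratio)
    n_test = int(len(songs) * test_ratio)
    val_songs = set(songs[:n_val])
    test_songs = set(songs[n_val:n_val + n_test])

    return {c: ("val" if s in val_songs else "test" if s in test_songs else "train")
            for c, s in song_of.items()}

print(split_by_song(["Q1_abcdefghijk_1.mp3", "Q1_abcdefghijk_2.mp3", "Q3_zyxwvutsrqp_1.mp3"]))
```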
2.2 Update
Added tagging files in tagging_lists/ that were missing in the previous version.
Added timestamps.json for easier usage. It records all the timestamps in dict format. See scripts/load_timestamp.ipynb for a format example.
Added scripts/timestamp2clip.py: after the raw audio has been crawled and placed in audios/raw, you can use this script to get the audio clips. The script reads timestamps.json and uses the timestamps to extract the clips, which are saved to the audios/seg folder.
Removed 7 MIDI files that were added by mistake, and corrected the corresponding number in metadata_by_song.csv.
2.1 Update
Add one file and one folder:
key_mode_tempo.csv: key, mode, and tempo information extracted from the files.
CP_events/: CP events used in our paper, extracted using this script, with the emotion event added to the front.
Modify one folder:
The REMI_events/ files in version 2.0 contained some information not related to the paper, so it has been removed.
2.0 Update
Add two new folders:
corpus/: processed data following the preprocessing flow. (Please note that although we have 1,087 clips in our dataset, we lost some clips during steps 1~4 of the flow, so the final number of clips in this corpus is 1052, and that is the number we used for training the generative model.)
REMI_events/: REMI events for each MIDI file, generated using this script.
Cite this dataset
@inproceedings{EMOPIA,
  author    = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
  title     = {{EMOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
  booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
  year      = {2021}
}
Kinetics-600 is a large-scale action recognition dataset which consists of around 480K videos from 600 action categories. The 480K videos are divided into 390K, 30K, and 60K for the training, validation, and test sets, respectively. Each video in the dataset is a 10-second clip of an action moment annotated from a raw YouTube video. It is an extension of the Kinetics-400 dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Because this dataset has been used in a competition, we had to hide some of the data to prepare the test dataset for the competition. Thus, in the previous version of the dataset, only the train.csv file existed.
This dataset represents 10 different physical poses that can be used to distinguish 5 exercises. The exercises are Push-up, Pull-up, Sit-up, Jumping Jack and Squat. For every exercise, 2 different classes have been used to represent the terminal positions of that exercise (e.g., “up” and “down” positions for push-ups).
About 500 videos of people doing the exercises were used to collect this data. The videos are from the Countix dataset, which contains the YouTube links of several human activity videos. Using a simple Python script, the videos of the 5 different physical exercises were downloaded. From every video, at least 2 frames were manually extracted. The extracted frames represent the terminal positions of the exercise.
For every frame, the MediaPipe framework is used for pose estimation, which detects the skeleton of the person in the frame. The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks (see the image linked below). Visit the MediaPipe Pose Classification page for more details.
33 pose landmarks: https://mediapipe.dev/images/mobile/pose_tracking_full_body_landmarks.png
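A minimal sketch of the per-frame pose-estimation step described above, using the MediaPipe Pose solution to obtain the 33 landmarks; the frame path is a placeholder.

```python
# Sketch of extracting the 33 MediaPipe Pose landmarks from one extracted frame.
# "frame.jpg" is a placeholder for a terminal-position frame from the dataset.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

image = cv2.imread("frame.jpg")
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    # 33 landmarks, each with normalized x, y, z and a visibility score
    row = [(lm.x, lm.y, lm.z, lm.visibility) for lm in results.pose_landmarks.landmark]
    print(len(row))  # 33
```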
The number of YouTube users in Africa was forecast to increase continuously between 2024 and 2029 by a total of 0.03 million users (+3.95 percent). The YouTube user base is estimated to amount to 0.79 million users in 2029. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Worldwide and the Americas.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
=====================================================================
NII Face Mask Dataset v1.0
=====================================================================
Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)
Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam
National Institute of Informatics Copyright (c) 2021
Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn
Arxiv: https://arxiv.org/abs/2111.12888 NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725
=============================== INTRODUCTION ===============================
The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from the Rambalac's YouTube channel.
The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.
=============================== REFERENCES ===============================
If you publish using any of the data in this dataset, please cite the following papers:
@article{Nguyen202112888, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, archivePrefix={arXiv}, arxivId={2111.12888}, url={https://arxiv.org/abs/2111.12888}, year={2021} }
@INPROCEEDINGS{Nguyen2021EstMaskWearing, author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)}, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, year={2021}, pages={1-8}, url={https://ieeexplore.ieee.org/document/9667046}, doi={10.1109/FG52635.2021.9667046}}
======================== DATA STRUCTURE ==================================
./NFM
├── dataset
│   ├── train.csv: annotations for the train set.
│   ├── test.csv: annotations for the test set.
└── README_v1.0.md
We use the same structure for the two CSV files (train.csv and test.csv). Both CSV files have the same columns:
<1st column>: video_id (the source video can be found at https://www.youtube.com/watch?v=<video_id>)
<2nd column>: frame_id (the index of a frame extracted from the source video)
<3rd column>: timestamp in milliseconds (the timestamp of the frame in the source video)
<4th column>: label (for each annotated face, one of three labels was attached to a bounding box: 'Mask'/'No-Mask'/'Unknown')
<5th column>: left
<6th column>: top
<7th column>: right
<8th column>: bottom
The four coordinates (left, top, right, bottom) denote a face's bounding box.
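A short sketch of reading train.csv with these columns and computing a per-frame mask-wearing ratio follows. Whether the CSV files include a header row is not stated here, so header=None with explicit column names is an assumption; adjust if needed.

```python
# Sketch: per-frame mask-wearing ratio from train.csv. header=None plus explicit
# column names is an assumption about the file layout.
import pandas as pd

cols = ["video_id", "frame_id", "timestamp_ms", "label", "left", "top", "right", "bottom"]
df = pd.read_csv("NFM/dataset/train.csv", header=None, names=cols)

known = df[df["label"] != "Unknown"]
ratio = (known.groupby(["video_id", "frame_id"])["label"]
              .apply(lambda s: (s == "Mask").mean())
              .rename("mask_wearing_ratio"))
print(ratio.head())
```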
============================== COPYING ================================
This repository is made available under Creative Commons Attribution License (CC-BY).
Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/
THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE
====================== ACKNOWLEDGEMENTS ================================
This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.
This dataset is based on the Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The YouTube-ASMR dataset contains URLs for over 900 hours of ASMR video clips with stereo/binaural audio produced by various YouTube artists. The following paper contains a detailed description of the dataset and how it was compiled:
K. Yang, B. Russell and J. Salamon, "Telling Left from Right: Learning Spatial Correspondence of Sight and Sound", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, June 2020.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EOAD is a collection of videos captured by wearable cameras, mostly during sports activities. It contains both visual and audio modalities.
It was initiated from the HUJI and FPVSum egocentric activity datasets. However, the number of samples and the diversity of activities in HUJI and FPVSum were insufficient. Therefore, we combined these datasets and populated them with new YouTube videos.
The selection of videos was based on the following criteria:
Video samples were trimmed at scene changes for long videos (such as driving, scuba diving, and cycling). As a result, a video may have several clips depicting egocentric actions; hence, video clips were extracted from carefully defined time intervals within the videos. The final dataset includes video clips with a single action and natural audio information.
Statistics for EOAD:
The detailed statistics for the selected datasets and the video clips crawled from YouTube are given below:
The video clips used for the training, validation, and test sets of each activity are listed in Table 1. Multiple video clips may belong to a single video because long videos were trimmed for several reasons (i.e., scene cuts, temporarily overlaid text on the video, or video parts unrelated to the activity).
While splitting the dataset, the minimum number of videos for each activity was set to 8. The video samples were divided into 50%, 25%, and 25% for training (minimum four videos), validation (minimum two videos), and testing (minimum two videos), respectively. Videos were split according to the raw video footage to prevent similar video clips (having the same actors and scenes) from mixing across the training, validation, and test sets. We therefore ensured that video clips trimmed from the same video were placed together in the training, validation, or test set to allow a fair comparison.
Some activities, such as scuba diving, longboarding, or horseback riding, have continuity throughout the video and thus have as many video segments as videos. Other activities, such as skating, occur in a short time, making the number of video segments higher. As a result, the number of video clips in the training, validation, and test sets is highly imbalanced across the selected activities (e.g., jet ski and rafting have 4 training clips, whereas soccer has 99).
Table 1 - Dataset splitting for EOAD
Action Label | Train #Clips | Train Total Duration | Validation #Clips | Validation Total Duration | Test #Clips | Test Total Duration |
---|---|---|---|---|---|---|
AmericanFootball | 34 | 00:06:09 | 36 | 00:05:03 | 9 | 00:01:20 |
Basketball | 43 | 01:13:22 | 19 | 00:08:13 | 10 | 00:28:46 |
Biking | 9 | 01:58:01 | 6 | 00:32:22 | 11 | 00:36:16 |
Boxing | 7 | 00:24:54 | 11 | 00:14:14 | 5 | 00:17:30 |
BungeeJumping | 7 | 00:02:22 | 4 | 00:01:36 | 4 | 00:01:31 |
Driving | 19 | 00:37:23 | 9 | 00:24:46 | 9 | 00:29:23 |
GoKart | 5 | 00:40:00 | 3 | 00:11:46 | 3 | 00:19:46 |
Horseback | 5 | 01:15:14 | 5 | 01:02:26 | 2 | 00:20:38 |
IceHockey | 52 | 00:19:22 | 46 | 00:20:34 | 10 | 00:36:59 |
Jetski | 4 | 00:23:35 | 5 | 00:18:42 | 6 | 00:02:43 |
Kayaking | 28 | 00:43:11 | 22 | 00:14:23 | 4 | 00:11:05 |
Kitesurfing | 30 | 00:21:51 | 17 | 00:05:38 | 6 | 00:01:32 |
Longboarding | 5 | 00:15:40 | 4 | 00:18:03 | 4 | 00:09:11 |
Motorcycle | 20 | 00:49:38 | 21 | 00:13:53 | 8 | 00:20:30 |
Paintball | 7 | 00:33:52 | 4 | 00:12:08 | 4 | 00:08:52 |
Paragliding | 11 | 00:28:42 | 4 | 00:10:16 | 4 | 00:19:50 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Accident Detection Model is built using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, an image, or a provided video. The model is trained on a dataset of 3200+ images, which were annotated on Roboflow.
Survey: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
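A minimal inference sketch with the Ultralytics YOLOv8 API follows; the weights and video file names are placeholders, not files shipped with this dataset.

```python
# Hedged sketch of running a trained YOLOv8 detector on a video. The file names
# "accident_best.pt" and "dashcam_clip.mp4" are placeholders.
from ultralytics import YOLO

model = YOLO("accident_best.pt")
results = model("dashcam_clip.mp4", stream=True)  # also accepts an image path or webcam index

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        print(cls_name, float(box.conf), box.xyxy.tolist())
```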
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern content sharing environments such as Flickr or YouTube contain a large number of private resources such as photos showing weddings, family holidays, and private parties. These resources can be of a highly sensitive nature, disclosing many details of the users' private sphere. In order to support users in making privacy decisions in the context of image sharing and to provide them with a better overview of privacy-related visual content available on the Web, we propose techniques to automatically detect private images and to enable privacy-oriented image search. In order to classify images, we use metadata such as the title and tags, and plan to use visual features, which are described in our scientific paper. The data set used in the paper is now available.
Picalet! cleaned dataset (recommended for experiments)
userstudy (images annotated with queries, anonymized user ID, and privacy value)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Even though culture has been found to play some role in negative emotion expression, affective computing research primarily takes a basic-emotion approach when analyzing social signals for automatic emotion recognition technologies. Furthermore, automatic negative emotion recognition systems are still trained on data that originates primarily from North America and contains a majority of Caucasian training samples. The current study aims to address this problem by analyzing the differences in the underlying social signals, leveraging machine learning models to classify 3 negative emotions, contempt, anger, and disgust (CAD), across 3 different cultures: North American, Persian, and Filipino. Using a curated data set compiled from YouTube videos, a support vector machine (SVM) was used to predict negative emotions across the different cultures. In addition, a one-way ANOVA was used to analyse the differences between the culture groups in terms of the level of activation of the underlying social signals. Our results not only highlighted the significant differences in the associated social signals that were activated for each culture, but also indicated the specific underlying social signals that differ across our cross-cultural data sets. Furthermore, the automatic classification methods showed North American expressions of CAD to be well recognized, while Filipino and Persian expressions were recognized at near-chance levels.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Speech Emotion Recognition (SER) is a rapidly evolving field of research aimed at identifying and categorizing emotional states through the analysis of speech signals. As SER holds significant socio-cultural and commercial importance, researchers are increasingly leveraging machine learning and deep learning techniques to drive advancements in this domain. A high-quality dataset is an essential resource for SER studies in any language. Despite Urdu being the 10th most spoken language globally, there is a significant lack of robust SER datasets, creating a research gap. Existing Urdu SER datasets are often limited by their small size, narrow emotional range, and repetitive content, reducing their applicability in real-world scenarios. To address this gap, the Urdu Speech Emotion Recognition (UrduSER) dataset was developed. This comprehensive dataset includes 3500 Urdu speech signals sourced from 10 professional actors, with an equal representation of male and female speakers from diverse age groups. The dataset encompasses seven emotional states: Angry, Fear, Boredom, Disgust, Happy, Neutral, and Sad. The speech samples were curated from a wide collection of Pakistani Urdu drama serials and telefilms available on YouTube, ensuring diversity and natural delivery. Unlike conventional datasets, which rely on predefined dialogs recorded in controlled environments, UrduSER features unique and contextually varied utterances, making it more realistic and applicable for practical applications. To ensure balance and consistency, the dataset contains 500 samples per emotional class, with 50 samples contributed by each actor for each emotion. Additionally, an accompanying Excel file provides detailed metadata for each recording, including the file name, duration, format, sample rate, actor details, emotional state, and corresponding Urdu dialog. This metadata enables researchers to efficiently organize and utilize the dataset for their specific needs. The UrduSER dataset underwent rigorous validation, integrating expert evaluation and model-based validation to ensure its reliability, accuracy, and overall suitability for advancing research and development in Urdu Speech Emotion Recognition.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
UnusualAction Dataset for Action Recognition
Nitika Nigam, Tanima Dutta and Hari Prabhat Gupta, Indian Institute of Technology (BHU), India.
Overview: UnusualAction is an uncertain action recognition dataset of actions that rarely happen, collected from YouTube. The dataset comprises 14 unusual action categories, and each category contains 50-100 videos. UnusualAction provides diversity in terms of the different actions and the presence of noise, such as variations in camera motion, person appearance, viewpoint, cluttered background, illumination conditions, etc. It is a challenging dataset for uncertain action recognition. Most action recognition datasets are based on certain actions; on the contrary, UnusualAction aims to encourage further research into uncertain action recognition by learning and exploring new realistic action categories.
Structure of the UnusualAction Dataset
● Data associated with each UnusualAction category is stored in a separate directory.
● Each directory comprises *.mp4 or *.avi video files.
● The directories are arranged in the following structure:
FallAction_datasets
├── Blending_phone
├── Crushing_laptop
├── Cutting_keyboard
├── Drilling_Laptop
├── Drilling_Phone
├── Frying_Phone
├── Hammering_Laptop
├── Hammering_phone
├── Hammering_pumpkin
├── Hammering_watermelon
├── Microwave_shoes
├── Microwave_phone
├── Washing_laptop
└── Washing_Paptop
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TweetNERD - End to End Entity Linking Benchmark for Tweets
Paper - Video - Neurips Page
This is the dataset described in the paper TweetNERD - End to End Entity Linking Benchmark for Tweets (accepted to Thirty-sixth Conference on Neural Information Processing Systems (Neurips) Datasets and Benchmarks Track).
Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets across 2010-2021, for benchmarking NERD systems on Tweets. This is the largest and most temporally diverse open sourced dataset benchmark for NERD on Tweets and can be used to facilitate research in this area.
TweetNERD dataset is released under Creative Commons Attribution 4.0 International (CC BY 4.0) LICENSE.
The license only applies to the data files present in this dataset. See Data usage policy below.
Check out more details at https://github.com/twitter-research/TweetNERD
Usage
We provide the dataset split across the following tab-separated files:
part_*.public.tsv : Remaining data split into parts in no particular order.
Each file is tab-separated and has the following format:
tweet_id | phrase | start | end | entityId | score |
---|---|---|---|---|---|
22 | twttr | 20 | 25 | Q918 | 3 |
21 | twttr | 20 | 25 | Q918 | 3 |
1457198399032287235 | Diwali | 30 | 38 | Q10244 | 3 |
1232456079247736833 | NO_PHRASE | -1 | -1 | NO_ENTITY | -1 |
For Tweets which don't have any entity, their column values for phrase, start, end, entityId, and score are set to NO_PHRASE, -1, -1, NO_ENTITY, and -1, respectively.
Description of file columns is as follows:
Column | Type | Missing Value | Description |
---|---|---|---|
tweet_id | string | ID of the Tweet | |
phrase | string | NO_PHRASE | entity phrase |
start | int | -1 | start offset of the phrase in text using UTF-16BE encoding |
end | int | -1 | end offset of the phrase in the text using UTF-16BE encoding |
entityId | string | NO_ENTITY | Entity ID. If not missing can be NOT FOUND, AMBIGUOUS, or Wikidata ID of format Q{numbers}, e.g. Q918 |
score | int | -1 | Number of annotators who agreed on the phrase, start, end, entityId information |
In order to use the dataset you need to take the tweet_id column and get the Tweet text using the Twitter API (see the Data usage policy section below).
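A hedged sketch of this workflow follows: hydrating Tweet text with tweepy and recovering the annotated phrase using the UTF-16 offsets described above. It assumes the TSV files include a header row, and it requires your own bearer token obtained under the Data usage policy; the part file name is just one of the files listed above.

```python
# Hedged sketch: hydrate Tweet text with tweepy, then slice out the annotated
# phrase using the UTF-16 code-unit offsets (start, end) described above.
import pandas as pd
import tweepy

df = pd.read_csv("part_0.public.tsv", sep="\t")          # assumes a header row
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # your own credentials

batch = df["tweet_id"].astype(str).unique()[:100].tolist()  # up to 100 ids per lookup call
resp = client.get_tweets(ids=batch)
text_by_id = {str(t.id): t.text for t in (resp.data or [])}

for _, row in df[df["entityId"] != "NO_ENTITY"].iterrows():
    text = text_by_id.get(str(row["tweet_id"]))
    if text is None:
        continue
    # start/end are offsets in UTF-16 code units, so slice in that encoding (2 bytes each)
    u16 = text.encode("utf-16-be")
    phrase = u16[2 * int(row["start"]):2 * int(row["end"])].decode("utf-16-be")
    print(row["tweet_id"], phrase, row["entityId"])
```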
Data stats
Split | Number of Rows | Number of unique Tweets |
---|---|---|
OOD | 34102 | 25000 |
Academic | 51685 | 30119 |
part_0 | 11830 | 10000 |
part_1 | 35681 | 25799 |
part_2 | 34256 | 25000 |
part_3 | 36478 | 25000 |
part_4 | 37518 | 24999 |
part_5 | 36626 | 25000 |
part_6 | 34001 | 24984 |
part_7 | 34125 | 24981 |
part_8 | 32556 | 25000 |
part_9 | 32657 | 25000 |
part_10 | 32442 | 25000 |
part_11 | 32033 | 24972 |
Data usage policy
Use of this dataset is subject to you obtaining lawful access to the Twitter API, which requires you to agree to the Developer Terms Policies and Agreements.
Please cite the following if you use TweetNERD in your paper:
@dataset{TweetNERD_Zenodo_2022_6617192,
  author    = {Mishra, Shubhanshu and Saini, Aman and Makki, Raheleh and Mehta, Sneha and Haghighi, Aria and Mollahosseini, Ali},
  title     = {{TweetNERD - End to End Entity Linking Benchmark for Tweets}},
  month     = jun,
  year      = 2022,
  note      = {{Data usage policy: Use of this dataset is subject to you obtaining lawful access to the [Twitter API](https://developer.twitter.com/en/docs/twitter-api), which requires you to agree to the [Developer Terms Policies and Agreements](https://developer.twitter.com/en/developer-terms/).}},
  publisher = {Zenodo},
  version   = {0.0.0},
  doi       = {10.5281/zenodo.6617192},
  url       = {https://doi.org/10.5281/zenodo.6617192}
}

@inproceedings{TweetNERDNeurips2022,
  author    = {Mishra, Shubhanshu and Saini, Aman and Makki, Raheleh and Mehta, Sneha and Haghighi, Aria and Mollahosseini, Ali},
  booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
  pages     = {},
  title     = {TweetNERD - End to End Entity Linking Benchmark for Tweets},
  volume    = {2},
  year      = {2022},
  eprint    = {arXiv:2210.08129},
  doi       = {10.48550/arXiv.2210.08129}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce the Group Affect from ViDeos (GAViD) dataset, which comprises 5091 video clips with multimodal data (video, audio, and context), annotated with ternary valence and discrete emotion labels, and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present CAGNet, a baseline model for multimodal context-aware group affect recognition. CAGNet achieves 61.20% test accuracy on GAViD, comparable to state-of-the-art performance in the field.
NOTE: For now, we are providing only the Train video clips. The corresponding paper is under review in the ACM Multimedia 2025 Dataset Track. After its publication, access to the Validation and Test sets will be granted upon request and approval, in accordance with the Responsible Use Policy.
GAViD is a large-scale, in-the-wild multimodal dataset of 5091 samples, each annotated with the elements listed below. The following sections describe its key details and compilation procedure.
Dataset details
Positive | Positive | Negative | Negative | Neutral | Neutral |
---|---|---|---|---|---|
Team Celebration | Happy | Protest | Angry Sport | Group Meeting | Panel Discussion |
Group Meeting | Video Conference | Heated Argument | Violent Protest | Parliament Speech | People on Street |
Get Together | Meeting | Emotional Breakdown in Public | Aggressive Argument | People Walking on Street | Team Brainstorming Session |
Celebration | Press Conference | Spiritual Gathering | Aggressive Group | Team Building Activities | Group Discussion |
Religious Gathering | Talk Show | Street Race | Condolence | Group Work Session | Team Planning Session |
Farewell | Group Performance | Group Fight | Wrestling | Students in Discussion | Wedding Group Dance |
People Dancing on Street | Street Comedy | MMA Fight | Violence | Roundtable Discussion | Oath |
Wedding Performance | Dhol Masti | Boxing | Silent Protest | Mental Health Address | General Talk |
Couple Group Dance | Comedy Show | People in a Fight | Group Fight | Wedding Celebration | Festival Celebration |
Model | Val Acc. | Val F1 | Test Acc. | Test F1 |
---|---|---|---|---|
CAGNet | 62.55% | 0.454 | 60.33% | 0.448 |
The dataset comprises two main components:
The dataset is structured as a GAViD.csv file along with the corresponding videos in related folders. This CSV file includes the following fields:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Data set of materials in vessels
The handling of materials in glassware vessels is the main task in chemistry laboratory research, as well as in a large number of other activities. Visual recognition of the physical phase of the materials is essential for many methods, ranging from simple tasks such as fill-level evaluation to the identification of more complex properties such as solvation, precipitation, crystallization, and phase separation. To help train neural nets for this task, a new data set was created. The data set contains a thousand images of materials, in different phases and involved in different chemical processes, in a laboratory setting. Each pixel in each image is labeled according to several layers of classification, as given below:

a. Vessel/Background: For each pixel, assign a value of one if it is part of the vessel and zero otherwise. This annotation was used as the ROI map for the valve filter method.

b. Filled/Empty: This is similar to the above, but also distinguishes between the filled and empty regions of the vessel. For each pixel, one of the following three values is assigned: 0 (background); 1 (empty vessel); or 2 (filled vessel).

c. Phase type: This is similar to the above but distinguishes between liquid and solid regions of the filled vessel. For each pixel, one of the following four values is assigned: 0 (background); 1 (empty vessel); 2 (liquid); or 3 (solid).

d. Fine-grained physical phase type: This is similar to the above but distinguishes between specific classes of physical phase. For each pixel, one of 15 values is assigned: 1 (background); 2 (empty vessel); 3 (liquid); 4 (liquid phase two, in the case where more than one phase of liquid appears in the vessel); 5 (suspension); 6 (emulsion); 7 (foam); 8 (solid); 9 (gel); 10 (powder); 11 (granular); 12 (bulk); 13 (solid-liquid mixture); 14 (solid phase two, in the case where more than one phase of solid exists in the vessel); and 15 (vapor).

The annotations are given as images of the size of the original image, where the pixel value is the class number. The annotation of the vessel region (a) is used as the ROI input for the valve filter net.
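As a small illustration of how these annotation layers nest, the sketch below collapses a fine-grained map (d, values 1-15) down to the Filled/Empty map (b, values 0-2) using the class values listed above; the file name is a placeholder, and vapor is treated as "filled" purely for illustration.

```python
# Sketch: collapse the fine-grained annotation (d) to the Filled/Empty map (b).
# "example_fine_grained_annotation.png" is a placeholder file name.
import numpy as np
from PIL import Image

fine = np.array(Image.open("example_fine_grained_annotation.png"))

filled_empty = np.zeros_like(fine)   # fine == 1 (background) stays 0
filled_empty[fine == 2] = 1          # empty vessel
filled_empty[fine >= 3] = 2          # any material class counts as "filled" here

print(np.unique(filled_empty))
```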
4.1. Validation/testing set
The data set is divided into training and testing sets. The testing set is itself divided into two subsets; one contains images extracted from the same YouTube channels as the training set, and therefore was taken under similar conditions as the training images. The second subset contains images extracted from YouTube channels not included in the training set, and hence contains images taken under different conditions from those used to train the net.
4.2. Creating the data set
The creation of a large number of images with a variety of chemical processes and settings could have been a daunting task. Luckily, several YouTube channels dedicated to chemical experiments exist which offer high-quality footage of chemistry experiments. Thanks to these channels, including NurdRage, NileRed, and ChemPlayer, it was possible to collect a large number of high-quality images in a short time. Pixel-wise annotation of these images was another challenging task, and was performed by Alexandra Emanuel and Mor Bismuth.
For more details see: Setting attention region for convolutional neural networks using region selective features, for recognition of materials within glass vessels
This dataset was first published in August 2017.
For newer and bigger datasets, see:
https://zenodo.org/record/4736111#.YbG-RrtyZH4
https://zenodo.org/record/3697452#.YbG-TLtyZH4