71 datasets found
  1. Data from: Tag Recommendation Datasets

    • figshare.com
    txt
    Updated Jan 25, 2016
    Cite
    Fabiano Belem (2016). Tag Recommendation Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.2067183.v4
    Dataset updated
    Jan 25, 2016
    Dataset provided by
    figshare
    Authors
    Fabiano Belem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Associative Tag Recommendation Exploiting Multiple Textual Features. Fabiano Belem, Eder Martins, Jussara M. Almeida, Marcos Goncalves. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2011.

    Abstract: This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags preassigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend previous, highly effective and efficient, state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object's content. We also exploit two learning-to-rank techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Some further improvements can also be achieved, in some scenarios, with the new learning-to-rank based strategies, which have the additional advantage of being quite flexible and easily extensible to exploit other aspects of the tag recommendation problem.

    BibTeX Citation: @inproceedings{belem@sigir11, author = {Fabiano Bel\'em and Eder Martins and Jussara Almeida and Marcos Gon\c{c}alves}, title = {Associative Tag Recommendation Exploiting Multiple Textual Features}, booktitle = {{Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR'11)}}, month = {{July}}, year = {2011} }

  2. MLB-YouTube Dataset

    • paperswithcode.com
    Updated Mar 23, 2021
    + more versions
    Cite
    AJ Piergiovanni; Michael S. Ryoo (2021). MLB-YouTube Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/mlb-youtube-dataset
    Dataset updated
    Mar 23, 2021
    Authors
    AJ Piergiovanni; Michael S. Ryoo
    Area covered
    YouTube
    Description

    The MLB-YouTube dataset is a new, large-scale dataset consisting of 20 baseball games from the 2017 MLB post-season available on YouTube with over 42 hours of video footage. The dataset consists of two components: segmented videos for activity recognition and continuous videos for activity classification. It is quite challenging as it is created from TV broadcast baseball games where multiple different activities share the camera angle. Further, the motion/appearance difference between the various activities is quite small.

  3. youtube-commons-asr-eval

    • huggingface.co
    Updated Apr 29, 2024
    Cite
    Mobius Labs GmbH (2024). youtube-commons-asr-eval [Dataset]. https://huggingface.co/datasets/mobiuslabsgmbh/youtube-commons-asr-eval
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 29, 2024
    Dataset authored and provided by
    Mobius Labs GmbH
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Dataset Card for youtube-commons-asr-eval

      Dataset Summary
    

    This evaluation dataset is created from a subset of YouTube-Commons [PleIAs/YouTube-Commons] by selecting English YouTube videos and their corresponding English subtitles.

      Supported Tasks and Leaderboards
    

    This dataset will be primarily useful for automatic speech recognition evaluation tasks such as hf-audio/open_asr_leaderboard.

      Languages
    

    This subset is for English language evaluations.… See the full description on the dataset page: https://huggingface.co/datasets/mobiuslabsgmbh/youtube-commons-asr-eval.
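
    For orientation, a minimal sketch of loading this dataset with the Hugging Face datasets library is shown below; the available splits and column names are whatever the dataset card defines, so the sketch inspects the returned DatasetDict rather than assuming them.

    ```python
    # Minimal sketch: load the ASR evaluation set with the Hugging Face `datasets` library
    # and inspect its splits and columns (split names are not assumed here).
    from datasets import load_dataset

    ds = load_dataset("mobiuslabsgmbh/youtube-commons-asr-eval")
    print(ds)                      # DatasetDict listing the available splits
    first_split = next(iter(ds))
    print(ds[first_split][0])      # first example of the first split
    ```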

  4. YouTube NSI Captioning Dataset

    • zenodo.org
    bin, csv
    Updated Mar 1, 2024
    Cite
    Lloyd May; Keita Ohshiro; Khang Dang; Sripathi Sridhar; Jhanvi Pai; Magdalena Fuentes; Sooyeon Lee; Mark Cartwright (2024). YouTube NSI Captioning Dataset [Dataset]. http://doi.org/10.5281/zenodo.10681804
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lloyd May; Keita Ohshiro; Khang Dang; Sripathi Sridhar; Jhanvi Pai; Magdalena Fuentes; Sooyeon Lee; Mark Cartwright
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Version 1.0, March 2024

    Created by

    Lloyd May (1), Keita Ohshiro (2,3), Khang Dang (2,3), Sripathi Sridhar (2,3), Jhanvi Pai (2,3), Magdalena Fuentes (4), Sooyeon Lee (3), Mark Cartwright (2,3,4)

    1. Center for Computer Research in Music and Acoustics, Stanford University
    2. Sound Interaction and Computing Lab, New Jersey Institute of Technology
    3. Department of Informatics, New Jersey Institute of Technology
    4. Music and Audio Research Lab, New York University

    Publication

    If using this data in an academic work, please reference the DOI and version, as well as cite the following paper, which presented the data collection procedure and the first version of the dataset:

    May, L., Ohshiro, K., Dang, K., Sridhar, S., Pai, J., Fuentes, M., Lee, S., Cartwright, M. Unspoken Sound: Identifying Trends in Non-Speech Audio Captioning on YouTube. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), 2024.

    Description

    The YouTube NSI Captioning Dataset was developed to analyze the contemporary and historical state of non-speech information (NSI) captioning on YouTube. NSI includes information about non-speech sounds such as environmental sounds, sound effects, incidental sounds, and music, as well as additional narrative information and extra-speech information (ESI), which gives context to spoken or signed language such as manner of speech (e.g. "[Whispering] Oh no") or speaker label (e.g., "[Juan] Oh no"). The dataset contains measures of estimated and annotated NSI in the captions of two different samples of videos: a popular video sample and a studio video sample. The aim of the popular sample is to understand the captioning practices in a broad spectrum of popular, impactful videos on YouTube. In contrast, the aim of the studio sample is to examine captioning practices among the top-tier production houses, often viewed as industry benchmarks due to their influence and vast resources available for accessibility. Using the YouTube API, we queried for videos in these two samples for each month from 2013 to 2022. We then estimated which captions contain NSI by searching for non-alphanumeric symbols that are indicative of NSI, e.g., "[" and "]" (see Section 3.2 of the paper for a full list). In addition, the research team manually annotated which captions have NSI from a subset of approximately 1800 videos from years 2013, 2018, and 2022. Please see the Section 3.3 of the paper for details of the annotation process.

    The resulting YouTube NSI Captioning Dataset consists of NSI information from ~715k videos containing ~273M lines of captions, ~6M of which are estimated instances of NSI. These videos span 10 years and 21 topics. The annotated subset consists of 1799 videos with a total of ~36k annotated caption lines, ~114k of which are instances of NSI annotated with 7 different categories. These videos span 3 years (2013, 2018, and 2022) and 20 YouTube-assigned topics. Each video was annotated by two annotators along with the consensus annotation. The dataset contains the links to the YouTube videos, video metadata from the YouTube API, and measures of both estimated and annotated NSI. Due to copyright concerns, we are only publicly releasing data consisting of summary NSI measures for each video. If you need access to the raw data used to create these summary NSI measures, contact Mark Cartwright at mark.cartwright@njit.edu.

    Files

    • estimated_full_set_aggregate.csv : Data file containing the full set of video data with measures of estimated NSI.

    • annotated_subset_aggregate.csv : Data file containing the smaller annotated subset of video data with measures of both annotated and estimated NSI.

    Columns

    The following columns are present in both data files.

    • video_id : The YouTube video ID

    • year : The year associated with the time period from which the video was sampled.

    • sample : The sample which the video is from (i.e., popular or studio)

    • sampling_period_start_date : The start date of the time period from which the video was sampled.

    • sampling_period_end_date : The end date of the time period from which the video was sampled.

    • caption_type : This can take one of three values: auto which indicates a caption was provided by YouTube's automated caption system, manual which indicates a caption was provided by the uploader, or none which indicates that no captions are present for the video.

    • duration_minutes : The duration of the video in minutes.

    • channel_id : The ID that YouTube uses to uniquely identify the channel.

    • published_datetime : The date and time at which the video was published on YouTube.

    • youtube_topics : The YouTube-provided list of Wikipedia URLs that provide a description of the video's content.

    • category_id : The YouTube video category associated with the video.

    • view_count : The count of views on YouTube at the time of sampling (Spring 2023).

    • like_count : The count of likes on YouTube at the time of sampling (Spring 2023).

    • comment_count : The count of comments on YouTube at the time of sampling (Spring 2023).

    • high_level_topics : List of topics at a higher semantic level than youtube_topics that provide a description of the video's content. See paper for details on the mapping between youtube_topics and high_level_topics.

    • <nsi_type>_<measure> : The remainder of the columns take this form, with the values for <nsi_type> and <measure> listed below.

    Values for <nsi_type>:

    • estimated_nsi : This NSI type is an estimation of NSI based on the presence of particular non-alphanumeric characters that are indicative of NSI as described in Section 3.2 of the paper.

    • general_nsi (only in annotated_subset_aggregate.csv) : The most general of the NSI types, inclusive of music_nsi, environmental_nsi, additionalnarrative_nsi, and quotedspeech_nsi. All of these NSI types are included in the calculation of measures associated with general_nsi. Note that misc_nsi and nonenglish_captions are not included, as those may or may not contain NSI, and thus we opt for precision over recall. Not present for the unlabeled full set.

    • music_nsi (only in annotated_subset_aggregate.csv) : Any genre of music, whether diegetic or not.

    • environmental_nsi (only in annotated_subset_aggregate.csv) : Environmental sounds, sound effects, and incidental sounds, i.e., non-music and non-speech sounds. This includes non-verbal vocalizations like laughter, grunts, and crying, provided they aren't used to modify speech.

    • extraspeech_nsi (only in annotated_subset_aggregate.csv) : Extra-speech Information (ESI), i.e., text that gives added context to spoken or signed language.

    • additionalnarrative_nsi (only in annotated_subset_aggregate.csv) : Additional narrative information in the form of descriptive text that doesn't pertain directly to sounds.

    • quotedspeech_nsi (only in annotated_subset_aggregate.csv) : Quoted Speech Captions containing internal quotation marks.

    • misc_nsi (only in annotated_subset_aggregate.csv) : Unsure, misc, or ambiguous, i.e., instances where the appropriate label is unclear or the caption doesn't fit current categories.

    • nonenglish_captions (only in annotated_subset_aggregate.csv) : Captions not written in English and thus have uncertain NSI status.

    Values for <measure>:

    • count : The number of captions identified as containing NSI of the specified type in the video.

    • presence : Indication of whether there is NSI of the specified type present in the video. 1 if present (e.g., count > 0), 0 if not present (e.g., count==0).

    • count_per_minute : A measure of the density of NSI captions. count_per_minute = count / duration_minutes

    • count_per_minute_if_present : If presence==1, then count_per_minute; else, NaN. This is used for computing the aggregate CPMIP measure, which, as discussed in the paper, is intended to be a measure of the quality of NSI captions, based on the assumption that more frequently captioned NSI within a video is an indicator of better NSI captioning. See Section 5 of the paper for details (and the pandas sketch below).
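
    The sketch below recomputes the per-video measures described above with pandas. Column names of the form <nsi_type>_<measure> (e.g., estimated_nsi_count) are an assumption and should be checked against the actual CSV headers.

    ```python
    # A minimal sketch, not part of the dataset release: recompute the documented
    # per-video measures and an aggregate CPMIP with pandas.
    import pandas as pd

    df = pd.read_csv("estimated_full_set_aggregate.csv")

    count = df["estimated_nsi_count"]              # assumed column name
    cpm = count / df["duration_minutes"]           # count_per_minute
    presence = (count > 0).astype(int)             # presence flag
    cpmip = cpm.where(presence == 1)               # count_per_minute_if_present (NaN when absent)

    # Aggregate CPMIP per year and sample (popular vs. studio), in the spirit of Section 5.
    summary = df.assign(cpmip=cpmip).groupby(["year", "sample"])["cpmip"].mean()
    print(summary)
    ```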

    Conditions of use

    Dataset created by Lloyd May, Keita Ohshiro, Khang Dang, Sripathi Sridhar, Jhanvi Pai, Magdalena Fuentes, Sooyeon Lee, and Mark Cartwright

    The YouTube NSI Captioning Dataset dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/

    Feedback

    Please help us improve YouTube NSI Captioning Dataset by sending your feedback to:

    • Mark Cartwright (mark.cartwright@njit.edu)

  5. Data from: EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition...

    • data.niaid.nih.gov
    • opendatalab.com
    Updated Aug 26, 2021
    + more versions
    Cite
    Doh, Seungheon (2021). EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5090630
    Dataset updated
    Aug 26, 2021
    Dataset provided by
    Kim, Nabin
    Ching, Joann
    Yang, Yi-Hsuan
    Doh, Seungheon
    Hung, Hsiao-Tzu
    Nam, Juhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The EMOPIA (pronounced ‘yee-mò-pi-uh’) dataset is a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, created to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators.

    For more detailed information about the dataset, please refer to our paper: EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation.

    File Description

    midis/: midi clips transcribed using GiantMIDI.

    Filename format Q1_xxxxxxx_2.mp3: Q1 means the clip belongs to quadrant Q1 of the valence-arousal (V-A) space; xxxxxxx is the song ID on YouTube; and 2 means this is the 2nd clip taken from the full song.
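
    A minimal sketch of parsing this filename convention (not part of the dataset release) might look as follows; it only splits on the first and last underscore, since YouTube IDs may themselves contain underscores.

    ```python
    # Parse the documented EMOPIA filename convention Q1_xxxxxxx_2.mp3 into
    # (quadrant, YouTube song ID, clip index).
    def parse_emopia_filename(filename: str):
        stem = filename.rsplit(".", 1)[0]             # drop the extension (.mp3 / .mid)
        quadrant, rest = stem.split("_", 1)           # e.g. "Q1"
        youtube_id, clip_index = rest.rsplit("_", 1)  # YouTube IDs can contain "_"
        return quadrant, youtube_id, int(clip_index)

    print(parse_emopia_filename("Q1_xxxxxxx_2.mp3"))  # ('Q1', 'xxxxxxx', 2)
    ```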

    metadata/: metadata from YouTube (obtained during crawling).

    songs_lists/: YouTube URLs of songs.

    tagging_lists/: raw tagging result for each sample.

    label.csv: metadata that records filename, 4Q label, and annotator.

    metadata_by_song.csv: lists all the clips grouped by song. It can be used to create train/val/test splits that avoid the same song appearing in both train and test.

    scripts/prepare_split.ipynb: the script to create train/val/test splits and save them to csv files.

    2.2 Update

    Added tagging files to tagging_lists/ that were missing in the previous version.

    Added timestamps.json for easier usage. It records all the timestamps in dict format; see scripts/load_timestamp.ipynb for a format example.

    Added scripts/timestamp2clip.py: after the raw audio is crawled and placed in audios/raw, this script reads timestamps.json and uses the timestamps to extract audio clips, which are saved to the audios/seg folder.

    Removed 7 MIDI files that were added by mistake, and corrected the corresponding numbers in metadata_by_song.csv.

    2.1 Update

    Add one file and one folder:

    key_mode_tempo.csv: key, mode, and tempo information extracted from files.

    CP_events/: CP events used in our paper, extracted using this script, with the emotion event added to the front.

    Modify one folder:

    The REMI_events/ files in version 2.0 contained some information that is not related to the paper, so it was removed.

    2.0 Update

    Add two new folders:

    corpus/: processed data following the preprocessing flow. (Please note that although we have 1078 clips in our dataset, we lost some clips during steps 1~4 of the flow, so the final number of clips in this corpus is 1052, and that is the number we used for training the generative model.)

    REMI_events/: REMI event for each midi file. They are generated using this script.

    Cite this dataset

    @inproceedings{{EMOPIA}, author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan}, title = {{EMOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation}, booktitle = {Proc. Int. Society for Music Information Retrieval Conf.}, year = {2021} }

  6. Kinetics-600 Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Apr 21, 2021
    Cite
    Joao Carreira; Eric Noland; Andras Banki-Horvath; Chloe Hillier; Andrew Zisserman (2021). Kinetics-600 Dataset [Dataset]. https://paperswithcode.com/dataset/kinetics-600
    Dataset updated
    Apr 21, 2021
    Authors
    Joao Carreira; Eric Noland; Andras Banki-Horvath; Chloe Hillier; Andrew Zisserman
    Description

    Kinetics-600 is a large-scale action recognition dataset which consists of around 480K videos from 600 action categories. The 480K videos are divided into 390K, 30K, and 60K for the training, validation and test sets, respectively. Each video in the dataset is a 10-second clip of an action moment annotated from a raw YouTube video. It is an extension of the Kinetics-400 dataset.

  7. Physical Exercise Recognition Dataset

    • kaggle.com
    Updated Feb 16, 2023
    Cite
    Muhannad Tuameh (2023). Physical Exercise Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/muhannadtuameh/exercise-recognition
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhannad Tuameh
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Note:

    Because this dataset has been used in a competition, we had to hide some of the data to prepare the test dataset for the competition. Thus, in the previous version of the dataset, only the train.csv file was available.

    Content

    This dataset represents 10 different physical poses that can be used to distinguish 5 exercises. The exercises are Push-up, Pull-up, Sit-up, Jumping Jack and Squat. For every exercise, 2 different classes have been used to represent the terminal positions of that exercise (e.g., “up” and “down” positions for push-ups).

    Collection Process

    About 500 videos of people doing the exercises were used to collect this data. The videos come from the Countix Dataset, which contains YouTube links to several human activity videos. Using a simple Python script, the videos of the 5 physical exercises were downloaded. From every video, at least 2 frames were manually extracted; the extracted frames represent the terminal positions of the exercise.

    Processing Data

    For every frame, the MediaPipe framework is used to apply pose estimation, which detects the human skeleton of the person in the frame. The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks (see the figure below). Visit the MediaPipe Pose Classification page for more details.

    Figure: the 33 MediaPipe pose landmarks (https://mediapipe.dev/images/mobile/pose_tracking_full_body_landmarks.png)
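
    A minimal sketch of extracting these 33 landmarks with the legacy MediaPipe solutions API is shown below; the input file name is a placeholder for a frame extracted from an exercise video.

    ```python
    # Sketch: run MediaPipe Pose on a single frame and flatten the 33 landmarks
    # into one feature row (x, y, z, visibility per landmark).
    import cv2
    import mediapipe as mp

    mp_pose = mp.solutions.pose
    image = cv2.imread("frame.jpg")  # placeholder path to an extracted frame

    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.pose_landmarks:
        row = []
        for lm in results.pose_landmarks.landmark:
            row.extend([lm.x, lm.y, lm.z, lm.visibility])
        print(len(row))  # 33 landmarks x 4 values = 132 features per frame
    ```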

  8. YouTube users in Africa 2020-2029

    • statista.com
    Updated Feb 15, 2025
    Cite
    Statista Research Department (2025). YouTube users in Africa 2020-2029 [Dataset]. https://www.statista.com/topics/9813/internet-usage-in-africa/
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    Africa
    Description

    The number of YouTube users in Africa was forecast to increase continuously between 2024 and 2029 by a total of 0.03 million users (+3.95 percent). The YouTube user base is estimated to amount to 0.79 million users in 2029. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users for regions such as worldwide and the Americas.

  9. NII Face Mask Dataset

    • data.niaid.nih.gov
    Updated Jan 26, 2022
    Cite
    Junichi Yamagishi (2022). NII Face Mask Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5761724
    Dataset updated
    Jan 26, 2022
    Dataset provided by
    Khanh-Duy Nguyen
    Junichi Yamagishi
    Isao Echizen
    Trung-Nghia Le
    Huy H. Nguyen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    =====================================================================

    NII Face Mask Dataset v1.0

    =====================================================================

    Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)

    Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam

    National Institute of Informatics Copyright (c) 2021

    Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn

    Arxiv: https://arxiv.org/abs/2111.12888

    NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725

    =============================== INTRODUCTION ===============================

    The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from Rambalac's YouTube channel.

    The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.

    =============================== REFERENCES ===============================

    If you publish using any of the data in this dataset, please cite the following papers:

    Pre-print version

    @article{Nguyen202112888, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, archivePrefix={arXiv}, arxivId={2111.12888}, url={https://arxiv.org/abs/2111.12888}, year={2021} }

    Final version

    @INPROCEEDINGS{Nguyen2021EstMaskWearing, author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao}, booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)}, title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio}, year={2021}, pages={1-8}, url={https://ieeexplore.ieee.org/document/9667046}, doi={10.1109/FG52635.2021.9667046}}

    ======================== DATA STRUCTURE ==================================

    1. Directory Structure

    ./NFM
    ├── dataset
    │   ├── train.csv: annotations for the train set.
    │   └── test.csv: annotations for the test set.
    └── README_v1.0.md

    2. Description of each file in detail.

    We use the same structure for the two CSV files (train.csv and test.csv). Both CSV files have the same columns:
    <1st column>: video_id (the source video can be found by following the link https://www.youtube.com/watch?v=<video_id>)
    <2nd column>: frame_id (the index of the frame extracted from the source video)
    <3rd column>: timestamp in milliseconds (the timestamp of the frame extracted from the source video)
    <4th column>: label (for each annotated face, one of three labels was attached to a bounding box: 'Mask'/'No-Mask'/'Unknown')
    <5th column>: left
    <6th column>: top
    <7th column>: right
    <8th column>: bottom
    The four coordinates (left, top, right, bottom) denote a face's bounding box.
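
    A minimal pandas sketch of loading these annotations and estimating a per-frame mask-wearing ratio is shown below. It assumes the CSV files have no header row (which should be verified) and that the column order matches the description above.

    ```python
    # Sketch: load train.csv and compute Mask / (Mask + No-Mask) per frame,
    # ignoring faces labeled 'Unknown'.
    import pandas as pd

    cols = ["video_id", "frame_id", "timestamp_ms", "label", "left", "top", "right", "bottom"]
    train = pd.read_csv("NFM/dataset/train.csv", names=cols, header=None)

    known = train[train["label"].isin(["Mask", "No-Mask"])]
    ratio = known.groupby(["video_id", "frame_id"])["label"].apply(lambda s: (s == "Mask").mean())
    print(ratio.head())
    ```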

    ============================== COPYING ================================

    This repository is made available under Creative Commons Attribution License (CC-BY).

    Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/

    THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE

    ====================== ACKNOWLEDGEMENTS ================================

    This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.

    This dataset is based on Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac

  10. YouTube-ASMR-300K

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 12, 2020
    Cite
    Karren Yang; Bryan Russell; Justin Salamon (2020). YouTube-ASMR-300K [Dataset]. http://doi.org/10.5281/zenodo.3889168
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Karren Yang; Bryan Russell; Justin Salamon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    The YouTube-ASMR dataset contains URLs for over 900 hours of ASMR video clips with stereo/binaural audio produced by various YouTube artists. The following paper contains a detailed description of the dataset and how it was compiled:

    K. Yang, B. Russell and J. Salamon, "Telling Left from Right: Learning Spatial Correspondence of Sight and Sound", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, June 2020.

  11. AIDERv2 (Aerial Image Dataset for Emergency Response Applications)

    • zenodo.org
    zip
    Updated Sep 3, 2024
    Cite
    Demetris Shianios; Christos Kyrkou (2024). AIDERv2 (Aerial Image Dataset for Emergency Response Applications) [Dataset]. http://doi.org/10.5281/zenodo.10891054
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Demetris Shianios; Christos Kyrkou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    SUMMARY OF DATASET
    • This dataset consists of 16,723 aerial images divided into 4 classes.
    • The dataset contains three commonly occurring natural disasters (earthquake/collapsed buildings, flood, and wildfire/fire) and a normal class that does not reflect any disaster.
    • The images can be loaded as numpy arrays using the Python programming language and then used to train a Convolutional Neural Network to detect natural disasters from aerial images (a minimal loading sketch is given after the folder structure below).
    • The images are resized to 224x224x3 (height, width, number of channels) when loaded as numpy arrays.
    • The dataset is an extension of the AIDER dataset (Aerial Image Dataset for Emergency Response Applications).
    • Additional images were collected from open-source databases and extracted as frames from videos downloaded from YouTube.
    The table below shows the number of images in each set.
    Train Validation Test Total
    Earthquakes 1927 239 239 2405
    Floods 4063 505 502 5070
    Fire 3509 439 436 4384
    Normal 3900 487 477 4864
    Total 13399 1670 1654 16723
    If you use this dataset please cite the following publications:
    [1] Shianios, D., Kyrkou, C., Kolios, P.S. (2023). A Benchmark and Investigation of Deep-Learning-Based Techniques for Detecting Natural Disasters in Aerial Images. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14185. Springer, Cham. https://doi.org/10.1007/978-3-031-44240-7_24
    [2] D. Shianios, P. Kolios, C. Kyrkou, "DiRecNetV2: A Transformer-Enhanced Network for Aerial Disaster Recognition", SN Computer Science, 2024 (Accepted to Appear)
    DATASET FOLDERS FORMAT
    └───data
        └───Dataset_Images
            ├───Train
            │   ├───Earthquake:  img (1).jpg, img (2).jpg, .....
            │   ├───Flood:       img (1).jpg, img (2).jpg, .....
            │   ├───Normal:      img (1).jpg, img (2).jpg, .....
            │   └───Wildfire:    img (1).jpg, img (2).jpg, .....
            ├───Val
            │   ├───Earthquake:  img (1).jpg, img (2).jpg, .....
            │   ├───Flood:       img (1).jpg, img (2).jpg, .....
            │   ├───Normal:      img (1).jpg, img (2).jpg, .....
            │   └───Wildfire:    img (1).jpg, img (2).jpg, .....
            └───Test
                ├───Earthquake:  img (1).jpg, img (2).jpg, .....
                ├───Flood:       img (1).jpg, img (2).jpg, .....
                ├───Normal:      img (1).jpg, img (2).jpg, .....
                └───Wildfire:    img (1).jpg, img (2).jpg, .....
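
    Given the folder layout above, a minimal sketch (not provided by the dataset authors) of loading one split as 224x224x3 numpy arrays, as described in the summary, could look like this:

    ```python
    # Sketch: load the Train split into resized, normalized numpy arrays with integer labels.
    from pathlib import Path
    import numpy as np
    from PIL import Image

    CLASSES = ["Earthquake", "Flood", "Normal", "Wildfire"]

    def load_split(root: str, split: str):
        images, labels = [], []
        for label, cls in enumerate(CLASSES):
            for img_path in sorted(Path(root, "Dataset_Images", split, cls).glob("*.jpg")):
                img = Image.open(img_path).convert("RGB").resize((224, 224))
                images.append(np.asarray(img, dtype=np.float32) / 255.0)
                labels.append(label)
        return np.stack(images), np.array(labels)

    X_train, y_train = load_split("data", "Train")
    print(X_train.shape, y_train.shape)  # e.g. (13399, 224, 224, 3) (13399,)
    ```
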
    DATA SOURCES AND DATA COLLECTION
    OPEN SOURCE DATABASES
    └───AIDER
    Kyrkou, C. and Theocharides, T., 2020. EmergencyNet: Efficient aerial image classification for drone-based emergency monitoring using atrous convolutional feature fusion. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, pp.1687-1699.
    └───ERA
    Mou, L., Hua, Y., Jin, P. and Zhu, X.X., 2020. Era: A data set and deep learning benchmark for event recognition in aerial videos [software and data sets]. IEEE Geoscience and Remote Sensing Magazine, 8(4), pp.125-133.
    @article{eradataset,
    title = {{ERA: A dataset and deep learning benchmark for event recognition in aerial videos}},
    author = {Mou, L. and Hua, Y. and Jin, P. and Zhu, X. X.},
    journal = {IEEE Geoscience and Remote Sensing Magazine},
    year = {in press}
    }
    └───ISBDA
    Zhu, X., Liang, J. and Hauptmann, A., 2021. Msnet: A multilevel instance segmentation network for natural disaster damage assessment in aerial videos. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2023-2032).
    @misc{zhu2020msnet,
    title={MSNet: A Multilevel Instance Segmentation Network for Natural Disaster Damage Assessment in Aerial Videos},
    author={Xiaoyu Zhu and Junwei Liang and Alexander Hauptmann},
    year={2020},
    eprint={2006.16479},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
    }
    └───Floods 2013
    Barz, B., Schröter, K., Münch, M., Yang, B., Unger, A., Dransch, D. and Denzler, J., 2019. Enhancing flood impact analysis using interactive retrieval of social media images. arXiv preprint arXiv:1908.03361.
    @article{barz2019enhancing,
    title={Enhancing flood impact analysis using interactive retrieval of social media images},
    author={Barz, Bj{\"o}rn and Schr{\"o}ter, Kai and M{\"u}nch, Moritz and Yang, Bin and Unger, Andrea and Dransch, Doris and Denzler, Joachim},
    journal={arXiv preprint arXiv:1908.03361},
    year={2019}
    }
    └───Wildfire Research
    └───PyImages
    The dataset was curated by PyImageSearch reader, Gautam Kumar.
    YOUTUBE VIDEOS
    └───Collapsed Buildings/Earthquakes

  12. EOAD (Egocentric Outdoor Activity Dataset)

    • zenodo.org
    • data.niaid.nih.gov
    csv, png
    Updated Jul 12, 2024
    Cite
    Mehmet Ali Arabacı; Elif Surer; Alptekin Temizel (2024). EOAD (Egocentric Outdoor Activity Dataset) [Dataset]. http://doi.org/10.5281/zenodo.7742660
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mehmet Ali Arabacı; Elif Surer; Alptekin Temizel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EOAD is a collection of videos captured by wearable cameras, mostly of sports activities. It contains both visual and audio modalities.

    It was initiated from the HUJI and FPVSum egocentric activity datasets. However, the number of samples and the diversity of activities in HUJI and FPVSum were insufficient, so we combined these datasets and augmented them with new YouTube videos.

    The selection of videos was based on the following criteria:

    • The videos should not include text overlays.
    • The videos should contain natural sound (no external music)
    • The actions in videos should be continuous (no cutting the scene or jumping in time)

    Video samples were trimmed depending on scene changes for long videos (such as driving, scuba diving, and cycling). As a result, a video may have several clips depicting egocentric actions. Hence, video clips were extracted from carefully defined time intervals within videos. The final dataset includes video clips with a single action and natural audio information.

    Statistics for EOAD:

    • 30 activities
    • 303 distinct videos
    • 1392 video clips
    • 2243 minutes of labeled video clips

    The detailed statistics for the selected datasets and for the video clips crawled from YouTube are given below:

    • HUJI: 49 distinct videos - 148 video clips for 9 activities (driving, biking, motorcycle, walking, boxing, horse riding, running, skiing, stair climbing)
    • FPVSum: 39 distinct videos - 124 video segments for 8 activities (biking, horse riding, skiing, longboarding, rock climbing, scuba, skateboarding, surfing)
    • YouTube: 216 distinct videos - 1120 video clips for 27 activities (american football, basketball, bungee jumping, driving, go-kart, horse riding, ice hockey, jet ski, kayaking, kitesurfing, longboarding, motorcycle, paintball, paragliding, rafting, rock climbing, rowing, running, sailing, scuba diving, skateboarding, soccer, stair climbing, surfing, tennis, volleyball, walking)

    The video clips used for the training, validation and test sets for each activity are listed in Table 1. Multiple video clips may belong to a single video, because videos were trimmed for several reasons (e.g., scene cuts, text temporarily overlaid on the video, or video parts unrelated to the activity).

    While splitting the dataset, the minimum number of videos for each activity was set to 8. The video samples were divided as 50%, 25%, and 25% for training (minimum four videos), validation (minimum two videos), and testing (minimum two videos), respectively. Moreover, videos were split according to the raw video footage to prevent similar video clips (having the same actors and scenes) from mixing into the training, validation, and test sets. In other words, we ensured that video clips trimmed from the same video were placed together in the training, validation, or test set, to allow a fair comparison.

    Some activities, such as scuba diving, longboarding, or horse riding, are continuous throughout the video, so their number of video segments equals the number of videos. Other activities, such as skating, occur over a short time, making the number of video segments higher. As a result, the number of video clips in the training, validation, and test sets is highly imbalanced across activities (e.g., jet ski and rafting have 4 training clips, whereas soccer has 99).
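
    A minimal sketch of this kind of group-aware split, using scikit-learn's GroupShuffleSplit with hypothetical clip and video IDs rather than the authors' exact procedure, is shown below:

    ```python
    # Sketch: 50/25/25 split that keeps clips trimmed from the same source video together.
    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    clips = np.arange(12)                                        # hypothetical clip indices
    video_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])   # source video of each clip

    gss = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=0)
    train_idx, rest_idx = next(gss.split(clips, groups=video_ids))

    gss2 = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=0)
    val_rel, test_rel = next(gss2.split(clips[rest_idx], groups=video_ids[rest_idx]))
    val_idx, test_idx = rest_idx[val_rel], rest_idx[test_rel]

    # No source video appears in more than one of the three sets.
    assert not set(video_ids[train_idx]) & set(video_ids[test_idx])
    ```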

    Table 1 - Dataset splitting for EOAD

    Action Label       Train #Clips  Train Duration  Val #Clips  Val Duration  Test #Clips  Test Duration
    AmericanFootball   34            00:06:09        36          00:05:03      9            00:01:20
    Basketball         43            01:13:22        19          00:08:13      10           00:28:46
    Biking             9             01:58:01        6           00:32:22      11           00:36:16
    Boxing             7             00:24:54        11          00:14:14      5            00:17:30
    BungeeJumping      7             00:02:22        4           00:01:36      4            00:01:31
    Driving            19            00:37:23        9           00:24:46      9            00:29:23
    GoKart             5             00:40:00        3           00:11:46      3            00:19:46
    Horseback          5             01:15:14        5           01:02:26      2            00:20:38
    IceHockey          52            00:19:22        46          00:20:34      10           00:36:59
    Jetski             4             00:23:35        5           00:18:42      6            00:02:43
    Kayaking           28            00:43:11        22          00:14:23      4            00:11:05
    Kitesurfing        30            00:21:51        17          00:05:38      6            00:01:32
    Longboarding       5             00:15:40        4           00:18:03      4            00:09:11
    Motorcycle         20            00:49:38        21          00:13:53      8            00:20:30
    Paintball          7             00:33:52        4           00:12:08      4            00:08:52
    Paragliding        11            00:28:42        4           00:10:16      4            00:19:50

  13. Accident Detection Model Dataset

    • universe.roboflow.com
    zip
    Updated Apr 8, 2024
    Cite
    Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
    Dataset updated
    Apr 8, 2024
    Dataset authored and provided by
    Accident detection model
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Accident Bounding Boxes
    Description

    Accident-Detection-Model

    The Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident in any live camera feed, image, or video provided. This model is trained on a dataset of 3200+ images; these images were annotated on Roboflow.

    Problem Statement

    • Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.
    • According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.
    • The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

    Accidents survey

    Accidents survey figure: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png

    Literature Survey

    • Sreyan Ghosh, Mar 2019: the goal is to develop a system using a deep-learning convolutional neural network that has been trained to identify video frames as accident or non-accident.
    • Deeksha Gour, Sep 2019: uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

    Research Gap

    • Lack of real-world data - we trained the model on more than 3200 images.
    • Large interpretability time and space needed - we use Google Colab to reduce the time and space required.
    • Outdated versions in previous works - we are using the latest version, YOLOv8.

    Proposed methodology

    • We are using YOLOv8 to train on our custom dataset, which contains 3200+ images collected from different platforms.
    • After training with 25 iterations, this model is ready to detect an accident with a significant probability.

    Model Set-up

    Preparing Custom dataset

    • We have collected 1200+ images from different sources like YouTube, Google Images, Kaggle.com, etc.
    • Then we annotated all of them individually on a tool called Roboflow.
    • During annotation we marked the images with no accident as NULL, and we drew a box on the site of the accident in the images having an accident.
    • Then we divided the dataset into train, val, and test in the ratio of 8:1:1.
    • At the final step we downloaded the dataset in YOLOv8 format.
      #### Using Google Colab
    • We are using Google Colaboratory to code this model because Colab provides a GPU, which is faster than local environments.
    • You can use Jupyter notebooks, which let you blend code, text, and visualisations in a single document, to write and run Python code in Google Colab.
    • Users can run individual code cells in Jupyter notebooks and quickly view the results, which is helpful for experimenting and debugging. Additionally, they enable the development of visualisations that make use of well-known frameworks like Matplotlib, Seaborn, and Plotly.
    • In Google Colab, first of all we changed the runtime from TPU to GPU.
    • We cross-checked it by running the command ‘!nvidia-smi’.
      #### Coding
    • First of all, we installed YOLOv8 with the command ‘!pip install ultralytics==8.0.20’.
    • Further, we checked YOLOv8 with the commands ‘from ultralytics import YOLO’ and ‘from IPython.display import display, Image’.
    • Then we connected and mounted our Google Drive account with the code ‘from google.colab import drive; drive.mount('/content/drive')’.
    • Then we ran our main command to start the training process: ‘%cd /content/drive/MyDrive/Accident Detection model’ and ‘!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=1 imgsz=640 plots=True’.
    • After the training we ran commands to test and validate our model: ‘!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml’ and ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images’.
    • Further, to get results from any video or image, we ran this command: ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"’.
    • The results are stored in the runs/detect/predict folder.
      Hence our model is trained, validated and tested, and is able to detect accidents in any video or image (an equivalent Python-API sketch is given below).
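
    The same workflow can also be driven from Python through the ultralytics API rather than the CLI; the sketch below mirrors the commands above (paths and the epoch count are illustrative, not the exact values used here).

    ```python
    # Sketch: train, validate, and run inference with the ultralytics Python API.
    from ultralytics import YOLO

    model = YOLO("yolov8s.pt")                                   # pretrained YOLOv8-small weights
    model.train(data="data.yaml", epochs=25, imgsz=640, plots=True)
    metrics = model.val()                                        # evaluate on the val split
    results = model.predict(source="data/test/images", conf=0.25, save=True)
    # Predictions are saved under runs/detect/predict, as noted above.
    ```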

    Challenges I ran into

    I ran into 3 major problems while making this model:

    • I had difficulty saving the results in a folder: as YOLOv8 is the latest version, it is still under development. I read some blogs and referred to Stack Overflow, and learned that in the new v8 we need to add an extra argument, ''save=true'', which made the results save to a folder.
    • I was facing a problem on the CVAT website because I was not sure what
  14. Data from: Dataset "Privacy-aware image classification and search"

    • data.niaid.nih.gov
    • eprints.soton.ac.uk
    Updated Oct 15, 2021
    Cite
    Siersdorfer, Stefan (2021). Dataset "Privacy-aware image classification and search" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4568970
    Dataset updated
    Oct 15, 2021
    Dataset provided by
    Siersdorfer, Stefan
    Zerr, Sergej
    Hare, Jonathon
    Demidova, Elena
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern content sharing environments such as Flickr or YouTube contain a large number of private resources such as photos showing weddings, family holidays, and private parties. These resources can be of a highly sensitive nature, disclosing many details of the users' private sphere. In order to support users in making privacy decisions in the context of image sharing, and to provide them with a better overview of privacy-related visual content available on the Web, we propose techniques to automatically detect private images and to enable privacy-oriented image search. To classify images, we use metadata such as the title and tags, and we plan to use visual features, which are described in our scientific paper. The data set used in the paper is now available.

    • Picalet! cleaned dataset - (recommended for experiments)
    • userstudy - (images annotated with queries, anonymized user id and privacy value)

  15. Table_2_Investigating the Role of Culture on Negative Emotion Expressions in...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 8, 2023
    + more versions
    Cite
    Emma Hughson; Roya Javadi; James Thompson; Angelica Lim (2023). Table_2_Investigating the Role of Culture on Negative Emotion Expressions in the Wild.XLSX [Dataset]. http://doi.org/10.3389/fnint.2021.699667.s004
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Emma Hughson; Roya Javadi; James Thompson; Angelica Lim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Even though culture has been found to play some role in negative emotion expression, affective computing research primarily takes a basic-emotion approach when analyzing social signals for automatic emotion recognition technologies. Furthermore, automatic negative emotion recognition systems are still trained on data that originates primarily from North America and contains a majority of Caucasian training samples. The current study addresses this problem by analyzing the differences in the underlying social signals, leveraging machine learning models to classify 3 negative emotions, contempt, anger and disgust (CAD), amongst 3 different cultures: North American, Persian, and Filipino. Using a curated data set compiled from YouTube videos, a support vector machine (SVM) was used to predict negative emotions amongst the differing cultures. In addition, a one-way ANOVA was used to analyse the differences that exist between each culture group in terms of the level of activation of the underlying social signals. Our results not only highlighted the significant differences in the associated social signals that were activated for each culture, but also indicated the specific underlying social signals that differ in our cross-cultural data sets. Furthermore, the automatic classification methods showed North American expressions of CAD to be well recognized, while Filipino and Persian expressions were recognized at near-chance levels.
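
    For readers who want a concrete starting point, the sketch below shows the general shape of such an SVM classification with scikit-learn, using a hypothetical random feature matrix in place of the curated social-signal features; it is not the authors' pipeline.

    ```python
    # Sketch: 3-class SVM (contempt / anger / disgust) with cross-validation on placeholder features.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))        # placeholder social-signal features
    y = rng.integers(0, 3, size=300)      # 0 = contempt, 1 = anger, 2 = disgust

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    print(cross_val_score(clf, X, y, cv=5).mean())  # near chance on random features
    ```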

  16. UrduSER: A Dataset for Urdu Speech Emotion Recognition

    • data.mendeley.com
    Updated Apr 28, 2025
    + more versions
    Cite
    Muhammad Zaheer Akhtar (2025). UrduSER: A Dataset for Urdu Speech Emotion Recognition [Dataset]. http://doi.org/10.17632/jcpfjnk5c2.4
    Dataset updated
    Apr 28, 2025
    Authors
    Muhammad Zaheer Akhtar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Speech Emotion Recognition (SER) is a rapidly evolving field of research aimed at identifying and categorizing emotional states through the analysis of speech signals. As SER holds significant socio-cultural and commercial importance, researchers are increasingly leveraging machine learning and deep learning techniques to drive advancements in this domain. A high-quality dataset is an essential resource for SER studies in any language. Despite Urdu being the 10th most spoken language globally, there is a significant lack of robust SER datasets, creating a research gap. Existing Urdu SER datasets are often limited by their small size, narrow emotional range, and repetitive content, reducing their applicability in real-world scenarios. To address this gap, the Urdu Speech Emotion Recognition (UrduSER) was developed. This comprehensive dataset includes 3500 Urdu speech signals sourced from 10 professional actors, with an equal representation of male and female speakers from diverse age groups. The dataset encompasses seven emotional states: Angry, Fear, Boredom, Disgust, Happy, Neutral, and Sad. The speech samples were curated from a wide collection of Pakistani Urdu drama serials and telefilms available on YouTube, ensuring diversity and natural delivery. Unlike conventional datasets, which rely on predefined dialogs recorded in controlled environments, UrduSER features unique and contextually varied utterances, making it more realistic and applicable for practical applications. To ensure balance and consistency, the dataset contains 500 samples per emotional class, with 50 samples contributed by each actor for each emotion. Additionally, an accompanying Excel file provides detailed metadata for each recording, including the file name, duration, format, sample rate, actor details, emotional state, and corresponding Urdu dialog. This metadata enables researchers to efficiently organize and utilize the dataset for their specific needs. The UrduSER dataset underwent rigorous validation, integrating expert evaluation and model-based validation to ensure its reliability, accuracy, and overall suitability for advancing research and development in Urdu Speech Emotion Recognition.
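
    A minimal sketch of working with the accompanying metadata file in pandas is shown below; the file name and column headers are assumptions based on the description above and should be adjusted to the actual Excel file.

    ```python
    # Sketch: load the metadata Excel file and check the per-emotion balance
    # (expected: 500 samples per emotion, 50 per actor per emotion).
    import pandas as pd

    meta = pd.read_excel("UrduSER_metadata.xlsx")    # assumed file name
    print(meta.columns.tolist())                     # file name, duration, actor, emotion, dialog, ...
    print(meta.groupby("emotional_state").size())    # assumed column name for the emotion label
    ```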

  17. UnusualAction Dataset

    • figshare.com
    bin
    Updated May 18, 2022
    Cite
    Nitika Nigam; Tanima Dutta; Hari Prabhat Gupta (2022). UnusualAction Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.19782529.v1
    Dataset updated
    May 18, 2022
    Dataset provided by
    figshare
    Authors
    Nitika Nigam; Tanima Dutta; Hari Prabhat Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    UnusualAction Dataset for Action Recognition
    Nitika Nigam, Tanima Dutta and Hari Prabhat Gupta, Indian Institute of Technology (BHU), India.

    Overview: UnusualAction is an uncertain action recognition dataset of rarely occurring actions collected from YouTube. The dataset comprises 14 unusual action categories, and each category contains 50-100 videos. UnusualAction provides diversity in terms of different falling actions and the presence of noise, such as variations in camera motion, person appearance, viewpoint, cluttered background, and illumination conditions. It is a challenging dataset for uncertain action recognition. Most action recognition datasets are based on certain actions; on the contrary, UnusualAction aims to encourage further research into uncertain action recognition by learning and exploring new realistic action categories.

    Structure of the UnusualAction Dataset
    • Data associated with each UnusualAction category is stored in a separate directory.
    • Each directory comprises *.mp4 or *.avi video files.
    • The directories are arranged in the following structure:
    FallAction_datasets
    ├── Blending_phone
    ├── Crushing_laptop
    ├── Cutting_keyboard
    ├── Drilling_Laptop
    ├── Drilling_Phone
    ├── Frying_Phone
    ├── Hammering_Laptop
    ├── Hammering_phone
    ├── Hammering_pumpkin
    ├── Hammering_watermelon
    ├── Microwave_shoes
    ├── Microwave_phone
    ├── Washing_laptop
    └── Washing_Paptop

  18. Data from: TweetNERD - End to End Entity Linking Benchmark for Tweets

    • zenodo.org
    bin, tsv
    Updated Feb 3, 2023
    Cite
    Shubhanshu Mishra; Aman Saini; Raheleh Makki; Sneha Mehta; Aria Haghighi; Ali Mollahosseini (2023). TweetNERD - End to End Entity Linking Benchmark for Tweets [Dataset]. http://doi.org/10.5281/zenodo.6617192
    Explore at:
    tsv, binAvailable download formats
    Dataset updated
    Feb 3, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Shubhanshu Mishra; Shubhanshu Mishra; Aman Saini; Raheleh Makki; Sneha Mehta; Aria Haghighi; Ali Mollahosseini; Aman Saini; Raheleh Makki; Sneha Mehta; Aria Haghighi; Ali Mollahosseini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TweetNERD - End to End Entity Linking Benchmark for Tweets

    Paper - Video - Neurips Page

    This is the dataset described in the paper TweetNERD - End to End Entity Linking Benchmark for Tweets (accepted to Thirty-sixth Conference on Neural Information Processing Systems (Neurips) Datasets and Benchmarks Track).

    Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets across 2010-2021, for benchmarking NERD systems on Tweets. This is the largest and most temporally diverse open sourced dataset benchmark for NERD on Tweets and can be used to facilitate research in this area.

    TweetNERD dataset is released under Creative Commons Attribution 4.0 International (CC BY 4.0) LICENSE.

    The license only applies to the data files present in this dataset. See Data usage policy below.

    Check out more details at https://github.com/twitter-research/TweetNERD

    Usage

    We provide the dataset split across the following tab-separated files:

    • OOD.public.tsv: OOD split of the data in the paper.
    • Academic.public.tsv: Academic split of the data described in the paper.
    • part_*.public.tsv: Remaining data split into parts in no particular order.

    Each file is tab-separated and has the following format:

    tweet_id | phrase | start | end | entityId | score
    22 | twttr | 20 | 25 | Q918 | 3
    21 | twttr | 20 | 25 | Q918 | 3
    1457198399032287235 | Diwali | 30 | 38 | Q10244 | 3
    1232456079247736833 | NO_PHRASE | -1 | -1 | NO_ENTITY | -1
    For tweets which don't have any entity, the column values for phrase, start, end, entityId, and score are set to NO_PHRASE, -1, -1, NO_ENTITY, and -1, respectively.

    Description of file columns is as follows:

    Column | Type | Missing Value | Description
    tweet_id | string | (none) | ID of the Tweet
    phrase | string | NO_PHRASE | entity phrase
    start | int | -1 | start offset of the phrase in the text using UTF-16BE encoding
    end | int | -1 | end offset of the phrase in the text using UTF-16BE encoding
    entityId | string | NO_ENTITY | Entity ID. If not missing, can be NOT FOUND, AMBIGUOUS, or a Wikidata ID of format Q{numbers}, e.g. Q918
    score | int | -1 | Number of annotators who agreed on the phrase, start, end, entityId information

    In order to use the dataset you need to utilize the tweet_id column and get the Tweet text using the Twitter API (See Data usage policy section below).
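    A minimal loading sketch in Python under two assumptions: each .public.tsv file carries the header row shown above (if not, the column names can be passed explicitly), and hydration of the Tweet text via the Twitter API is done separately using tweet_id.

    # Sketch: load one TweetNERD split and keep only rows with a resolved entity phrase.
    import pandas as pd

    cols = ["tweet_id", "phrase", "start", "end", "entityId", "score"]
    df = pd.read_csv("OOD.public.tsv", sep="\t", dtype={"tweet_id": str})
    # If a split file has no header row, read it instead with:
    # df = pd.read_csv("OOD.public.tsv", sep="\t", names=cols, dtype={"tweet_id": str})

    # Sentinel values for tweets with no entity are described above.
    linked = df[(df["phrase"] != "NO_PHRASE") & (df["entityId"] != "NO_ENTITY")]

    # Tweet text is not distributed; collect the IDs to hydrate through the Twitter API.
    tweet_ids = linked["tweet_id"].unique().tolist()
    print(len(df), "rows,", len(linked), "linked mentions,", len(tweet_ids), "unique tweets")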

    Data stats

    Split | Number of Rows | Number of unique tweets
    OOD | 34102 | 25000
    Academic | 51685 | 30119
    part_0 | 11830 | 10000
    part_1 | 35681 | 25799
    part_2 | 34256 | 25000
    part_3 | 36478 | 25000
    part_4 | 37518 | 24999
    part_5 | 36626 | 25000
    part_6 | 34001 | 24984
    part_7 | 34125 | 24981
    part_8 | 32556 | 25000
    part_9 | 32657 | 25000
    part_10 | 32442 | 25000
    part_11 | 32033 | 24972

    Data usage policy

    Use of this dataset is subject to you obtaining lawful access to the Twitter API, which requires you to agree to the Developer Terms Policies and Agreements.

    Please cite the following if you use TweetNERD in your paper:

    @dataset{TweetNERD_Zenodo_2022_6617192,
     author    = {Mishra, Shubhanshu and
             Saini, Aman and
             Makki, Raheleh and
             Mehta, Sneha and
             Haghighi, Aria and
             Mollahosseini, Ali},
     title    = {{TweetNERD - End to End Entity Linking Benchmark 
              for Tweets}},
     month    = jun,
     year     = 2022,
     note     = {{Data usage policy Use of this dataset is subject 
              to you obtaining lawful access to the [Twitter
              API](https://developer.twitter.com/en/docs
              /twitter-api), which requires you to agree to the
              [Developer Terms Policies and
              Agreements](https://developer.twitter.com/en
              /developer-terms/).}},
     publisher  = {Zenodo},
     version   = {0.0.0},
     doi     = {10.5281/zenodo.6617192},
     url     = {https://doi.org/10.5281/zenodo.6617192}
    }
    @inproceedings{TweetNERDNeurips2022,
     author = {Mishra, Shubhanshu and Saini, Aman and Makki, Raheleh and Mehta, Sneha and Haghighi, Aria and Mollahosseini, Ali},
     booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
     pages = {},
     title = {TweetNERD - End to End Entity Linking Benchmark for Tweets},
     volume = {2},
     year = {2022},
     eprint = {arXiv:2210.08129},
     doi = {10.48550/arXiv.2210.08129}
    }
    
  19. GAViD: Group Affect from ViDeos

    • zenodo.org
    csv, zip
    Updated Jun 5, 2025
    Cite
    Deepak Kumar; Deepak Kumar; Puneet Kumar; Puneet Kumar; Xiaobai Li; Xiaobai Li; Balasubramanian Raman; Balasubramanian Raman (2025). GAViD: Group Affect from ViDeos [Dataset]. http://doi.org/10.5281/zenodo.15448846
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Deepak Kumar; Deepak Kumar; Puneet Kumar; Puneet Kumar; Xiaobai Li; Xiaobai Li; Balasubramanian Raman; Balasubramanian Raman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2025
    Description

    Overview

    We introduce the Group Affect from ViDeos (GAViD) dataset, which comprises 5091 video clips with multimodal data (video, audio, and context), annotated with ternary valence and discrete emotion labels, and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present CAGNet, a baseline model for multimodal context-aware group affect recognition. CAGNet achieves 61.20% test accuracy on GAViD, comparable to state-of-the-art performance in the field.

    NOTE: For now, we are providing only the Train video clips. The corresponding paper is under review in the ACM Multimedia 2025 Dataset Track. After its publication, access to the Validation and Test sets will be granted upon request and approval, in accordance with the Responsible Use Policy.

    Dataset Description

    GAViD is a large-scale, in-the-wild multimodal dataset of 5091 samples, each annotated with the elements listed below. The following sections describe its key details and compilation procedure.

    1. Raw video clips of an average duration of five seconds,
    2. Audio aligned with the video clips,
    3. Contextual metadata (scene descriptions, event labels) generated by a multimodal LLM and human-verified,
    4. Group affect labels: ternary valence (positive, neutral, negative) and five discrete emotions (happy, sad, fear, anger, neutral),
    5. Emotion intensity ratings (high, medium, low),
    6. Interaction type labels (cooperative, hostile, neutral),
    7. Action cues (e.g. smiling, clapping, shouting, dancing, singing).

    Dataset details

    • Number of clips (samples) in GAViD: 5130
    • Number of samples excluded due to quality problems: 39
    • Number of samples after filtering: 5,091
    • Duration per clip: 5 sec
    • Clip count per video: 1–35
    • Dataset split: Train: 3503; Val: 542; Test: 1046
    • Affect labels (classwise distribution): Positive: 2600; Negative: 1189; Neutral: 1302
    • Emotion label distribution: Neutral: 1522; Happy: 2428; Anger: 884; Sad: 201; Fear: 56

    Keywords used to search for the raw videos on YouTube

    • Positive: Team Celebration, Happy, Group Meeting, Video Conference, Get Together, Meeting, Celebration, Press Conference, Religious gathering, Talk Show, Farewell, Group Performance, People Dancing on Street, Street Comedy, Wedding Performance, Dhol masti, Couple group dance, Comedy show
    • Negative: Protest, Angry Sport, Heated Argument, Violent Protest, Emotional breakdown in Public, Aggressive Argument, Spiritual Gathering, Aggressive Group, Street Race, Condolence, Group Fight, Wrestling, MMA Fight, Violence, Boxing, Silent Protest, People in the fight, Group Fight
    • Neutral: Group Meeting, Panel Discussion, Parliament speech, People on street, People walking on street, Team brainstorming Session, Team Building Activities, Group Discussion, Group work session, Team Planning session, Students in Discussion, Wedding Group Dance, Roundtable Discussion, Oath, Mental health address, General Talk, Wedding Celebration, Festival Celebration

    Emotion Recognition Results using CAGNet

    Model | Val Acc. | Val F1 | Test Acc. | Test F1
    CAGNet | 62.55% | 0.454 | 60.33% | 0.448

    Components of the Dataset

    The dataset comprises two main components:

    • GAViD_train.csv file: Contains the bin number used by Labelbox in the annotation process, video_id, group_emotion (Positive, Negative, Neutral), specific_emotion (happy, sad, fear, anger, neutral), emotion_intensity, interaction_type, action_cues, and the video description generated using the Video-ChatGPT model.
    • GAViD_Train_VideoClips.zip folder: Contains the video clips of the train set. (For now, only the Train video clips are provided; Validation and Test set video clips will be provided upon request.)

    Data Format and Fields of the CSV File

    The dataset is structured as a GAViD.csv file along with the corresponding videos in related folders. This CSV file includes the following fields (a short loading sketch follows the list):

    • Video_ID: Unique Identifier of a video
    • Group_Affect: Positive, Negative, Neutral
    • Descrete_Emotion: Happy, Sad, Fear, Anger, Neutral
    • Emotion_Intensity: High, Medium, Low
    • Interaction_Type: Cooperative, Hostile, Neutral
    • Action_Cues: e.g. Smiling, Clapping, Shouting, Dancing, Singing etc.
    • Context: Each video clip's summary generated from the Video-ChatGPT model.
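    A minimal loading sketch, assuming the field names listed above appear verbatim as CSV headers and that each clip in the unzipped GAViD_Train_VideoClips folder is stored as <Video_ID>.mp4; both are assumptions that may need adjusting to the released files.

    # Sketch: join GAViD_train.csv metadata with the extracted train clips.
    from pathlib import Path
    import pandas as pd

    meta = pd.read_csv("GAViD_train.csv")
    clips_dir = Path("GAViD_Train_VideoClips")

    # Assumed naming scheme: one file per clip, named after its Video_ID.
    meta["clip_path"] = meta["Video_ID"].astype(str).map(lambda vid: clips_dir / f"{vid}.mp4")
    meta["clip_exists"] = meta["clip_path"].map(Path.exists)

    print(meta["Group_Affect"].value_counts())  # Positive / Negative / Neutral counts
    print(meta["clip_exists"].mean())           # fraction of rows with a matching clip file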

    Ethical considerations, data privacy and misuse prevention

    • Data Collection and Consent: The data collection and annotation strictly followed established ethical protocols in line with YouTube's Terms, which state that “Public videos with a Creative Commons license may be reused”. We downloaded only public-domain videos licensed under Creative Commons (CC BY 4.0), which “allows others to share, copy and redistribute the material in any medium or format, and to adapt, remix, transform, and build upon it for any purpose, even commercially”.
    • Privacy: All content was reviewed to ensure no private or sensitive information is present. Faces are included only from public-domain videos as needed for group affect research; only group-level content is released, with no attempt or risk of individual identification. Other personally identifiable information, such as names, addresses, and contact details, was removed.

    Code and Citation

    • Code Repository: https://github.com/deepakkumar-iitr/GAViD/tree/main
    • Citing the Dataset: Users of the dataset should cite the corresponding paper described at the above GitHub Repository.

    License & Access

    • This dataset is released for academic research only and is free to researchers from educational or research institutions.

  20. Materials in Vessels Dataset, Annotated images of materials in transparent...

    • zenodo.org
    zip
    Updated Dec 9, 2021
    Cite
    Sagi Eppel; Sagi Eppel (2021). Materials in Vessels Dataset, Annotated images of materials in transparent vessels for semantic segmentation [Dataset]. http://doi.org/10.5281/zenodo.5769354
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sagi Eppel; Sagi Eppel
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data set of materials in vessels
    The handling of materials in glassware vessels is a main task in chemistry laboratory research, as well as in a large number of other activities. Visual recognition of the physical phase of the materials is essential for many methods, ranging from simple tasks such as fill-level evaluation to the identification of more complex properties such as solvation, precipitation, crystallization and phase separation. To help train neural nets for this task, a new data set was created. The data set contains a thousand images of materials, in different phases and involved in different chemical processes, in a laboratory setting. Each pixel in each image is labeled according to several layers of classification, as given below:

    a. Vessel/Background: For each pixel, a value of one is assigned if it is part of the vessel and zero otherwise. This annotation was used as the ROI map for the valve filter method.

    b. Filled/Empty: This is similar to the above, but also distinguishes between the filled and empty regions of the vessel. For each pixel, one of the following three values is assigned: 0 (background); 1 (empty vessel); or 2 (filled vessel).

    c. Phase type: This is similar to the above but distinguishes between liquid and solid regions of the filled vessel. For each pixel, one of the following four values is assigned: 0 (background); 1 (empty vessel); 2 (liquid); or 3 (solid).

    d. Fine-grained physical phase type: This is similar to the above but distinguishes between specific classes of physical phase. For each pixel, one of 15 values is assigned: 1 (background); 2 (empty vessel); 3 (liquid); 4 (liquid phase two, in the case where more than one phase of the liquid appears in the vessel); 5 (suspension); 6 (emulsion); 7 (foam); 8 (solid); 9 (gel); 10 (powder); 11 (granular); 12 (bulk); 13 (solid-liquid mixture); 14 (solid phase two, in the case where more than one phase of solid exists in the vessel); and 15 (vapor).
    The annotations are given as images of the size of the original image, where the pixel value is the class number. The annotation of the vessel region (a) is used as the ROI input for the valve filter net.
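    To make the encoding concrete, here is a short Python sketch that decodes one fine-grained annotation map (layer d). The value-to-class mapping follows the description above; the file name is purely illustrative.

    # Sketch: decode a fine-grained phase annotation map into class names and pixel counts.
    import numpy as np
    from PIL import Image

    FINE_CLASSES = {
        1: "background", 2: "empty vessel", 3: "liquid", 4: "liquid phase two",
        5: "suspension", 6: "emulsion", 7: "foam", 8: "solid", 9: "gel",
        10: "powder", 11: "granular", 12: "bulk", 13: "solid-liquid mixture",
        14: "solid phase two", 15: "vapor",
    }

    ann = np.array(Image.open("example_fine_phase_annotation.png"))  # illustrative file name
    values, counts = np.unique(ann, return_counts=True)
    for value, count in zip(values, counts):
        print(FINE_CLASSES.get(int(value), f"unknown ({value})"), count, "pixels")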

    4.1. Validation/testing set
    The data set is divided into training and testing sets. The testing set is itself divided into two subsets;
    one contains images extracted from the same YouTube channels as the training set, and therefore was
    taken under similar conditions as the training images. The second subset contains images extracted
    from YouTube channels not included in the training set, and hence contains images taken under
    different conditions from those used to train the net.

    4.2. Creating the data set
    The creation of a large number of images with a variety of chemical processes and settings could have
    been a daunting task. Luckily, several YouTube channels dedicated to chemical experiments exist
    which offer high-quality footage of chemistry experiments. Thanks to these channels, including
    NurdRage, NileRed, ChemPlayer, it was possible to collect a large number of high-quality images in a
    short time. Pixel-wise annotation of these images was another challenging task, and was performed by
    Alexandra Emanuel and Mor Bismuth.

    For more details see: Setting attention region for convolutional neural networks using region selective features, for recognition of materials within glass vessels

    This dataset was first published in 2017.

    For newer and bigger datasets, see:

    https://zenodo.org/record/4736111#.YbG-RrtyZH4

    https://zenodo.org/record/3697452#.YbG-TLtyZH4
