100+ datasets found
  1. Human Activity Recognition (HAR - Video Dataset)

    • kaggle.com
    Updated May 19, 2023
    Cite
    Sharjeel M. (2023). Human Activity Recognition (HAR - Video Dataset) [Dataset]. http://doi.org/10.34740/kaggle/dsv/5722068
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sharjeel M.
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset contains a comprehensive collection of human activity videos spanning 7 distinct classes: clapping, meeting and splitting, sitting, standing still, walking, walking while reading a book, and walking while using a phone.

    Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.

    The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.

  2. 30M High-Quality Video Dataset – Copyright-Cleared & Commercial-Ready

    • nexdata.ai
    Updated Apr 7, 2025
    Cite
    Nexdata (2025). 30M High-Quality Video Dataset – Copyright-Cleared & Commercial-Ready [Dataset]. https://www.nexdata.ai/datasets/computervision/1803
    Explore at:
    Dataset updated
    Apr 7, 2025
    Dataset authored and provided by
    Nexdata
    Variables measured
    Data size, Image type, Data format, Data content, Image resolution
    Description

    This dataset features 30 million high-resolution video clips sourced from authorized and legally compliant channels. Each video offers exceptional clarity, color accuracy, and scene diversity across various environments and themes. All clips include clearly documented copyright ownership and commercial usage rights, making them safe and reliable for both academic research and business applications. The dataset is ideal for computer vision tasks such as video classification, action recognition, multimodal learning, object tracking, and content generation. Whether you're training deep learning models or sourcing clean, scalable visual data, this video dataset offers a comprehensive, large-scale solution.

  3. Flood Amateur Video for Semantic Segmentation Dataset

    • data.mendeley.com
    Updated May 16, 2024
    + more versions
    Cite
    Naili Suri Intizhami (2024). Flood Amateur Video for Semantic Segmentation Dataset [Dataset]. http://doi.org/10.17632/3kzr8mt8s2.5
    Explore at:
    Dataset updated
    May 16, 2024
    Authors
    Naili Suri Intizhami
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains flood footage from the city of Parepare, South Sulawesi Province, collected from Instagram. It was created to support deep learning methods for recognizing floods and surrounding objects, specializing in semantic segmentation. The dataset consists of three folders: raw video data collected from Instagram, image data produced by splitting the videos into frames, and annotation data containing images that have been color-labeled according to their objects. There are 6 object classes based on color labels: floods (light blue), buildings (red), plants (green), people (sage), vehicles (orange), and sky (dark blue). The data are provided in image (JPEG/PNG) and video (MP4) formats. The dataset is suitable for object recognition tasks using the semantic segmentation method. In addition, because it contains original videos and images, it can be extended for other purposes in the future. Note: if you intend to use this dataset, please ensure that you comply with applicable copyright, privacy, and regulatory requirements. The paper describing this dataset is available at: https://doi.org/10.1016/j.dib.2023.109768
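For orientation, here is a minimal sketch of turning one of the color-labeled annotation images into an integer class map, as is typically needed for semantic-segmentation training. The exact RGB triples are assumptions (the listing only names the colors); check the dataset's README for the true values before using this on the real annotation images.

```python
import numpy as np

# Color-name -> class-ID mapping for the six labels named in the listing.
# The RGB values are assumed, not taken from the dataset.
CLASS_COLORS = {
    (135, 206, 235): 0,  # flood (light blue) -- assumed RGB
    (255, 0, 0):     1,  # building (red)
    (0, 128, 0):     2,  # plant (green)
    (178, 190, 137): 3,  # person (sage) -- assumed RGB
    (255, 165, 0):   4,  # vehicle (orange)
    (0, 0, 139):     5,  # sky (dark blue)
}

def mask_to_ids(rgb_mask: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) color-coded annotation to an (H, W) class-ID map.

    Pixels matching no listed color get 255 (a common ignore index).
    """
    out = np.full(rgb_mask.shape[:2], 255, dtype=np.uint8)
    for color, class_id in CLASS_COLORS.items():
        out[np.all(rgb_mask == color, axis=-1)] = class_id
    return out
```

The resulting ID map can be fed directly to a cross-entropy loss with `ignore_index=255`.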

  4. Sintel 4D Light Field Video Dataset

    • ieee-dataport.org
    Updated Mar 26, 2021
    Cite
    Takahiro Kinoshita (2021). Sintel 4D Light Field Video Dataset [Dataset]. https://ieee-dataport.org/open-access/sintel-4d-light-field-video-dataset
    Explore at:
    Dataset updated
    Mar 26, 2021
    Authors
    Takahiro Kinoshita
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    9x9 views

  5. THVD (Talking Head Video Dataset)

    • data.mendeley.com
    Updated Apr 29, 2025
    + more versions
    Cite
    Mario Peedor (2025). THVD (Talking Head Video Dataset) [Dataset]. http://doi.org/10.17632/ykhw8r7bfx.2
    Explore at:
    Dataset updated
    Apr 29, 2025
    Authors
    Mario Peedor
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    About

    We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.

    Distribution

    Detailing the format, size, and structure of the dataset:

    Data Volume:

    -Total Size: 2.7 TB

    -Total Videos: 47,547

    -Identities Covered: 20,841

    -Resolution: 60% 4K, 33% Full HD (1080p)

    -Formats: MP4

    -Full-length videos with visible mouth movements in every frame.

    -Minimum face size of 400 pixels.

    -Video durations range from 20 seconds to 5 minutes.

    -Faces have not been cut out, full screen videos including backgrounds.

    Usage

    This dataset is ideal for a variety of applications:

    Face Recognition & Verification: Training and benchmarking facial recognition models.

    Action Recognition: Identifying human activities and behaviors.

    Re-Identification (Re-ID): Tracking identities across different videos and environments.

    Deepfake Detection: Developing methods to detect manipulated videos.

    Generative AI: Training high-resolution video generation models.

    Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.

    Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.

    Coverage

    Explaining the scope and coverage of the dataset:

    Geographic Coverage: Worldwide

    Time Range: Time range and size of the videos have been noted in the CSV file.

    Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.

    Languages Covered (Videos):

    English: 23,038 videos

    Portuguese: 1,346 videos

    Spanish: 677 videos

    Norwegian: 1,266 videos

    Swedish: 1,056 videos

    Korean: 848 videos

    Polish: 1,807 videos

    Indonesian: 1,163 videos

    French: 1,102 videos

    German: 1,276 videos

    Japanese: 1,433 videos

    Dutch: 1,666 videos

    Indian: 1,163 videos

    Czech: 590 videos

    Chinese: 685 videos

    Italian: 975 videos

    Filipino: 920 videos

    Bulgarian: 340 videos

    Romanian: 1,144 videos

    Arabic: 1,691 videos

    Who Can Use It

    List examples of intended users and their use cases:

    Data Scientists: Training machine learning models for video-based AI applications.

    Researchers: Studying human behavior, facial analysis, or video AI advancements.

    Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.

    Additional Notes

    Ensure ethical usage and compliance with privacy regulations. The dataset’s quality and scale make it valuable for high-performance AI training. Preprocessing (cropping, downsampling) may be needed for different use cases. The dataset is not yet complete and expands daily; please contact us for the most up-to-date CSV file. The dataset has been divided into 100 GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file.
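As a sketch of working with the metadata CSV described above, the snippet below filters rows by language and minimum vertical resolution before downloading. The column names "language" and "height" are hypothetical placeholders; the listing does not give the actual headers, so adapt them to the CSV you receive from the provider.

```python
import csv
import io

def filter_rows(csv_text, language=None, min_height=1080):
    """Return metadata rows matching a language and minimum resolution.

    Column names ("language", "height") are assumed, not documented.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if language is not None and row["language"] != language:
            continue
        if int(row["height"]) < min_height:
            continue
        kept.append(row)
    return kept
```

This makes it easy to, say, select only the 4K English subset out of the 2.7 TB collection before requesting the corresponding zipped files.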

  6. M-ART | Video Data | Global | 100,000 Stock videos | Including metadata and...

    • datarade.ai
    Updated Sep 11, 2025
    Cite
    M-ART (2025). M-ART | Video Data | Global | 100,000 Stock videos | Including metadata and releases | Dataset for AI & ML [Dataset]. https://datarade.ai/data-products/m-art-video-data-global-100-000-stock-videos-includin-m-art
    Explore at:
    .csv, .jpeg, .mp4, .mov (available download formats)
    Dataset updated
    Sep 11, 2025
    Dataset authored and provided by
    M-ART
    Area covered
    Paraguay, Andorra, Tunisia, Estonia, El Salvador, Bangladesh, Saint Helena, Benin, Chad, Curaçao
    Description

    "Collection of 100,000 high-quality video clips across diverse real-world domains, designed to accelerate the training and optimization of computer vision and multimodal AI models."

    Overview

    This dataset contains 100,000 proprietary and partner-produced video clips filmed in 4K/6K with cinema-grade RED cameras. Each clip is commercially cleared with full releases and structured metadata, and is available in RAW or MOV/MP4 formats. The collection spans a wide variety of domains: people and lifestyle, healthcare and medical, food and cooking, office and business, sports and fitness, nature and landscapes, education, and more. This breadth ensures robust training data for computer vision, multimodal, and machine learning projects.

    The dataset

    All 100,000 videos have been reviewed for quality and compliance. The dataset is optimized for AI model training, supporting use cases from face and activity recognition to scene understanding and generative AI. Custom datasets can also be produced on demand, enabling clients to close data gaps with tailored, high-quality content.

    About M-ART

    M-ART is a leading provider of cinematic-grade datasets for AI training. With extensive expertise in large-scale content production and curation, M-ART delivers both ready-to-use video datasets and fully customized collections. All data is proprietary, rights-cleared, and designed to help global AI leaders accelerate research, development, and deployment of next-generation models.

  7. SAW-IT-Plus Video Dataset

    • dro.deakin.edu.au
    • researchdata.edu.au
    txt
    Updated Mar 31, 2023
    Cite
    Don Driscoll; Thi Thu Thuy Nguyen; Anne Eichholtzer (2023). SAW-IT-Plus Video Dataset [Dataset]. http://doi.org/10.26187/deakin.22359847.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    Deakin University
    Authors
    Don Driscoll; Thi Thu Thuy Nguyen; Anne Eichholtzer
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The SAW-IT-Plus dataset contains 11,458 videos collected in the wild and 22 homemade videos (snake category). Videos are arranged in 8 main categories of animals (frogs - 0, snakes - 1, lizards - 2, birds - 3, small mammals < 2 kg - 4, medium or large mammals > 2 kg - 5, spiders - 7, and scorpions - 8). Echidnas, originally category 6, were merged with big mammals. Some videos of crustacea and other reptiles are available but not classified. Empty videos (7,896) were added to allow further testing of the algorithm; they are separated into 3 categories (details in Table 1).

    CSV files detail the species for frogs, lizards, birds, and small mammals in each video. Because the videos were mainly collected from real-world data, the number of videos in each animal category is unbalanced (Table 1). This folder also contains the training images used to automatically detect videos containing animals in our overall dataset. More information is available in the ReadMe files.

    The dataset was collected in Victoria, Australia, from February to October 2021 as part of the ERP22 (formerly ARI-PPD 05) grant.

  8. Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI...

    • datarade.ai
    Updated Dec 8, 2023
    Cite
    Nexdata (2023). Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-re-id-data-60-000-id-image-video-ai-ml-train-nexdata
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txt (available download formats)
    Dataset updated
    Dec 8, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Trinidad and Tobago, Portugal, Russian Federation, Sri Lanka, Turkmenistan, Bolivia (Plurinational State of), United Arab Emirates, Cuba, Luxembourg, Ecuador
    Description
    1. Specifications

    Data size: 60,000 ID

    Population distribution: race distribution is Asian, Caucasian, and Black; gender distribution is male and female; age distribution ranges from children to the elderly

    Collecting environment: indoor and outdoor scenes (such as supermarkets, malls, and residential areas)

    Data diversity: different ages, time periods, cameras, human body orientations and postures, and collecting environments

    Device: surveillance cameras; image resolution not less than 1,920 × 1,080

    Data format: image data in .jpg, annotation files in .json

    Annotation content: human body rectangular bounding boxes, 15 human body attributes

    Quality requirements: a rectangular bounding box of a human body is qualified when the deviation is not more than 3 pixels, and the qualified rate of the bounding boxes shall not be lower than 97%; annotation accuracy of attributes is over 97%
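The stated quality rule can be expressed as a small check. This is a sketch only: it assumes boxes are (x_min, y_min, x_max, y_max) tuples, which the listing does not specify.

```python
def box_qualified(box, ref, tol=3):
    """A box qualifies when no coordinate deviates from the reference
    annotation by more than `tol` pixels (3, per the listing)."""
    return all(abs(a - b) <= tol for a, b in zip(box, ref))

def qualified_rate(boxes, refs, tol=3):
    """Fraction of boxes passing the deviation check; the listing
    requires this to be at least 0.97."""
    passed = sum(box_qualified(b, r, tol) for b, r in zip(boxes, refs))
    return passed / len(boxes)
```

A batch of annotations would pass QC when `qualified_rate(...) >= 0.97`.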

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 3 million hours of audio data, and 800 TB of annotated imagery data. These ready-to-go identity data support instant delivery and quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade
  9. Kinetics-400-[test-set]

    • kaggle.com
    Updated Sep 11, 2023
    Cite
    Innat (2023). Kinetics-400-[test-set] [Dataset]. https://www.kaggle.com/datasets/ipythonx/k4testset/data?select=videos_val
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Kaggle
    Authors
    Innat
    Description


    Video Action Recognition : Kinetics 400

    The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10 s and is taken from a different YouTube video. The actions are human-focused and cover a broad range of classes, including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands.

    License

    The Kinetics dataset is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License. Published May 22, 2017.

  10. Annotated video footage for automated identification and counting of fish in...

    • service.tib.eu
    Updated Nov 30, 2024
    + more versions
    Cite
    (2024). Annotated video footage for automated identification and counting of fish in unconstrained marine environments - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-926930
    Explore at:
    Dataset updated
    Nov 30, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Computer vision techniques such as deep learning have quickly become of interest to ecologists for automatically processing large volumes of video and image-based data. However, training deep learning models often requires large volumes of manually annotated footage to create a robust and accurate model. The collection and annotation of these training datasets can incur high initial labour costs and may not be feasible for some research projects. The accessibility of publicly available, pre-annotated datasets that are easy to implement is imperative for continued research and for establishing computer vision as a viable method to automate the processing of visual data. In this report, we provide a dataset containing ~9,000 annotated fish in unconstrained conditions in a key coastal habitat, seagrass meadows, collected via remote underwater video. These images include object instance annotations, each consisting of a corresponding image, label, bounding box, and segmentation mask. These data can be used for training several different computer vision models and for investigating the effects of pre- or post-processing steps to improve model performance when predicting data in aquatic habitats. The purpose of this report, in conjunction with the annotated dataset, is to advance the use of CV techniques and to further the growth of publicly available labelled fish datasets.

  11. Video Dataset of Sheep Activity (Grazing, Running, Sitting)

    • data.mendeley.com
    Updated Oct 24, 2023
    + more versions
    Cite
    Bilal Khan (2023). Video Dataset of Sheep Activity (Grazing, Running, Sitting) [Dataset]. http://doi.org/10.17632/h5ppwx6fn4.1
    Explore at:
    Dataset updated
    Oct 24, 2023
    Authors
    Bilal Khan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A primary dataset capturing five distinct types of sheep activities in realistic settings has been constructed at various resolutions and viewing angles, targeting the expansion of domain knowledge for non-contact virtual fencing approaches. The dataset can be used to develop non-invasive approaches for sheep activity detection, which can prove useful for farming activities including, but not limited to: (i) sheep counting, (ii) virtual fencing, (iii) behavior detection for health status, and (iv) effective sheep breeding. Sheep activity classes include grazing, running, sitting, standing, and walking. The activities of individual sheep as well as herds were recorded at different resolutions and angles to provide a dataset of diverse characteristics. Overall, a total of 149,327 frames from 417 videos (the equivalent of 59 minutes of footage) are presented, with a balanced set for each activity class, which can be utilized for robust non-invasive detection models based on computer vision techniques. Although the original data contain some noise (e.g., segments with no sheep present, multiple sheep in single frames, multiple activities by one or more sheep in single as well as multiple frames, and segments with sheep alongside other non-sheep objects), we provide the original videos and the extracted frames (with videos and frames containing humans omitted for privacy reasons). The dataset includes diverse sheep activity characteristics and can be useful for robust detection and recognition models, as well as advanced activity detection models as a function of time.
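A small sketch for sanity-checking the class balance of the extracted frames after download. It assumes the frames are organised into one folder per activity class (grazing, running, sitting, standing, walking); the actual folder layout is not given in the listing, so adjust the path handling to the dataset's own structure.

```python
from collections import Counter
from pathlib import Path

def frames_per_class(root):
    """Count image frames in each immediate sub-folder of `root`.

    Assumes one sub-folder per activity class; returns a Counter
    mapping folder name -> number of frame images.
    """
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in {".jpg", ".jpeg", ".png"}
            )
    return counts
```

Summing the counter should come to 149,327 frames once the complete five-class dataset is assembled.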

    Note: We separated three additional classes of the same dataset into a separate Mendeley dataset since the size of the video set was above the allowed limit by Mendeley Data Repository. Those three additional classes are uploaded in a separate dataset which can be accessed via (Reference below with DOI: 10.17632/w65pvb84dg.1). Three classes from the DOI (provided below) can be copied into this main folder in order to have the complete dataset with 5 distinct classes.

    Khan, Bilal; Kelly, Nathan (2023), “Video Dataset of Sheep Activity (Standing and Walking)”, Mendeley Data, V1, doi: 10.17632/w65pvb84dg.1

  12. Data from: An Annotated Video Dataset for Computing Video Memorability

    • figshare.com
    zip
    Updated Aug 5, 2021
    Cite
    Rukiye Savran Kiziltepe; Alan Smeaton; Lorin Sweeney; Mihai Gabriel Constantin; Faiyaz Doctor; Alba Garcia Seco de Herrera; Claire-Helene Demarty; Graham Healy; Bogdan Ionescu (2021). An Annotated Video Dataset for Computing Video Memorability [Dataset]. http://doi.org/10.6084/m9.figshare.15105867.v2
    Explore at:
    zip (available download formats)
    Dataset updated
    Aug 5, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rukiye Savran Kiziltepe; Alan Smeaton; Lorin Sweeney; Mihai Gabriel Constantin; Faiyaz Doctor; Alba Garcia Seco de Herrera; Claire-Helene Demarty; Graham Healy; Bogdan Ionescu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Media Memorability 2020 dataset contains a subset of short videos selected from the TRECVid 2019 Video-to-Text dataset. It contains links to, as well as features describing and annotations on, 590 videos as part of the training set and 410 videos as part of the development set. It also contains links to, and features describing, 500 videos used as test videos for the MediaEval Video Memorability benchmark in 2020.

  13. FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision...

    • datarade.ai
    Updated Jul 10, 2024
    Cite
    FileMarket (2024). FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision Applications | Machine Learning (ML) Data | Deep Learning (DL) Data [Dataset]. https://datarade.ai/data-products/filemarket-dataset-for-face-anti-spoofing-videos-in-compu-filemarket
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txt (available download formats)
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    United Republic of, Cabo Verde, Libya, Russian Federation, South Sudan, Ukraine, Mauritania, Germany, Sao Tome and Principe, Guinea-Bissau
    Description

    Live Face Anti-Spoof Dataset

    A live face dataset is crucial for advancing computer vision tasks such as face detection, anti-spoofing detection, and face recognition. The Live Face Anti-Spoof Dataset offered by Ainnotate is specifically designed to train algorithms for anti-spoofing purposes, ensuring that AI systems can accurately differentiate between real and fake faces in various scenarios.

    Key Features:

    Comprehensive Video Collection: The dataset features thousands of videos showcasing a diverse range of individuals, including males and females, with and without glasses. It also includes men with beards, mustaches, and clean-shaven faces.

    Lighting Conditions: Videos are captured in both indoor and outdoor environments, ensuring that the data covers a wide range of lighting conditions, making it highly applicable for real-world use.

    Data Collection Method: Our datasets are gathered through a community-driven approach, leveraging our extensive network of over 700k users across various Telegram apps. This method ensures that the data is not only diverse but also ethically sourced with full consent from participants, providing reliable and real-world applicable data for training AI models.

    Versatility: This dataset is ideal for training models in face detection, anti-spoofing, and face recognition tasks, offering robust support for these essential computer vision applications.

    In addition to the Live Face Anti-Spoof Dataset, FileMarket provides specialized datasets across various categories to support a wide range of AI and machine learning projects:

    Object Detection Data: Perfect for training AI in image and video analysis.

    Machine Learning (ML) Data: Offers a broad spectrum of applications, from predictive analytics to natural language processing (NLP).

    Large Language Model (LLM) Data: Designed to support text generation, chatbots, and machine translation models.

    Deep Learning (DL) Data: Essential for developing complex neural networks and deep learning models.

    Biometric Data: Includes diverse datasets for facial recognition, fingerprint analysis, and other biometric applications.

    This live face dataset, alongside our other specialized data categories, empowers your AI projects by providing high-quality, diverse, and comprehensive datasets. Whether your focus is on anti-spoofing detection, face recognition, or other biometric and machine learning tasks, our data offerings are tailored to meet your specific needs.

  14. Data from: CVB: A Video Dataset of Cattle Visual Behaviors

    • researchdata.edu.au
    datadownload
    Updated Jun 13, 2023
    Cite
    Flavio Pereira Alvarenga; Aaron Ingham; Lars Petersson; Brano Kusy; Vivien Rolland; Brendan Do; Neil Bagnall; Jody McNally; Greg Bishop-Hurley; Reza Arablouei; Ali Zia; Renuka Sharma (2023). CVB: A Video Dataset of Cattle Visual Behaviors [Dataset]. http://doi.org/10.25919/3G3T-P068
    Explore at:
    datadownload (available download formats)
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Commonwealth Scientific and Industrial Research Organisation
    Authors
    Flavio Pereira Alvarenga; Aaron Ingham; Lars Petersson; Brano Kusy; Vivien Rolland; Brendan Do; Neil Bagnall; Jody McNally; Greg Bishop-Hurley; Reza Arablouei; Ali Zia; Renuka Sharma
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Aug 1, 2022 - Apr 28, 2023
    Area covered
    Description

    Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is presented in the following three sub-directories:

    1. raw_frames: contains 450 frames in each sub-folder, representing a 15 s video taken at a frame rate of 30 FPS;

    2. annotations: contains the JSON files corresponding to the raw_frames folder. There is one JSON file per video, containing the bounding-box annotations for each cattle and their associated behaviors;

    3. CVB_in_AVA_format: contains the CVB data in the standard AVA dataset format, which we have used to apply the SlowFast model.

    Lineage: We use the Computer Vision Annotation Tool (CVAT) to collect our annotations. To make the procedure more efficient, we perform an initial detection and tracking of cattle in the videos using appropriate pre-trained models. The results are corrected by domain experts along with cattle behavior labeling in CVAT. The pre-hoc detection and tracking step significantly reduces the manual annotation time and effort.
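As an illustration of consuming the per-video annotation files described above, the sketch below collects the set of behaviour labels annotated in one clip. The JSON keys ("tracks", "behavior") are assumptions, not taken from the dataset; adapt them to the real schema in the annotations folder.

```python
import json

def behaviours_in_clip(json_path):
    """Return the sorted set of behaviour labels annotated in one clip.

    Assumes a hypothetical schema: {"tracks": [{"behavior": ...}, ...]}.
    Each clip covers 15 s at 30 FPS, i.e. 450 frames, matching raw_frames.
    """
    with open(json_path) as f:
        ann = json.load(f)
    return sorted({t["behavior"] for t in ann.get("tracks", [])})
```

Aggregating this over all 502 clips gives a quick view of how the eleven behaviour classes are distributed.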

  15. Emotion Detection in Video Content

    • gts.ai
    json
    Updated Nov 20, 2023
    Cite
    GTS (2023). Emotion Detection in Video Content [Dataset]. https://gts.ai/case-study/emotion-detection-in-video-content/
    Explore at:
    json (available download formats)
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore the nuances of emotion detection in video content: technologies and methods that make emotion analysis more accurate and efficient.

  16. spacejam - Action Recognition

    • kaggle.com
    Updated Mar 9, 2021
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 9, 2021
    Dataset provided by
    Kaggle
    Authors
    antocommii
    Description

    Dataset

    This dataset was created by antocommii


  17. Data from: A comprehensive video dataset for Multi-Modal Recognition Systems...

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli; Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli (2020). A comprehensive video dataset for Multi-Modal Recognition Systems [Dataset]. http://doi.org/10.5281/zenodo.1492227
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli; Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A fully labelled video dataset that serves as a unique resource for researchers and analysts in fields such as machine learning, computer vision and deep learning. The videos show the same text, the digits 1 to 20, recited by 67 different subjects within the same experimental setup.

  18. Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks...

    • rdr.ucl.ac.uk
    zip
    Updated May 31, 2023
    Cite
    Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin (2023). Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures" [Dataset]. http://doi.org/10.5522/04/14781771.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University College London
    Authors
    Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is the dataset to support the paper: Fernando Pérez-García et al., 2021, "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures". The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). A preprint is available on arXiv: https://arxiv.org/abs/2106.12014

    Contents:

    1) A CSV file "seizures.csv" with the following fields:
       - Subject: subject number
       - Seizure: seizure number
       - OnsetClonic: annotation marking the onset of the clonic phase
       - GTCS: whether the seizure generalises
       - Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training
    2) A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html. The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet.
    3) A description file, "README.txt"
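    The filename convention described above can be decoded mechanically. A minimal sketch; only the <subject>_<seizure>_<view>_<frame>.pth pattern and the 15 fps rate come from the dataset description, while the helper and its returned field names are illustrative:

    ```python
    # Decode a feature filename such as "006_01_L_000015.pth" from the
    # "features_fpc_8_fps_15" folder.
    FPS = 15  # snippets were downsampled to 15 frames per second

    def parse_feature_name(filename: str) -> dict:
        stem = filename.rsplit(".", 1)[0]
        subject, seizure, view, frame = stem.split("_")
        return {
            "subject": subject,
            "seizure": seizure,
            "view": view,                 # "L" (large) or "S" (small) camera view
            "frame": int(frame),
            "seconds": int(frame) / FPS,  # offset of the snippet into the video
        }

    info = parse_feature_name("006_01_L_000015.pth")
    print(info["view"], info["seconds"])  # frame 15 at 15 fps is 1.0 s in
    ```

    The 512-dimensional feature vector itself would then be read with torch.load, as the description notes.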

  19. SDFVD: Small-scale Deepfake Forgery Video Dataset

    • data.mendeley.com
    Updated Apr 23, 2024
    Cite
    Shilpa Kaman (2024). SDFVD: Small-scale Deepfake Forgery Video Dataset [Dataset]. http://doi.org/10.17632/bcmkfgct2s.1
    Explore at:
    Dataset updated
    Apr 23, 2024
    Authors
    Shilpa Kaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Small-scale Deepfake Forgery Video Dataset (SDFVD) is a custom dataset of real and deepfake videos with diverse contexts, designed to study and benchmark deepfake detection algorithms. The dataset comprises a total of 106 videos: 53 original and 53 deepfake. The equal number of real and deepfake videos ensures balance for machine learning model training and evaluation. The original videos were collected from Pexels, a well-known provider of stock photography and stock footage (video). These videos include a variety of backgrounds, and the subjects represent different genders and ages, reflecting a diverse range of scenarios. The input videos have been pre-processed by cropping them to a length of approximately 4 to 5 seconds and resizing them to 720p resolution, ensuring a consistent and uniform format across the dataset. Deepfake videos were generated with Remaker AI, employing face-swapping techniques. Remaker AI is an AI-powered platform that can generate images, swap faces in photos and videos, and edit content. The source face photos for these swaps were taken from Freepik, an image bank website that provides content such as photographs, illustrations and vector images. SDFVD was created due to the lack of comparable small-scale deepfake video datasets. Key benefits of such datasets are:

    • In educational settings or smaller research labs, smaller datasets require fewer resources, allowing students and researchers to conduct experiments with limited budgets and computational resources.
    • Researchers can use small-scale datasets to quickly prototype new ideas, test concepts, and refine algorithms before scaling up to larger datasets.

    Overall, SDFVD offers a compact but diverse collection of real and deepfake videos, suitable for a variety of applications, including research, security, and education. It serves as a valuable resource for exploring the rapidly evolving field of deepfake technology and its impact on society.
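    The 53/53 class balance described above makes it straightforward to keep train and test splits balanced as well. A minimal sketch, splitting each class separately; the file names are hypothetical, not SDFVD's actual layout:

    ```python
    # Balanced train/test split over 53 real and 53 deepfake videos.
    # File names below are placeholders for illustration only.
    import random

    def balanced_split(real, fake, test_fraction=0.2, seed=0):
        """Split each class separately so train and test stay balanced."""
        rng = random.Random(seed)
        splits = {"train": [], "test": []}
        for label, videos in (("real", real), ("fake", fake)):
            videos = sorted(videos)          # copy, for a stable starting order
            rng.shuffle(videos)
            n_test = round(len(videos) * test_fraction)
            splits["test"] += [(v, label) for v in videos[:n_test]]
            splits["train"] += [(v, label) for v in videos[n_test:]]
        return splits

    real = [f"real_{i:02d}.mp4" for i in range(53)]
    fake = [f"fake_{i:02d}.mp4" for i in range(53)]
    splits = balanced_split(real, fake)
    print(len(splits["train"]), len(splits["test"]))  # 84 22
    ```

    Because each class is shuffled and sliced independently, the 20% test set holds exactly 11 real and 11 deepfake videos.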

  20. Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...

    • datarade.ai
    Updated Jan 3, 2025
    Cite
    Nexdata (2025). Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-image-and-video-description-data-1-pb-multimoda-nexdata
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Jan 3, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Mexico, Canada, United Arab Emirates, Malta, Ecuador, Israel, Czech Republic, Netherlands, Finland, Belgium
    Description
    1. Image Description Data
       Data Size: 500 million pairs
       Image Type: generic scene (portrait, landscapes, animals, etc.), human action, picture book, magazine, PPT & chart, app screenshot, etc.
       Resolution: 4K+
       Description Language: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese
       Description Length: text length is no less than 250 words
       Format: the image format is .jpg, the annotation format is .json, and the description format is .txt

    2. Video Description Data
       Data Size: 10 million pairs
       Video Type: generic scene (portrait, landscapes, animals, etc.), ads, TV sports, documentaries
       Resolution: 1080p+
       Description Language: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese
       Description Length: text length is no less than 250 words
       Format: .mp4, .mov, .avi and other common formats; .xlsx (annotation file format)

    3. About Nexdata
       Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 3 million hours of audio data and 800 TB of annotated imagery data. These ready-to-go datasets support instant delivery and quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/llm?source=Datarade
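    Given the per-pair formats listed above (a .jpg image with a .json annotation and a .txt description), one natural way to consume such data is to pair files by a shared stem. A minimal sketch under that assumed layout; the helper, paths, and annotation fields are illustrative, not Nexdata's documented delivery structure:

    ```python
    # Pair an image with its annotation and description by shared stem,
    # per the .jpg / .json / .txt formats listed above. The directory
    # layout and field names are hypothetical.
    import json
    import tempfile
    from pathlib import Path

    def load_pair(stem: Path):
        image = stem.with_suffix(".jpg").read_bytes()        # raw image bytes
        annotation = json.loads(stem.with_suffix(".json").read_text())
        description = stem.with_suffix(".txt").read_text()   # free-text description
        return image, annotation, description

    # Demo on synthetic files in a temporary directory:
    with tempfile.TemporaryDirectory() as d:
        stem = Path(d) / "sample_000001"
        stem.with_suffix(".jpg").write_bytes(b"\xff\xd8fake-jpeg")
        stem.with_suffix(".json").write_text(json.dumps({"scene": "portrait"}))
        stem.with_suffix(".txt").write_text("A portrait photo description...")
        image, annotation, description = load_pair(stem)
        print(annotation["scene"])  # portrait
    ```

    Keying the three files on one stem keeps the pairing robust even when the archive is split across many download shards.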
