100+ datasets found

Human Activity Recognition (HAR - Video Dataset)
kaggle.com
Updated May 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sharjeel M. (2023). Human Activity Recognition (HAR - Video Dataset) [Dataset]. http://doi.org/10.34740/kaggle/dsv/5722068
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5722068
Dataset updated
May 19, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Sharjeel M.
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The dataset contains a comprehensive collection of human activity videos, spanning across 7 distinct classes. These classes include clapping, meeting and splitting, sitting, standing still, walking, walking while reading book, and walking while using the phone.

Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.

The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.
30M High-Quality Video Dataset – Copyright-Cleared & Commercial-Ready
nexdata.ai
Updated Apr 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2025). 30M High-Quality Video Dataset – Copyright-Cleared & Commercial-Ready [Dataset]. https://www.nexdata.ai/datasets/computervision/1803
Explore at:
Dataset updated
Apr 7, 2025
Dataset authored and provided by
Nexdata
Variables measured
Data size, Image type, Data format, Data content, Image resolution
Description
This dataset features 30 million high-resolution video clips sourced from authorized and legally compliant channels. Each video offers exceptional clarity, color accuracy, and scene diversity across various environments and themes. All clips include clearly documented copyright ownership and commercial usage rights, making them safe and reliable for both academic research and business applications. The dataset is ideal for computer vision tasks such as video classification, action recognition, multimodal learning, object tracking, and content generation. Whether you're training deep learning models or sourcing clean, scalable visual data, this video dataset offers a comprehensive, large-scale solution.
m
Flood Amateur Video for Semantic Segmentation Dataset
data.mendeley.com
Updated May 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Naili Suri Intizhami (2024). Flood Amateur Video for Semantic Segmentation Dataset [Dataset]. http://doi.org/10.17632/3kzr8mt8s2.5
Explore at:
Unique identifier
https://doi.org/10.17632/3kzr8mt8s2.5
Dataset updated
May 16, 2024
Authors
Naili Suri Intizhami
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is flood data in the city of Parepare, South Sulawesi Province, which contains video data collected from social media Instagram. This dataset was created to develop deep learning methods for recognizing floods and surrounding objects, specializing in semantic segmentation methods. This dataset consists of three folders, namely raw video data collected from Instagram, image data resulting from splitting the video into several images, and annotation data containing images that have been color-labeled according to their objects. There are 6 object classifications based on color labels, namely: floods (blue light), buildings (red), plants (green), people (sage), vehicles (orange), and sky (dark blue). This dataset has data in image (JPEG/PNG) and video (MP4) formats. This dataset is suitable for object recognition tasks with the semantic segmentation method. In addition, because this dataset contains original data in the form of videos and images, it can be developed for other purposes in the future. As a note, if you intend to use this dataset, please ensure that you comply with applicable copyright, privacy, and regulatory requirements. If you intend to read the paper about this dataset, please visit this link: https://doi.org/10.1016/j.dib.2023.109768
i
Sintel 4D Light Field Video Dataset
ieee-dataport.org
Updated Mar 26, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Takahiro Kinoshita (2021). Sintel 4D Light Field Video Dataset [Dataset]. https://ieee-dataport.org/open-access/sintel-4d-light-field-video-dataset
Explore at:
Dataset updated
Mar 26, 2021
Authors
Takahiro Kinoshita
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
9x9 views
m
THVD (Talking Head Video Dataset)
data.mendeley.com
Updated Apr 29, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mario Peedor (2025). THVD (Talking Head Video Dataset) [Dataset]. http://doi.org/10.17632/ykhw8r7bfx.2
Explore at:
Unique identifier
https://doi.org/10.17632/ykhw8r7bfx.2
Dataset updated
Apr 29, 2025
Authors
Mario Peedor
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
About

We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.

Distribution

Detailing the format, size, and structure of the dataset: Data Volume: -Total Size: 2.7TB

-Total Videos: 47,547

-Identities Covered: 20,841

-Resolution: 60% 4k(1980), 33% fullHD(1080)

-Formats: MP4

-Full-length videos with visible mouth movements in every frame.

-Minimum face size of 400 pixels.

-Video durations range from 20 seconds to 5 minutes.

-Faces have not been cut out, full screen videos including backgrounds.

Usage

This dataset is ideal for a variety of applications:

Face Recognition & Verification: Training and benchmarking facial recognition models.

Action Recognition: Identifying human activities and behaviors.

Re-Identification (Re-ID): Tracking identities across different videos and environments.

Deepfake Detection: Developing methods to detect manipulated videos.

Generative AI: Training high-resolution video generation models.

Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.

Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.

Coverage

Explaining the scope and coverage of the dataset:

Geographic Coverage: Worldwide

Time Range: Time range and size of the videos have been noted in the CSV file.

Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.

Languages Covered (Videos):

English: 23,038 videos

Portuguese: 1,346 videos

Spanish: 677 videos

Norwegian: 1,266 videos

Swedish: 1,056 videos

Korean: 848 videos

Polish: 1,807 videos

Indonesian: 1,163 videos

French: 1,102 videos

German: 1,276 videos

Japanese: 1,433 videos

Dutch: 1,666 videos

Indian: 1,163 videos

Czech: 590 videos

Chinese: 685 videos

Italian: 975 videos

Philipeans: 920 videos

Bulgaria: 340 videos

Romanian: 1144 videos

Arabic: 1691 videos

Who Can Use It

List examples of intended users and their use cases:

Data Scientists: Training machine learning models for video-based AI applications.

Researchers: Studying human behavior, facial analysis, or video AI advancements.

Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.

Additional Notes

Ensure ethical usage and compliance with privacy regulations. The dataset’s quality and scale make it valuable for high-performance AI training. Potential preprocessing (cropping, down sampling) may be needed for different use cases. Dataset has not been completed yet and expands daily, please contact for most up to date CSV file. The dataset has been divided into 100GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file.
d
M-ART | Video Data | Global | 100,000 Stock videos | Including metadata and...
datarade.ai
Updated Sep 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
M-ART (2025). M-ART | Video Data | Global | 100,000 Stock videos | Including metadata and releases | Dataset for AI & ML [Dataset]. https://datarade.ai/data-products/m-art-video-data-global-100-000-stock-videos-includin-m-art
Explore at:
.csv, .jpeg, .mp4, .movAvailable download formats
Dataset updated
Sep 11, 2025
Dataset authored and provided by
M-ART
Area covered
Paraguay, Andorra, Tunisia, Estonia, El Salvador, Bangladesh, Saint Helena, Benin, Chad, Curaçao
Description
"Collection of 100,000 high-quality video clips across diverse real-world domains, designed to accelerate the training and optimization of computer vision and multimodal AI models."

Overview This dataset contains 100,000 proprietary and partner-produced video clips filmed in 4K/6K with cinema-grade RED cameras. Each clip is commercially cleared with full releases, structured metadata, and available in RAW or MOV/MP4 formats. The collection spans a wide variety of domains — people and lifestyle, healthcare and medical, food and cooking, office and business, sports and fitness, nature and landscapes, education, and more. This breadth ensures robust training data for computer vision, multimodal, and machine learning projects.

The data set All 100,000 videos have been reviewed for quality and compliance. The dataset is optimized for AI model training, supporting use cases from face and activity recognition to scene understanding and generative AI. Custom datasets can also be produced on demand, enabling clients to close data gaps with tailored, high-quality content.

About M-ART M-ART is a leading provider of cinematic-grade datasets for AI training. With extensive expertise in large-scale content production and curation, M-ART delivers both ready-to-use video datasets and fully customized collections. All data is proprietary, rights-cleared, and designed to help global AI leaders accelerate research, development, and deployment of next-generation models.
d
SAW-IT-Plus Video Dataset
dro.deakin.edu.au
researchdata.edu.au
txt
Updated Mar 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Don Driscoll; Thi Thu Thuy Nguyen; Anne Eichholtzer (2023). SAW-IT-Plus Video Dataset [Dataset]. http://doi.org/10.26187/deakin.22359847.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.26187/deakin.22359847.v1
Dataset updated
Mar 31, 2023
Dataset provided by
Deakin University
Authors
Don Driscoll; Thi Thu Thuy Nguyen; Anne Eichholtzer
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The SAW-IT-Plus dataset contains 11,458 videos collected in the wild, and 22 homemade videos (snake category). Videos are arranged in 8 main categories of animals (frogs - 0, snakes - 1, lizards - 2, birds - 3, small mammals < 2kg - 4, medium or large mammals > 2kg - 5, spiders - 7 and scorpions - 8). Echidnas - originally category 6 – were merged with big mammals. Some videos of crustacea and other reptiles are available but not classified. Empty videos (7,896) were added to allow for further testing of the algorithm. They are separated in 3 categories (details in Table 1).

CSV files are available to detail the species for frogs, lizards, birds and small mammals for each video. Because the videos were mainly collected from real-world data; the number of videos for each animal category are unbalanced (Table 1). This folder also contains training images used to automatically detect videos containing animals in our overall dataset. More information available in the ReadMe files.

The dataset was collected in Victoria, Australia, from February to October 2021 as part of the ERP22 (formerly ARI-PPD 05) grant.
d
Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI...
datarade.ai
Updated Dec 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2023). Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-re-id-data-60-000-id-image-video-ai-ml-train-nexdata
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Dec 8, 2023
Dataset authored and provided by
Nexdata
Area covered
Trinidad and Tobago, Portugal, Russian Federation, Sri Lanka, Turkmenistan, Bolivia (Plurinational State of), United Arab Emirates, Cuba, Luxembourg, Ecuador
Description
Specifications Data size : 60,000 ID

Population distribution : the race distribution is Asians, Caucasians and black people, the gender distribution is male and female, the age distribution is from children to the elderly

Collecting environment : including indoor and outdoor scenes (such as supermarket, mall and residential area, etc.)

Data diversity : different ages, different time periods, different cameras, different human body orientations and postures, different ages collecting environment

Device : surveillance cameras, the image resolution is not less than 1,9201,080

Data format : the image data format is .jpg, the annotation file format is .json

Annotation content : human body rectangular bounding boxes, 15 human body attributes

Quality Requirements : A rectangular bounding box of human body is qualified when the deviation is not more than 3 pixels, and the qualified rate of the bounding boxes shall not be lower than 97%;Annotation accuracy of attributes is over 97%

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 3 million hours of Audio Data and 800TB of Annotated Imagery Data.These ready-to-go Identity Data support instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade
Kinetics-400-[test-set]
kaggle.com
Updated Sep 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Innat (2023). Kinetics-400-[test-set] [Dataset]. https://www.kaggle.com/datasets/ipythonx/k4testset/data?select=videos_val
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 11, 2023
Dataset provided by
Kaggle
Authors
Innat
Description
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1984321%2Fee10abf5409ea4eaaad3dfaa9514a4bb%2FScreenshot_2021-08-06_at_16.15.03.png?generation=1694441423300452&alt=media" alt="">

Video Action Recognition : Kinetics 400

The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. Homepage.

License

The kinetics dataset is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License. Published. May 22, 2017.
t
Annotated video footage for automated identification and counting of fish in...
service.tib.eu
Updated Nov 30, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Annotated video footage for automated identification and counting of fish in unconstrained marine environments - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-926930
Explore at:
Dataset updated
Nov 30, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Computer vision techniques such as deep learning have quickly become of interest to ecologists for automatically processing large volumes of video and image-based data. However, training deep learning models often require large volumes of manually annotated footage to create a robust and accurate model. The collection and annotation of these training datasets can incur high initial labour cost and may not be feasible for some research projects. The accessibility of publicly available datasets that are pre-annotated for easy implementation is imperative for continued research and understanding of computer vision technology as a viable method to automate the processing of visual data. In this report, we provide a dataset containing ~ 9,000 annotated fish in unconstrained conditions in a key coastal habitat - seagrass meadows - collected via remote underwater video. These images include object instance annotations which consist of a corresponding image, label, bounding box and segmentation mask. These data can be used for training several different computer vision models and for investigating the effects of pre- or post-processing steps to improve model performance when predicting data in awuatic habitats. The purpose of this report, in conjunction with the annotated dataset, is to advance the use of CV techniques and further the growth in labelled fish datasets publicly available.
m
Video Dataset of Sheep Activity (Grazing, Running, Sitting)
data.mendeley.com
Updated Oct 24, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bilal Khan (2023). Video Dataset of Sheep Activity (Grazing, Running, Sitting) [Dataset]. http://doi.org/10.17632/h5ppwx6fn4.1
Explore at:
Unique identifier
https://doi.org/10.17632/h5ppwx6fn4.1
Dataset updated
Oct 24, 2023
Authors
Bilal Khan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A primary dataset capturing five distinct types of sheep activities in realistic settings has been constructed at various resolutions and viewing angles, targeting the expansion of the domain knowledge for non-contact virtual fencing approaches. The present dataset can be used to develop non-invasive approaches for sheep activity detection, which can be proven useful for farming activities including, but not limited to: (i) sheep counting, (ii) virtual fencing, (iii) behavior detection for health status, and (v) effective sheep breeding. Sheep activity classes include grazing, running, sitting, standing, and walking. The activities of individuals, as well as herds of sheep, were recorded at different resolutions and angles to provide a dataset of diverse characteristics. Overall, a total of 149,327 frames from 417 videos (the equivalent of 59 minutes of footage) are presented with a balanced set for each activity class, which can be utilized for robust non-invasive detection models based on computer vision techniques. Despite a decent existence of noise within the original data (e.g., segments with no sheep present, multiple sheep in single frames, multiple activities by one or more sheep in single as well as multiple frames, segments with sheep alongside other non-sheep objects), we provide original videos and the extracted frames (with videos and frames containing humans omitted for privacy reasons). The present dataset includes diverse sheep activity characteristics and can be useful for robust detection and recognition models, as well as advanced activity detection models as a function of time for the applications.

Note: We separated three additional classes of the same dataset into a separate Mendeley dataset since the size of the video set was above the allowed limit by Mendeley Data Repository. Those three additional classes are uploaded in a separate dataset which can be accessed via (Reference below with DOI: 10.17632/w65pvb84dg.1). Three classes from the DOI (provided below) can be copied into this main folder in order to have the complete dataset with 5 distinct classes.

Khan, Bilal; Kelly, Nathan (2023), “Video Dataset of Sheep Activity (Standing and Walking)”, Mendeley Data, V1, doi: 10.17632/w65pvb84dg.1
Data from: An Annotated Video Dataset for Computing Video Memorability
figshare.com
zip
Updated Aug 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rukiye Savran Kiziltepe; Alan Smeaton; Lorin Sweeney; Mihai Gabriel Constantin; Faiyaz Doctor; Alba Garcia Seco de Herrera; Claire-Helene Demarty; Graham Healy; Bogdan Ionescu (2021). An Annotated Video Dataset for Computing Video Memorability [Dataset]. http://doi.org/10.6084/m9.figshare.15105867.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.15105867.v2
Dataset updated
Aug 5, 2021
Dataset provided by
Figsharehttp://figshare.com/
Authors
Rukiye Savran Kiziltepe; Alan Smeaton; Lorin Sweeney; Mihai Gabriel Constantin; Faiyaz Doctor; Alba Garcia Seco de Herrera; Claire-Helene Demarty; Graham Healy; Bogdan Ionescu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Media Memorability 2020 dataset contains a subset of short videos selected from the TRECVid 2019 Video-to-Text dataset. The dataset contains links to, as well as features describing and annotations on, 590videos as part of the training set and 410 videos as part of development set. It also contains links to, and features describing, 500 videos used as test videos for the MediaEval Video Memorability benchmark in 2020.
d
FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision...
datarade.ai
Updated Jul 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FileMarket (2024). FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision Applications | Machine Learning (ML) Data | Deep Learning (DL) Data [Dataset]. https://datarade.ai/data-products/filemarket-dataset-for-face-anti-spoofing-videos-in-compu-filemarket
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Jul 10, 2024
Dataset authored and provided by
FileMarket
Area covered
United Republic of, Cabo Verde, Libya, Russian Federation, South Sudan, Ukraine, Mauritania, Germany, Sao Tome and Principe, Guinea-Bissau
Description
Live Face Anti-Spoof Dataset

A live face dataset is crucial for advancing computer vision tasks such as face detection, anti-spoofing detection, and face recognition. The Live Face Anti-Spoof Dataset offered by Ainnotate is specifically designed to train algorithms for anti-spoofing purposes, ensuring that AI systems can accurately differentiate between real and fake faces in various scenarios.

Key Features:

Comprehensive Video Collection: The dataset features thousands of videos showcasing a diverse range of individuals, including males and females, with and without glasses. It also includes men with beards, mustaches, and clean-shaven faces. Lighting Conditions: Videos are captured in both indoor and outdoor environments, ensuring that the data covers a wide range of lighting conditions, making it highly applicable for real-world use. Data Collection Method: Our datasets are gathered through a community-driven approach, leveraging our extensive network of over 700k users across various Telegram apps. This method ensures that the data is not only diverse but also ethically sourced with full consent from participants, providing reliable and real-world applicable data for training AI models. Versatility: This dataset is ideal for training models in face detection, anti-spoofing, and face recognition tasks, offering robust support for these essential computer vision applications. In addition to the Live Face Anti-Spoof Dataset, FileMarket provides specialized datasets across various categories to support a wide range of AI and machine learning projects:

Object Detection Data: Perfect for training AI in image and video analysis. Machine Learning (ML) Data: Offers a broad spectrum of applications, from predictive analytics to natural language processing (NLP). Large Language Model (LLM) Data: Designed to support text generation, chatbots, and machine translation models. Deep Learning (DL) Data: Essential for developing complex neural networks and deep learning models. Biometric Data: Includes diverse datasets for facial recognition, fingerprint analysis, and other biometric applications. This live face dataset, alongside our other specialized data categories, empowers your AI projects by providing high-quality, diverse, and comprehensive datasets. Whether your focus is on anti-spoofing detection, face recognition, or other biometric and machine learning tasks, our data offerings are tailored to meet your specific needs.
r
Data from: CVB: A Video Dataset of Cattle Visual Behaviors
researchdata.edu.au
datadownload
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Flavio Pereira Alvarenga; Aaron Ingham; Lars Petersson; Brano Kusy; Vivien Rolland; Brendan Do; Neil Bagnall; Jody McNally; Greg Bishop-Hurley; Reza Arablouei; Ali Zia; Renuka Sharma (2023). CVB: A Video Dataset of Cattle Visual Behaviors [Dataset]. http://doi.org/10.25919/3G3T-P068
Explore at:
datadownloadAvailable download formats
Unique identifier
https://doi.org/10.25919/3G3T-P068
Dataset updated
Jun 13, 2023
Dataset provided by
Commonwealth Scientific and Industrial Research Organisation
Authors
Flavio Pereira Alvarenga; Aaron Ingham; Lars Petersson; Brano Kusy; Vivien Rolland; Brendan Do; Neil Bagnall; Jody McNally; Greg Bishop-Hurley; Reza Arablouei; Ali Zia; Renuka Sharma
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
Aug 1, 2022 - Apr 28, 2023
Area covered

Description
Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is presented in the form of following three sub-directories. 1. raw_frames: contains 450 frames in each sub folder, representing 15 sec video, taking at a frames rate of 30 FPS, 2. annotations: contains the json files corresponding to the raw_frames folder. We have one json file for one video, containing the bounding box annotations for each cattle and their associated behaviors, and 3. CVB_in_AVA_format: contains the CVB data in the standard AVA dataset format which we have used to apply SlowFast model. Lineage: We use the Computer Vision Annotation Tool (CVAT) to collect our annotations. To make the procedure more efficient, we perform an initial detection and tracking of cattle in the videos using appropriate pre-trained models. The results are corrected by domain experts along with cattle behavior labeling in CVAT. The pre-hoc detection and tracking step significantly reduces the manual annotation time and effort.
g
Emotion Detection in Video Content
gts.ai
json
Updated Nov 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GTS (2023). Emotion Detection in Video Content [Dataset]. https://gts.ai/case-study/emotion-detection-in-video-content/
Explore at:
jsonAvailable download formats
Dataset updated
Nov 20, 2023
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore the nuances of emotion detection in video content Technologies and methods that make emotion analysis more accurate and efficient.
spacejam - Action Recognition
kaggle.com
Updated Mar 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The citation is currently not available for this dataset.
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 9, 2021
Dataset provided by
Kaggle
Authors
antocommii
Description
Dataset

This dataset was created by antocommii

Contents
Data from: A comprehensive video dataset for Multi-Modal Recognition Systems...
zenodo.org
zip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli; Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli (2020). A comprehensive video dataset for Multi-Modal Recognition Systems [Dataset]. http://doi.org/10.5281/zenodo.1492227
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.1492227
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli; Anand Handa; Dr. Rashi Agarwal; Prof. Narendra Kohli
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A fully-labelled video dataset will act as a unique resource for researchers and analysts in the fields such as machine learning, computer vision and deep learning. The videos contain similar text recited by 67 different subjects. The text contains digits from 1 to 20 recited by 67 different subjects within the same experimental setup.
u
Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks...
rdr.ucl.ac.uk
zip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin (2023). Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures" [Dataset]. http://doi.org/10.5522/04/14781771.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5522/04/14781771.v1
Dataset updated
May 31, 2023
Dataset provided by
University College London
Authors
Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This is the dataset to support the paper:Fernando Pérez-García et al., 2021, Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures.The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021).A preprint is available on arXiv: https://arxiv.org/abs/2106.12014Contents:1) A CSV file "seizures.csv" with the following fields: - Subject: subject number - Seizure: seizure number - OnsetClonic: annotation marking the onset of the clonic phase - GTCS: whether the seizure generalises - Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training.2) A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet.3) A description file, "README.txt".
m
SDFVD: Small-scale Deepfake Forgery Video Dataset
data.mendeley.com
Updated Apr 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shilpa Kaman (2024). SDFVD: Small-scale Deepfake Forgery Video Dataset [Dataset]. http://doi.org/10.17632/bcmkfgct2s.1
Explore at:
Unique identifier
https://doi.org/10.17632/bcmkfgct2s.1
Dataset updated
Apr 23, 2024
Authors
Shilpa Kaman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Small-scale Deepfake Forgery Video Dataset (SDFVD) is a custom dataset consisting of real and deepfake videos with diverse contexts designed to study and benchmark deepfake detection algorithms. The dataset comprising of a total of 106 videos, with 53 original and 53 deepfake videos. Equal number of real and deepfake videos, ensures balance for machine learning model training and evaluation. The original videos were collected from Pexels: a well- known provider of stock photography and stock footage(video). These videos include a variety of backgrounds, and the subjects represent different genders and ages, reflecting a diverse range of scenarios. The input videos have been pre-processed by cropping them to a length of approximately 4 to 5 seconds and resizing them to 720p resolution, ensuring a consistent and uniform format across the dataset. Deepfake videos were generated using Remaker AI employing face-swapping techniques. Remaker AI is an AI-powered platform that can generate images, swap faces in photos and videos, and edit content. The source face photos for these swaps were taken from Freepik: is an image bank website provides contents such as photographs, illustrations and vector images. SDFVD was created due to the lack of availability of any such comparable small-scale deepfake video datasets. Key benefits of such datasets are: • In educational settings or smaller research labs, smaller datasets can be particularly useful as they require fewer resources, allowing students and researchers to conduct experiments with limited budgets and computational resources. • Researchers can use small-scale datasets to quickly prototype new ideas, test concepts, and refine algorithms before scaling up to larger datasets. Overall, SDFVD offers a compact but diverse collection of real and deepfake videos, suitable for a variety of applications, including research, security, and education. It serves as a valuable resource for exploring the rapidly evolving field of deepfake technology and its impact on society.
d
Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...
datarade.ai
Updated Jan 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2025). Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-image-and-video-description-data-1-pb-multimoda-nexdata
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Jan 3, 2025
Dataset authored and provided by
Nexdata
Area covered
Mexico, Canada, United Arab Emirates, Malta, Ecuador, Israel, Czech Republic, Netherlands, Finland, Belgium
Description
Image Description Data Data Size: 500 million pairs Image Type: generic scene(portrait, landscapes, animals,etc), human action, picture book, magazine, PPT&chart, App screenshot, and etc. Resolution: 4K+ Description Language: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese Description Length: text length is no less than 250 words Format: the image format is .jpg, the annotation format is .json, and the description format is .txt

Video Description Data Data Size: 10 million pairs Image Type: generic scene(portrait, landscapes, animals,etc), ads, TV sports, documentaries Resolution: 1080p+ Description Language: English, Spanish, Portuguese, French, Korean, German, Chinese, Japanese Description Length: text length is no less than 250 words Format: .mp4,.mov,.avi and other common formats;.xlsx (annotation file format)

About Nexdata Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 3 million hours of Audio Data and 800TB of Annotated Imagery Data. These ready-to-go data supports instant delivery, quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/llm?source=Datarade

Facebook

Twitter

Click to copy link

Link copied

Cite

Sharjeel M. (2023). Human Activity Recognition (HAR - Video Dataset) [Dataset]. http://doi.org/10.34740/kaggle/dsv/5722068

Human Activity Recognition (HAR - Video Dataset)

The dataset features 7 different classes of Human Activities in Videos.

Explore at:

4 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Unique identifier

https://doi.org/10.34740/kaggle/dsv/5722068

Dataset updated

May 19, 2023

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Sharjeel M.

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

The dataset contains a comprehensive collection of human activity videos, spanning across 7 distinct classes. These classes include clapping, meeting and splitting, sitting, standing still, walking, walking while reading book, and walking while using the phone.

Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.

The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.

Clear search

Close search

Google apps

Main menu

Human Activity Recognition (HAR - Video Dataset)

30M High-Quality Video Dataset – Copyright-Cleared & Commercial-Ready

Flood Amateur Video for Semantic Segmentation Dataset

Sintel 4D Light Field Video Dataset

THVD (Talking Head Video Dataset)

M-ART | Video Data | Global | 100,000 Stock videos | Including metadata and...

SAW-IT-Plus Video Dataset

Re-ID Data | 600,000 ID | CCTV Data |Computer Vision Data| Identity Data| AI...

Kinetics-400-[test-set]

Video Action Recognition : Kinetics 400

License

Annotated video footage for automated identification and counting of fish in...

Video Dataset of Sheep Activity (Grazing, Running, Sitting)

Data from: An Annotated Video Dataset for Computing Video Memorability

FileMarket | Dataset for Face Anti-Spoofing (Videos) in Computer Vision...

Data from: CVB: A Video Dataset of Cattle Visual Behaviors

Emotion Detection in Video Content

spacejam - Action Recognition

Dataset

Contents

Data from: A comprehensive video dataset for Multi-Modal Recognition Systems...

Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks...

SDFVD: Small-scale Deepfake Forgery Video Dataset

Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data...

Human Activity Recognition (HAR - Video Dataset)

The dataset features 7 different classes of Human Activities in Videos.