The PortraitMode-400 dataset is a significant contribution to the field of video recognition, specifically focusing on portrait mode videos.
Dataset Overview: The PortraitMode-400 (PM-400) dataset is the first of its kind dedicated to portrait mode video recognition. It was created to address the unique challenges of recognizing videos captured in portrait mode.
Portrait mode videos are increasingly important due to the growing popularity of smartphones and social media applications.
Data Collection and Annotation:
The dataset consists of 76,000 videos collected from Douyin, a popular short-video application. These videos were meticulously annotated with 400 fine-grained categories.
Rigorous quality assurance measures were implemented to ensure the accuracy of human annotations.
Research Insights and Impact:
The creators of the dataset conducted a comprehensive analysis to understand the impact of video format (portrait mode vs. landscape mode) on recognition accuracy. They also explored spatial bias arising from different video formats. Key aspects of portrait mode video recognition were investigated, including data augmentation, evaluation procedures, the importance of temporal information, and the role of audio modality.
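One concrete implication of the spatial-bias finding is that augmentation for portrait clips can preserve the native tall aspect ratio instead of forcing the square crop common in landscape-video pipelines. The sketch below illustrates that idea with torchvision; the crop sizes and the choice of torchvision are assumptions for illustration, not the authors' published pipeline:

```python
from torchvision import transforms

# Hypothetical augmentation for a single portrait-mode (roughly 9:16) frame:
# resize the short side, then take a tall random crop that keeps the portrait
# aspect ratio instead of the square crop used in many landscape pipelines.
portrait_augment = transforms.Compose([
    transforms.Resize(256),              # short side (width) -> 256 px
    transforms.RandomCrop((400, 224)),   # tall crop, ~16:9 height-to-width
    transforms.RandomHorizontalFlip(p=0.5),
])

# Conventional landscape-style recipe for comparison: the square crop
# discards most of the vertical context that portrait videos carry.
square_augment = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
])
```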
(1) Video Recognition in Portrait Mode, arXiv:2312.13746. https://arxiv.org/abs/2312.13746
(2) Video Recognition in Portrait Mode, Papers With Code. https://paperswithcode.com/paper/video-recognition-in-portrait-mode
(3) Video Recognition in Portrait Mode, arXiv PDF. https://arxiv.org/pdf/2312.13746.pdf
(4) DOI: https://doi.org/10.48550/arXiv.2312.13746
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Dataset Video is a dataset for object detection tasks - it contains Senang, Murung, Bingung, and Normal (Indonesian for happy, gloomy, confused, and normal) annotations for 226 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://choosealicense.com/licenses/other/
Video-to-Video Dataset
This is a dataset for video-to-video tasks. You do not need to worry about copyright as long as you follow the license outline below.
Outline of License
This dataset is released under the Unity-Chan License. The outline is as follows:
You may use it for commercial purposes. You must display "Song/Motion: © Unity Technologies Japan/UCL." in your work.
The official guideline is here. Please read it.
Copyrights
3D Model
This model is CC-0.
Song
Unity… See the full description on the dataset page: https://huggingface.co/datasets/alfredplpl/video-to-video-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Human Motion Video for Generative Model
🎏 Introduction
TL;DR: With the rapid development of generative models, including diffusion-based and flow-based models, human-centric tasks such as pose-driven human image animation, audio-driven action generation, diffusion-based pose estimation, and human optical flow estimation have attracted a lot of attention. In our recent work, we also pay attention to the quality of the training data of… See the full description on the dataset page: https://huggingface.co/datasets/gulucaptain/Human-Motion-Video-for-Generative-Model.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains 36 NeRF-generated videos captured from four different indoor and outdoor environments: S1 for outdoor, S2 for auditorium, S3 for classroom, and S4 for lounge entrance. Each scene is trained using three NeRF models: Nerfacto as M1, Instant-NGP as M2, and Volinga as M3. Finally, each trained scene is rendered on three customized trajectories referred to as P1, P2, and P3. There are a total of 36 videos (4 scenes × 3 models × 3 paths) each having its own individual name. For example, video S1M1P1 corresponds to the outdoor scene (S1), which is trained on the Nerfacto model (M1), and rendered on the first camera path (P1).
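Because the scene, model, and path are encoded directly in each file name, the 36 names can be enumerated or parsed programmatically. A minimal sketch; the code-to-name dictionaries are taken from the description above, while the helper itself is illustrative:

```python
from itertools import product

scenes = {"S1": "outdoor", "S2": "auditorium", "S3": "classroom", "S4": "lounge entrance"}
models = {"M1": "Nerfacto", "M2": "Instant-NGP", "M3": "Volinga"}
paths  = {"P1": "path 1", "P2": "path 2", "P3": "path 3"}

# Enumerate all 4 x 3 x 3 = 36 video names, e.g. "S1M1P1".
names = ["".join(codes) for codes in product(scenes, models, paths)]
assert len(names) == 36

def parse(name):
    """Split a name like 'S1M1P1' into its scene, NeRF model, and camera path."""
    s, m, p = name[:2], name[2:4], name[4:6]
    return scenes[s], models[m], paths[p]

print(parse("S1M1P1"))  # ('outdoor', 'Nerfacto', 'path 1')
```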
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Video Object Tracking is a dataset for object detection tasks - it contains Boundary annotations for 1,672 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The MedVidCL dataset contains a collection of 6,617 videos annotated into 'medical instructional', 'medical non-instructional', and 'non-medical' classes. A two-step approach is used to construct the MedVidCL dataset. In the first step, videos annotated by health informatics experts are used to train a machine learning model that assigns a given video to one of the three aforementioned classes. In the second step, only high-confidence predictions are kept, and health informatics experts assess the model's predicted video category and update the category wherever needed.
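The second step keeps only videos the trained classifier is confident about before they go to expert review. A minimal sketch of that filtering, assuming a scikit-learn-style classifier with predict_proba and a hypothetical 0.9 confidence cut-off (neither the model nor the threshold is specified in the description):

```python
CLASSES = ["medical instructional", "medical non-instructional", "non-medical"]
THRESHOLD = 0.9  # hypothetical confidence cut-off; not specified by the authors

def select_high_confidence(model, features, video_ids):
    """Return (video_id, predicted_class) pairs whose top probability exceeds THRESHOLD."""
    probs = model.predict_proba(features)          # shape: (n_videos, 3)
    top = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    return [
        (vid, CLASSES[label])
        for vid, label, p in zip(video_ids, labels, top)
        if p >= THRESHOLD
    ]

# The selected (video, label) pairs are then passed to health informatics
# experts, who confirm or correct each predicted category.
```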
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Video Anomaly is a dataset for object detection tasks - it contains Violance annotations for 2,132 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is the dataset to support the paper: Fernando Pérez-García et al., 2021, "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures." The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). A preprint is available on arXiv: https://arxiv.org/abs/2106.12014
Contents:
1) A CSV file "seizures.csv" with the following fields:
   - Subject: subject number
   - Seizure: seizure number
   - OnsetClonic: annotation marking the onset of the clonic phase
   - GTCS: whether the seizure generalises
   - Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training.
2) A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet (a loading sketch follows this list).
3) A description file, "README.txt".
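The feature files can be loaded directly with torch.load. A minimal sketch, assuming each ".pth" file holds a 1-D tensor of 512 deep features as described; the path below uses the example file name from the description and is illustrative of the per-seizure folder layout:

```python
import torch

# File names end in the frame index; at 15 fps, index / 15 gives the snippet's
# start time in seconds (e.g. 000015 -> 1.0 s into the seizure video).
path = "features_fpc_8_fps_15/006_01_L_000015.pth"
features = torch.load(path)                 # expected shape: (512,)

frame_index = int(path.rsplit("_", 1)[-1].split(".")[0])
start_time_s = frame_index / 15
print(features.shape, f"snippet starts at {start_time_s:.1f} s")
```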
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
VisDrone Video is a dataset for object detection tasks - it contains Car, People, Pedestrians, Van, and Motor annotations for 6,275 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://www.archivemarketresearch.com/privacy-policy
Market Overview: The global text-to-video model market is experiencing explosive growth, with a projected CAGR of 56.6% from 2023 to 2033. In 2025, the market was valued at $2,219 million and is anticipated to reach a staggering value of over $43,674 million by 2033. This growth is driven by advancements in artificial intelligence (AI) and machine learning (ML) technologies, which enable the generation of high-quality videos from textual descriptions. Key Trends and Drivers: The text-to-video model market is influenced by several key trends and drivers, including:
- Rising demand for interactive and immersive content: text-to-video models allow for the creation of engaging, visually appealing content that can captivate audiences and enhance user experiences.
- Advancements in generative AI: ongoing progress in generative AI, such as deep learning and transformer models, enables text-to-video models to generate increasingly realistic and detailed videos.
- Growing adoption in entertainment and media: the entertainment and media industry has been a major driver of text-to-video model adoption, as these models can be used to create trailers, commercials, and even entire films.
Segments and Regional Analysis: The text-to-video model market can be segmented by type (below 3 billion parameters and above 3 billion parameters), application (entertainment and media, film and television, cartoon, education, and others), and region (North America, South America, Europe, Middle East & Africa, and Asia Pacific). North America currently holds the largest market share due to the early adoption of AI technologies and the presence of key players like OpenAI and Meta. However, Asia Pacific is expected to witness significant growth over the forecast period, driven by the rapidly growing entertainment and media industries in China and India. This comprehensive report provides in-depth insights into the thriving text-to-video model market, valued at approximately $20 billion in 2023.
https://www.kcl.ac.uk/researchsupport/assets/DataAccessAgreement-Description.pdf
This dataset contains annotated images for object detection of containers and hands in a first-person (egocentric) view during drinking activities. Both YOLOv8 and COCO formats are provided. Please refer to our paper for more details.
Purpose: training and testing the object detection model.
Content: videos from Session 1 of Subjects 1-20.
Images: extracted from the videos of Subjects 1-20, Session 1.
Additional images:
- ~500 hand/container images from Roboflow open-source data.
- ~1,500 null (background) images from the VOC Dataset and the MIT Indoor Scene Recognition Dataset: 1,000 indoor scenes from 'MIT Indoor Scene Recognition' and 400 other unrelated objects from the VOC Dataset.
Data augmentation (a sketch of this recipe follows):
- Horizontal flipping
- ±15% brightness change
- ±10° rotation
Formats provided: COCO format and PyTorch YOLOv8 format.
Image size: 416x416 pixels.
Total images: 16,834 (training: 13,862; validation: 1,975; testing: 997).
Instance numbers: containers: over 10,000; hands: over 8,000.
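The augmentation recipe above (horizontal flips, ±15% brightness, ±10° rotation, 416x416 images) can be reproduced with a detection-aware augmentation library so that bounding boxes stay consistent with the transformed images. A minimal sketch using albumentations, which is an illustrative choice rather than the tool the authors state they used:

```python
import albumentations as A

# Mirror the stated recipe: horizontal flip, +/-15% brightness, +/-10 degree
# rotation, resized to the dataset's 416x416 resolution. bbox_params keeps
# YOLO-format boxes aligned with the transformed image.
augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.15, contrast_limit=0.0, p=0.5),
        A.Rotate(limit=10, border_mode=0, p=0.5),
        A.Resize(416, 416),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage (image as an HxWx3 numpy array, boxes in YOLO format):
# out = augment(image=image, bboxes=boxes, class_labels=labels)
# aug_image, aug_boxes = out["image"], out["bboxes"]
```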
Replay-Attack is a dataset for face recognition and presentation attack detection (anti-spoofing). The dataset consists of 1300 video clips of photo and video presentation attack (spoofing attacks) to 50 clients, under different lighting conditions.
Spoofing Attacks Description
The 2D face spoofing attack database consists of 1,300 video clips of photo and video attack attempts of 50 clients, under different lighting conditions.
The data is split into 4 sub-groups comprising:
Training data ("train"), to be used for training your anti-spoof classifier;
Development data ("devel"), to be used for threshold estimation;
Test data ("test"), with which to report error figures;
Enrollment data ("enroll"), that can be used to verify spoofing sensitivity on face detection algorithms.
Clients that appear in one of the data sets (train, devel or test) do not appear in any other set.
Database Description
All videos are generated by either having a (real) client try to access a laptop through a built-in webcam or by displaying a photo or a video recording of the same client for at least 9 seconds. The webcam produces colour videos with a resolution of 320 pixels (width) by 240 pixels (height). The movies were recorded on a MacBook laptop using the QuickTime framework (codec: Motion JPEG) and saved as ".mov" files. The frame rate is about 25 Hz. Besides the native support on Apple computers, these files are easily readable using mplayer, ffmpeg, or any other video utility available on Linux or MS Windows systems.
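Since the clips are plain Motion JPEG ".mov" files, they can be read frame by frame with standard tooling; below is a minimal sketch using OpenCV, with a hypothetical file name, since the actual naming scheme is not shown here:

```python
import cv2

# Hypothetical file name; the dataset ships ~9-second 320x240 Motion JPEG .mov clips.
cap = cv2.VideoCapture("real/client001_controlled.mov")
fps = cap.get(cv2.CAP_PROP_FPS)          # roughly 25 for these recordings

frames = []
while True:
    ok, frame = cap.read()               # frame is a 240x320x3 BGR array
    if not ok:
        break
    frames.append(frame)
cap.release()

print(f"read {len(frames)} frames at ~{fps:.1f} fps")
```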
Real client accesses, as well as the data collected for the attacks, are taken under two different lighting conditions: controlled and adverse (the same condition labels that appear in the attack types listed below).
To produce the attacks, high-resolution photos and videos from each client were taken under the same conditions as in their authentication sessions, using a Canon PowerShot SX150 IS camera, which records both 12.1 Mpixel photographs and 720p high-definition video clips. The way to perform the attacks can be divided into two subsets: the first subset is composed of videos generated using a stand to hold the client biometry ("fixed"). For the second set, the attacker holds the device used for the attack with their own hands. In total, 20 attack videos were registered for each client, 10 for each of the attacking modes just described:
4 x mobile attacks using an iPhone 3GS screen (with resolution 480x320 pixels) displaying:
1 x mobile photo/controlled
1 x mobile photo/adverse
1 x mobile video/controlled
1 x mobile video/adverse
4 x high-resolution screen attacks using an iPad (first generation, with a screen resolution of 1024x768 pixels) displaying:
1 x high-resolution photo/controlled
1 x high-resolution photo/adverse
1 x high-resolution video/controlled
1 x high-resolution video/adverse
2 x hard-copy print attacks (produced on a Triumph-Adler DCC 2520 color laser printer) occupying the whole available printing surface on A4 paper for the following samples:
1 x high-resolution print of photo/controlled
1 x high-resolution print of photo/adverse
The 1,300 real-access and attack videos were then divided in the following way:
Training set: contains 60 real-accesses and 300 attacks under different lighting conditions;
Development set: contains 60 real-accesses and 300 attacks under different lighting conditions;
Test set: contains 80 real-accesses and 400 attacks under different lighting conditions;
Face Locations
We also provide face locations automatically annotated by a cascade of classifiers based on a variant of Local Binary Patterns (LBP) referred to as the Modified Census Transform (MCT) [B. Froba and A. Ernst, "Face Detection with the Modified Census Transform," IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 91-96]. The automatic face localisation procedure works in more than 99% of the total number of frames acquired, which means that less than 1% of the frames across all videos have no annotated face. User algorithms must account for this fact; a small handling sketch follows.
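Because a small fraction of frames has no annotated face, downstream code must either skip those frames or reuse the last known location. A minimal sketch of the reuse strategy, assuming the per-frame annotations have already been parsed into a list where frames without a detection are None (the annotation file format itself is not described here):

```python
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height)

def fill_missing_faces(locations: List[Optional[BBox]]) -> List[Optional[BBox]]:
    """Reuse the last valid face box for frames without an annotation.

    Frames before the first detection stay None and should be skipped.
    """
    filled: List[Optional[BBox]] = []
    last: Optional[BBox] = None
    for box in locations:
        if box is not None:
            last = box
        filled.append(last)
    return filled

# Example: frames 0 and 3 lack annotations; frame 3 reuses frame 2's box.
print(fill_missing_faces([None, (10, 20, 64, 64), (12, 21, 64, 64), None]))
```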
Protocol for Licit Biometric Transactions
It is possible to measure the performance of baseline face recognition systems on the 2D face spoofing database and evaluate how well the attacks pass such systems, or, conversely, how robust those systems are to attacks. Here we describe how to use the data available in the enrolment set to create a background model and client models, and how to perform scoring.
Universal Background Model (UBM): To generate the UBM, subselect the training-set client videos from the enrollment videos. There should be 2 per client, which means you get 30 videos, each with 375 frames to create the model;
Client models: To generate client models, use the enrollment data for clients at the development and test groups. There should be 2 videos per client (one for each light condition) once more. At the end of the enrollment procedure, the development set must have 1 model for each of the 15 clients available in that set. Similarly, for the test set, 1 model for each of the 20 clients available;
For a simple baseline verification, generate scores exhaustively for all videos from the development and test real accesses respectively, but without intermixing across the development and test sets. The scores generated for matched client videos and models (within the same subset, i.e. development or test) should be considered true client accesses, while all others should be considered impostor accesses;
If you are looking for a single number to report on the performance, do the following: exclusively using the scores from the development set, tune your baseline face recognition system at the equal error rate (EER) of the development set, and use this threshold to compute the half total error rate (HTER) on the test set scores (see the sketch below).
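A minimal sketch of that tuning-and-reporting step in NumPy, under the assumption that genuine and impostor score arrays are available for the development and test sets (the placeholder arrays below are synthetic stand-ins, not dataset scores):

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates at a given score threshold."""
    far = float(np.mean(impostor >= threshold))   # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))     # genuine accesses wrongly rejected
    return far, frr

def eer_threshold(genuine, impostor):
    """Threshold on the development scores where FAR and FRR are (nearly) equal."""
    candidates = np.unique(np.concatenate([genuine, impostor]))
    gaps = [abs(np.subtract(*far_frr(genuine, impostor, t))) for t in candidates]
    return candidates[int(np.argmin(gaps))]

# Placeholder score arrays; replace with scores from your baseline system.
rng = np.random.default_rng(0)
dev_genuine, dev_impostor = rng.normal(2.0, 1.0, 60), rng.normal(0.0, 1.0, 300)
test_genuine, test_impostor = rng.normal(2.0, 1.0, 80), rng.normal(0.0, 1.0, 400)

threshold = eer_threshold(dev_genuine, dev_impostor)   # tuned on development only
far, frr = far_frr(test_genuine, test_impostor, threshold)
hter = (far + frr) / 2.0
print(f"HTER on the test set: {hter:.3f}")
```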
Protocols for Spoofing Attacks
Attack protocols are used to evaluate the (binary classification) performance of counter-measures to spoof attacks. The database can be split into 6 different protocols according to the type of device used to generate the attack: print, mobile (phone), high-definition (tablet), photo, video, or grand test (all types). Furthermore, each of these 6 groups can be sub-set by classifying attacks as performed with the attacker's bare hands or using a fixed support. This classification scheme makes up a total of 18 protocols that can be used for studying the performance of counter-measures to 2D face spoofing attacks.
Acknowledgements
If you use this database, please cite the following publication:
I. Chingovska, A. Anjos, and S. Marcel, "On the Effectiveness of Local Binary Patterns in Face Anti-spoofing," IEEE BIOSIG, 2012. https://ieeexplore.ieee.org/document/6313548 (also available at http://publications.idiap.ch/index.php/publications/show/2447)
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Biometric Attack Dataset, Hispanic People
A similar dataset that includes all ethnicities - Anti Spoofing Real Dataset
The dataset for face anti-spoofing and face recognition includes images and videos of Hispanic people: 32,600+ photos and videos of 16,300 people from 20 countries. The dataset helps enhance model performance by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing the faces of genuine individuals… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/hispanic-people-liveness-detection-video-dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is organised into the following three sub-directories (a small consistency-check sketch follows):
1. raw_frames: contains 450 frames in each sub-folder, representing a 15-second video taken at a frame rate of 30 FPS;
2. annotations: contains the JSON files corresponding to the raw_frames folders, with one JSON file per video containing the bounding box annotations for each animal and its associated behaviors;
3. CVB_in_AVA_format: contains the CVB data in the standard AVA dataset format, which we have used to apply the SlowFast model.
Lineage: We use the Computer Vision Annotation Tool (CVAT) to collect our annotations. To make the procedure more efficient, we perform an initial detection and tracking of cattle in the videos using appropriate pre-trained models. The results are corrected by domain experts along with cattle behavior labeling in CVAT. This pre-hoc detection and tracking step significantly reduces the manual annotation time and effort.
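A small consistency check over the stated layout, assuming raw_frames holds one sub-folder per clip and annotations holds one JSON file per clip with a matching name (the dataset root, frame extension, and exact JSON schema below are assumptions, not specified in the description):

```python
import json
from pathlib import Path

FRAMES_PER_CLIP = 15 * 30      # 15-second clips at 30 FPS -> 450 frames

root = Path("CVB")             # hypothetical dataset root
for clip_dir in sorted((root / "raw_frames").iterdir()):
    n_frames = len(list(clip_dir.glob("*.jpg")))           # frame extension assumed
    annotation = root / "annotations" / f"{clip_dir.name}.json"
    if n_frames != FRAMES_PER_CLIP:
        print(f"{clip_dir.name}: expected {FRAMES_PER_CLIP} frames, found {n_frames}")
    if not annotation.exists():
        print(f"{clip_dir.name}: missing annotation file")
    else:
        with open(annotation) as f:
            records = json.load(f)   # per-animal bounding boxes and behavior labels
        print(f"{clip_dir.name}: {len(records)} annotation records")
```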
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dual-mode sleep video database. Please cite the following paper if you wish to use our dataset: Hu M, Zhai G, Li D, et al. "Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation." PLoS ONE, 2018, 13(1): e0190466. If you have any questions, you can send a request to: humenghan89@163.com
https://www.wiseguyreports.com/pages/privacy-policy
| Attribute | Details |
|---|---|
| Base year | 2024 |
| Historical data | 2019 - 2024 |
| Report coverage | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| Market size 2023 | 0.04 (USD billion) |
| Market size 2024 | 0.05 (USD billion) |
| Market size 2032 | 0.5 (USD billion) |
| Segments covered | Deployment Mode, Video Type, Vertical, End-User, Regional |
| Countries covered | North America, Europe, APAC, South America, MEA |
| Key market dynamics | 1. Rising demand for personalized content; 2. Increase in video marketing adoption; 3. Growing popularity of AI-powered solutions; 4. Emergence of cloud-based platforms; 5. Integration with social media platforms |
| Market forecast units | USD billion |
| Key companies profiled | Invideo.io, Magisto, Canva, Videoa, Kapwing, Lumen5, VEED.IO, Kizoa, Synthesia, Flixier, Biteable, Runway ML, Animaker, Adobe Spark, Hippo Video |
| Market forecast period | 2024 - 2032 |
| Key market opportunities | Content creation automation; real-time video generation; personalized marketing videos; video production cost reduction; accessibility for non-video experts |
| Compound annual growth rate (CAGR) | 33.23% (2024 - 2032) |
The concept of searching for and localizing vehicles in live traffic videos based on descriptive textual input has yet to be explored in the scholarly literature. Endowing Intelligent Transportation Systems (ITS) with such a capability could help solve crimes on roadways. While artificial intelligence (AI) can be a powerful tool for this data-intensive application, existing state-of-the-art AI models struggle with fine-grain vehicle recognition, typically reporting model performance only on still image data, often captured at high resolution and in pristine quality. These settings are not reflective of real-world operating conditions, and thus the reported recognition accuracies typically cannot be replicated on video data. One major impediment to the advancement of fine-grain vehicle recognition models is the lack of video testbench datasets with annotated ground-truth data. Additionally, to the best of our knowledge, no metrics currently exist for evaluating the robustness and performance efficiency of a vehicle recognition model on live videos, and even less so for vehicle search and localization models. In this paper, we address these challenges by proposing V-Localize, a novel artificial intelligence framework for vehicle search and continuous localization in live traffic videos based on input textual descriptions. An efficient hashgraph algorithm is introduced to process input text (such as a sentence, paragraph, or report) and extract detailed target information used to query the recognition and localization model. This work further introduces two novel datasets that will help advance AI research in these challenging areas: a) the most diverse and large-scale Vehicle Color Recognition (VCoR) dataset, with 15 color classes, twice as many as in the largest existing such dataset, to facilitate finer-grain recognition with color information; and b) a Vehicle Recognition in Video (VRiV) dataset, which is a first-of-its-kind video test-bench dataset for evaluating the performance of vehicle recognition models on live videos rather than still image data. The VRiV dataset will open new avenues for AI researchers to investigate innovative approaches that were previously intractable due to the lack of an annotated traffic vehicle recognition test-bench video dataset. Finally, to address the gap in the field, 5 novel metrics are introduced in this paper for adequately assessing the performance of vehicle recognition models on live videos. Ultimately, the proposed metrics could also prove effective for quantitative model evaluation in other video recognition applications. The novel metrics and the VRiV test-bench dataset introduced in this paper are specifically aimed at advancing state-of-the-art research on vehicle recognition in videos. Likewise, the proposed vehicle search and continuous localization framework could prove assistive in cases such as Amber Alerts or hit-and-run incidents. One major advantage of the proposed system is that it can be integrated into intelligent transportation system software to help aid law enforcement.
The proposed Vehicle Recognition in Video (VRiV) dataset is the first of its kind and is aimed at developing, improving, and analyzing the performance of vehicle search and recognition models on live videos. The lack of such a dataset has limited performance analysis of modern fine-grain vehicle recognition systems to still-image input data, making them less suitable for video applications. The VRiV dataset is introduced to help bridge this gap and foster research in this direction. The proposed VRiV dataset consists of 47 video sequences averaging about 38.5 seconds per video. The videos are recorded in a traffic setting, focusing on vehicles of volunteer candidates whose ground-truth make, model, year, and color information are known. For security reasons and the safety of participants, experiments are conducted on streets/roads with low traffic density. For each video, there is a target vehicle with known ground-truth information, and there are other vehicles either moving in traffic or parked on side streets to simulate a real-world traffic scenario. The goal is for the algorithm to search, recognize, and continuously localize just the specific target vehicle of interest for the corresponding video based on the search query. It is worth noting that the ground-truth information about the other vehicles in the videos is not known. The 47 videos in the testbench dataset are distributed across 7 distinct makes and 17 model designs, as shown in Figure 10. The videos are also annotated with ground-truth bounding boxes for the specific target vehicles in the corresponding videos. The dataset includes more than 46k annotated frames, averaging about 920 frames per video. This dataset will be made available on Kaggle, and new videos will be added as they become available.
There is one main zip file available for download. The zip file contains 94 files: 1) 47 video files and 2) 47 ground-truth annotation files which identify the locations of the vehicle of interest in each frame. Each video file is labelled with the corresponding vehicle's brand name, model, year, and color information.
Any publication using this database must reference the following journal manuscript:
Note: if the link is broken, please use http instead of https.
In Chrome, follow the steps recommended on the following website to view the webpage if it appears to be broken: https://www.technipages.com/chrome-enabledisable-not-secure-warning
VCoR dataset: https://www.kaggle.com/landrykezebou/vcor-vehicle-color-recognition-dataset VRiV dataset: https://www.kaggle.com/landrykezebou/vriv-vehicle-recognition-in-videos-dataset
For any enquiries regarding the VCoR dataset, contact: landrykezebou@gmail.com
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The models were constructed from an analysis of archived medical records collected using standard instruments, including the ADOS and the ADI-R. All 8 models identified a small, stable subset of features in cross-validation experiments. The total numbers of affected and unaffected control participants for training and testing are provided, together with measures of accuracy on the test set. Four models were tested on independent datasets and are listed under a separate "Test" category. The remaining 4, indicated with "Train/test," used the given dataset with an 80%:20% train:test split to calculate test accuracy on the 20% held-out test set. The naming convention of the classifiers is "model type"-"number of features".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The accessibility of pre-roll ads (PAs) has diminished significantly because paid members have the privilege of skipping ads. Leading video platforms have introduced a new type of streaming ad, the creative mid-roll ad (CMA), which reduces consumer aversion to ads by tightly integrating them with the video. Moreover, paid members are also a target group for CMAs, which may conflict with their interests and potentially reduce their viewing volume. We develop a game-theoretical model to examine the platform's advertising pricing and the advertiser's mode selection process. Counterintuitively, the CMA fee rate at equilibrium decreases as video attractiveness and the CMA's ad conversion ability increase, because higher video attractiveness and ad conversion ability enhance the advantages of lowering the CMA fee rate. We find that the advertiser balances conversion efficiency against the price gap between the two ad modes when making decisions. As increased conversion ability prompts advertisers to invest more, a significant gap in conversion efficiency results in higher equilibrium investment in PA mode than in CMA mode. A low proportion of paid members results in a better consumer surplus in CMA mode than in PA mode, because the CMA mode targets paid members with its advertisements.