100+ datasets found
  1. PortraitMode-400 Dataset

    • paperswithcode.com
    Updated Apr 9, 2024
    Cite
    Mingfei Han; Linjie Yang; Xiaojie Jin; Jiashi Feng; Xiaojun Chang; Heng Wang (2024). PortraitMode-400 Dataset [Dataset]. https://paperswithcode.com/dataset/portraitmode-400
    Explore at:
    Dataset updated
    Apr 9, 2024
    Authors
    Mingfei Han; Linjie Yang; Xiaojie Jin; Jiashi Feng; Xiaojun Chang; Heng Wang
    Description

    The PortraitMode-400 dataset is a significant contribution to the field of video recognition, specifically focusing on portrait-mode videos.

    Dataset Overview: The PortraitMode-400 (PM-400) dataset is the first of its kind and is dedicated to portrait mode video recognition. It was created to address the unique challenges associated with recognizing videos captured in portrait mode.

    Portrait mode videos are increasingly important due to the growing popularity of smartphones and social media applications.

    Data Collection and Annotation:

    The dataset consists of 76,000 videos collected from Douyin, a popular short-video application. These videos were meticulously annotated with 400 fine-grained categories.

    Rigorous quality assurance measures were implemented to ensure the accuracy of human annotations.

    Research Insights and Impact:

    The creators of the dataset conducted a comprehensive analysis to understand the impact of video format (portrait mode vs. landscape mode) on recognition accuracy. They also explored spatial bias arising from different video formats. Key aspects of portrait mode video recognition were investigated, including data augmentation, evaluation procedures, the importance of temporal information, and the role of audio modality.

    (1) Video Recognition in Portrait Mode, arXiv:2312.13746. https://arxiv.org/abs/2312.13746. (2) Video Recognition in Portrait Mode, Papers With Code. https://paperswithcode.com/paper/video-recognition-in-portrait-mode. (3) Video Recognition in Portrait Mode (PDF), arXiv. https://arxiv.org/pdf/2312.13746.pdf. (4) DOI: https://doi.org/10.48550/arXiv.2312.13746.

  2. Dataset Video Dataset

    • universe.roboflow.com
    zip
    Updated Jul 25, 2024
    Cite
    Learn Yolo v8 (2024). Dataset Video Dataset [Dataset]. https://universe.roboflow.com/learn-yolo-v8/dataset-video
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 25, 2024
    Dataset authored and provided by
    Learn Yolo v8
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Senang Murung Bingung Normal Bounding Boxes
    Description

    Dataset Video

    ## Overview
    
    Dataset Video is a dataset for object detection tasks - it contains Senang Murung Bingung Normal annotations for 226 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
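    For reference, a dataset hosted on Roboflow Universe can usually be pulled straight into a Python project with the official roboflow package. The sketch below is illustrative only: the API key is a placeholder, version 1 is a guess, and the workspace/project slugs are taken from the dataset URL above.

```python
from roboflow import Roboflow

# Placeholder API key; the workspace/project slugs come from the dataset URL above,
# and version 1 is an assumption (check the version list on the dataset page).
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("learn-yolo-v8").project("dataset-video")
dataset = project.version(1).download("yolov8")
print(dataset.location)  # local folder containing the images and YOLOv8-style labels
```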
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. video-to-video-dataset

    • huggingface.co
    Updated Feb 19, 2023
    Cite
    Yasunori Ozaki (2023). video-to-video-dataset [Dataset]. https://huggingface.co/datasets/alfredplpl/video-to-video-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 19, 2023
    Authors
    Yasunori Ozaki
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    Video-to-Video Dataset

    This is a dataset for video-to-video tasks. You do not need to worry about copyright if you follow the license outline below.

      Outline of License
    

    This dataset is released under the Unity-Chan License. The outline is as follows:

    You can use it for commercial purposes, provided you display "Song/Motion: © Unity Technologies Japan/UCL." in your work.

    Please read the official guideline.

      Copyrights
    
    
    
    
    
      3D Model
    

    This model is CC-0.

      Song
    

    Unity… See the full description on the dataset page: https://huggingface.co/datasets/alfredplpl/video-to-video-dataset.
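    Since the repository layout is not described above, the safest way to fetch everything locally is a plain snapshot download from the Hugging Face Hub; a minimal sketch (the repo id comes from the citation above, the rest is standard huggingface_hub usage):

```python
from huggingface_hub import snapshot_download

# Downloads the raw repository contents (videos, license files, etc.) into the local cache.
local_dir = snapshot_download(
    repo_id="alfredplpl/video-to-video-dataset",
    repo_type="dataset",
)
print(local_dir)  # inspect the files before assuming any particular layout
```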

  4. Human-Motion-Video-for-Generative-Model

    • huggingface.co
    Updated Jan 6, 2025
    Cite
    gulucaptain (2025). Human-Motion-Video-for-Generative-Model [Dataset]. https://huggingface.co/datasets/gulucaptain/Human-Motion-Video-for-Generative-Model
    Explore at:
    Dataset updated
    Jan 6, 2025
    Authors
    gulucaptain
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Human Motion Video for Generative Model

      🎏 Introduction
    

    TL;DR: With the rapid development of generative models, including diffusion-based and flow-based models, human-centric tasks such as pose-driven human image animation, audio-driven action generation, diffusion-based pose estimation, and human optical estimation have attracted a lot of attention. In our recent works, we also pay attention to the quality of the training data of… See the full description on the dataset page: https://huggingface.co/datasets/gulucaptain/Human-Motion-Video-for-Generative-Model.

  5. NeRF-4Scenes: A Video Dataset for Subjective Assessment of NeRF

    • dataverse.no
    • dataverse.azure.uit.no
    pdf, zip
    Updated Feb 20, 2025
    Cite
    Shaira Tabassum; Shaira Tabassum; Seyed Ali Amirshahi; Seyed Ali Amirshahi (2025). NeRF-4Scenes: A Video Dataset for Subjective Assessment of NeRF [Dataset]. http://doi.org/10.18710/LFHFJN
    Explore at:
    Available download formats: pdf (4,967,680 bytes), zip (1,541,369,856 bytes)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    DataverseNO
    Authors
    Shaira Tabassum; Shaira Tabassum; Seyed Ali Amirshahi; Seyed Ali Amirshahi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Norway, Gjøvik, Innlandet
    Description

    The dataset contains 36 NeRF-generated videos captured from four different indoor and outdoor environments: S1 for outdoor, S2 for auditorium, S3 for classroom, and S4 for lounge entrance. Each scene is trained using three NeRF models: Nerfacto as M1, Instant-NGP as M2, and Volinga as M3. Finally, each trained scene is rendered on three customized trajectories referred to as P1, P2, and P3. There are a total of 36 videos (4 scenes × 3 models × 3 paths) each having its own individual name. For example, video S1M1P1 corresponds to the outdoor scene (S1), which is trained on the Nerfacto model (M1), and rendered on the first camera path (P1).
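    The naming scheme described above (scene, model, path) is regular enough to decode programmatically; a small sketch, using only the codes listed in the description (the helper function name is ours):

```python
import re

SCENES = {"S1": "outdoor", "S2": "auditorium", "S3": "classroom", "S4": "lounge entrance"}
MODELS = {"M1": "Nerfacto", "M2": "Instant-NGP", "M3": "Volinga"}

def decode_video_name(name: str) -> dict:
    """Split a name such as 'S1M1P1' into its scene, NeRF model, and camera path."""
    match = re.fullmatch(r"(S[1-4])(M[1-3])(P[1-3])", name)
    if match is None:
        raise ValueError(f"unexpected video name: {name}")
    scene, model, path = match.groups()
    return {"scene": SCENES[scene], "model": MODELS[model], "path": path}

print(decode_video_name("S1M1P1"))
# {'scene': 'outdoor', 'model': 'Nerfacto', 'path': 'P1'}
```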

  6. Video Object Tracking Dataset

    • universe.roboflow.com
    zip
    Updated Sep 11, 2023
    + more versions
    Cite
    AnamikaBoundry (2023). Video Object Tracking Dataset [Dataset]. https://universe.roboflow.com/anamikaboundry/video-object-tracking/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 11, 2023
    Dataset authored and provided by
    AnamikaBoundry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Boundary Bounding Boxes
    Description

    Video Object Tracking

    ## Overview
    
    Video Object Tracking is a dataset for object detection tasks - it contains Boundary annotations for 1,672 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. MedVidCL (Medical Video Classification) Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 9, 2022
    Cite
    Deepak Gupta; Kush Attal; Dina Demner-Fushman (2022). MedVidCL (Medical Video Classification) Dataset [Dataset]. https://paperswithcode.com/dataset/medvidcl
    Explore at:
    Dataset updated
    May 9, 2022
    Authors
    Deepak Gupta; Kush Attal; Dina Demner-Fushman
    Description

    The MedVidCL dataset contains a collection of 6,617 videos annotated into 'medical instructional', 'medical non-instructional' and 'non-medical' classes. A two-step approach is used to construct the MedVidCL dataset. In the first step, the videos annotated by health informatics experts are used to train a machine learning model that assigns each video to one of the three aforementioned classes. In the second step, only high-confidence predictions are kept, and health informatics experts assess the model's predicted video category and update the category wherever needed.
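    To make the two-step construction concrete, here is a toy sketch of the confidence-filtering idea (train on expert labels, then keep only high-confidence predictions for expert review). It is not the authors' pipeline: the texts, labels, classifier, and 0.9 threshold are all stand-ins.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1: train a classifier on expert-annotated examples (stand-in data).
texts = ["how to bandage a sprained ankle", "lecture on diabetes statistics",
         "cat jumping off a couch", "steps to perform CPR on an adult"]
labels = ["medical instructional", "medical non-instructional",
          "non-medical", "medical instructional"]
vectorizer = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(texts), labels)

# Step 2: keep only high-confidence predictions for expert verification.
pool = ["how to apply an arm sling", "travel vlog from norway"]
probabilities = clf.predict_proba(vectorizer.transform(pool))
for text, probs in zip(pool, probabilities):
    label = clf.classes_[probs.argmax()]
    status = label if probs.max() >= 0.9 else "discarded (low confidence)"
    print(f"{text!r} -> {status}")
```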

  8. Video Anomaly Dataset

    • universe.roboflow.com
    zip
    Updated Apr 2, 2024
    Cite
    final year project (2024). Video Anomaly Dataset [Dataset]. https://universe.roboflow.com/final-year-project-6zsph/video-anomaly/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 2, 2024
    Dataset authored and provided by
    final year project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Violance Bounding Boxes
    Description

    Video Anomaly

    ## Overview
    
    Video Anomaly is a dataset for object detection tasks - it contains Violance annotations for 2,132 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks...

    • rdr.ucl.ac.uk
    zip
    Updated May 31, 2023
    + more versions
    Cite
    Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin (2023). Data to support the paper "Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures" [Dataset]. http://doi.org/10.5522/04/14781771.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University College London
    Authors
    Fernando Pérez-García; Catherine Scott; Rachel Sparks; Beate Diehl; Sebastien Ourselin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is the dataset to support the paper: Fernando Pérez-García et al., 2021, Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures. The paper has been accepted for publication at the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). A preprint is available on arXiv: https://arxiv.org/abs/2106.12014

    Contents:

    1) A CSV file "seizures.csv" with the following fields:
    - Subject: subject number
    - Seizure: seizure number
    - OnsetClonic: annotation marking the onset of the clonic phase
    - GTCS: whether the seizure generalises
    - Discard: whether one (Large, Small), none (No) or both (Yes) views were discarded for training.

    2) A folder "features_fpc_8_fps_15" containing two folders per seizure. The folders contain features extracted from all possible snippets from the small (S) and large (L) views. The snippets were 8 frames long and downsampled to 15 frames per second. The features are in ".pth" format and can be loaded using PyTorch: https://pytorch.org/docs/stable/generated/torch.load.html The last number of the file name indicates the frame index. For example, the file "006_01_L_000015.pth" corresponds to the features extracted from a snippet starting one second into the seizure video. Each file contains 512 numbers representing the deep features extracted from the corresponding snippet.

    3) A description file, "README.txt".
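    The CSV and the ".pth" feature files can be read with pandas and PyTorch respectively; a minimal sketch (paths assume the archive has been extracted into the working directory; the exact sub-folder layout of "features_fpc_8_fps_15" is not fully specified above):

```python
import pandas as pd
import torch

seizures = pd.read_csv("seizures.csv")   # columns: Subject, Seizure, OnsetClonic, GTCS, Discard
print(seizures.head())

# One feature file = one 8-frame snippet; "L" is the large view, 000015 is the frame index.
features = torch.load("006_01_L_000015.pth")
print(features)  # 512 deep-feature values for this snippet

frames_per_second = 15
frame_index = 15
print(f"snippet starts {frame_index / frames_per_second:.1f} s into the seizure video")
```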

  10. Visdrone Video Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    + more versions
    Cite
    Fisheye Lens (2023). Visdrone Video Dataset [Dataset]. https://universe.roboflow.com/fisheye-lens/visdrone-video/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    Fisheye Lens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Car People Pedestrians Van Motor Bounding Boxes
    Description

    VisDrone Video

    ## Overview
    
    VisDrone Video is a dataset for object detection tasks - it contains Car People Pedestrians Van Motor annotations for 6,275 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. Text-to-video Model Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 13, 2025
    Cite
    Archive Market Research (2025). Text-to-video Model Report [Dataset]. https://www.archivemarketresearch.com/reports/text-to-video-model-23816
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Overview: The global text-to-video model market is experiencing explosive growth, with a projected CAGR of 56.6% from 2023 to 2033. In 2025, the market was valued at $2,219 million and is anticipated to reach a staggering value of over $43,674 million by 2033. This growth is driven by advancements in artificial intelligence (AI) and machine learning (ML) technologies, which enable the generation of high-quality videos from textual descriptions. Key Trends and Drivers: The text-to-video model market is influenced by several key trends and drivers, including:

    Rising demand for interactive and immersive content: Text-to-video models allow for the creation of engaging and visually appealing content that can captivate audiences and enhance user experiences. Advancements in generative AI: Ongoing advancements in generative AI, such as deep learning and transformer models, are enabling text-to-video models to generate increasingly realistic and detailed videos. Growing adoption in entertainment and media: The entertainment and media industry has been a major driver of text-to-video model adoption, as these models can be used to create trailers, commercials, and even entire films.

    Segments and Regional Analysis: The text-to-video model market can be segmented by type (below 3 billion parameters and above 3 billion parameters), application (entertainment and media, film and television, cartoon, education, and others), and region (North America, South America, Europe, Middle East & Africa, and Asia Pacific). North America currently holds the largest market share due to the early adoption of AI technologies and the presence of key players like OpenAI and Meta. However, Asia Pacific is expected to witness significant growth over the forecast period, driven by the rapidly growing entertainment and media industries in China and India. This comprehensive report provides in-depth insights into the thriving text-to-video model market, valued at approximately $20 billion in 2023.
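    As a point of reference, a compound annual growth rate relates two endpoint values by cagr = (end / start) ** (1 / years) - 1. The snippet below simply applies that formula to the 2025 and 2033 figures quoted in this summary (the report's own 56.6% CAGR refers to the 2023-2033 window):

```python
# Endpoint values quoted above, in USD million.
start_2025 = 2_219
end_2033 = 43_674
years = 2033 - 2025

cagr = (end_2033 / start_2025) ** (1 / years) - 1
print(f"implied 2025-2033 growth rate: {cagr:.1%}")
```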

  12. Dataset-I-drinking-related-object-detection (in both YoloV8 and COCO format)...

    • kcl.figshare.com
    Updated Feb 27, 2025
    Cite
    Xin Chen; Xinqi Bao; Ernest Kamavuako (2025). Dataset-I-drinking-related-object-detection (in both YoloV8 and COCO format) [Dataset]. http://doi.org/10.18742/26337085.v1
    Explore at:
    Dataset updated
    Feb 27, 2025
    Dataset provided by
    King's College London
    Authors
    Xin Chen; Xinqi Bao; Ernest Kamavuako
    License

    https://www.kcl.ac.uk/researchsupport/assets/DataAccessAgreement-Description.pdf

    Description

    This dataset contains annotated images for object detection of containers and hands in a first-person (egocentric) view during drinking activities. Both YOLOv8 and COCO formats are provided. Please refer to our paper for more details.

    Purpose: Training and testing the object detection model.
    Content: Videos from Session 1 of Subjects 1-20.
    Images: Extracted from the videos of Subjects 1-20, Session 1.
    Additional Images:
    - ~500 hand/container images from Roboflow Open Source data.
    - ~1500 null (background) images from the VOC Dataset and the MIT Indoor Scene Recognition Dataset:
      - 1000 indoor scenes from 'MIT Indoor Scene Recognition'
      - 400 other unrelated objects from the VOC Dataset
    Data Augmentation:
    - Horizontal flipping
    - ±15% brightness change
    - ±10° rotation
    Formats Provided:
    - COCO format
    - PyTorch YOLOv8 format
    Image Size: 416x416 pixels
    Total Images: 16,834 (Training: 13,862; Validation: 1,975; Testing: 997)
    Instance Numbers: Containers: over 10,000; Hands: over 8,000
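    Given the YOLOv8 export described above, training a detector typically takes a few lines with the ultralytics package; a sketch only, where "data.yaml" stands for the dataset descriptor shipped with the export (its actual path depends on where you unpack the download):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                            # small pretrained checkpoint
model.train(data="data.yaml", imgsz=416, epochs=50)   # 416x416 matches the image size above
metrics = model.val()                                 # mAP on the validation split
print(metrics)
```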

  13. Replay-Attack

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 6, 2023
    Cite
    Marcel, Sébastien (2023). Replay-Attack [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4593128
    Explore at:
    Dataset updated
    Mar 6, 2023
    Dataset provided by
    Marcel, Sébastien
    Anjos, André
    Chingovska, Ivana
    Description

    Replay-Attack is a dataset for face recognition and presentation attack detection (anti-spoofing). It consists of 1,300 video clips of photo and video presentation attacks (spoofing attacks) against 50 clients, recorded under different lighting conditions.

    Spoofing Attacks Description

    The 2D face spoofing attack database consists of 1,300 video clips of photo and video attack attempts of 50 clients, under different lighting conditions.

    The data is split into 4 sub-groups comprising:

    Training data ("train"), to be used for training your anti-spoof classifier;

    Development data ("devel"), to be used for threshold estimation;

    Test data ("test"), with which to report error figures;

    Enrollment data ("enroll"), that can be used to verify spoofing sensitivity on face detection algorithms.

    Clients that appear in one of the data sets (train, devel or test) do not appear in any other set.

    Database Description

    All videos are generated by either having a (real) client trying to access a laptop through a built-in webcam or by displaying a photo or a video recording of the same client for at least 9 seconds. The webcam produces colour videos with a resolution of 320 pixels (width) by 240 pixels (height). The movies were recorded on a Macbook laptop using the QuickTime framework (codec: Motion JPEG) and saved into ".mov" files. The frame rate is about 25 Hz. Besides the native support on Apple computers, these files are easily readable using mplayer, ffmpeg or any other video utilities available under Linux or MS Windows systems.
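    Beyond mplayer/ffmpeg, the ".mov" clips can also be read directly in Python with OpenCV; a minimal sketch (the file name is a placeholder, any clip from the database should behave the same way):

```python
import cv2

cap = cv2.VideoCapture("some_clip.mov")      # placeholder path to one of the ".mov" files
print(cap.get(cv2.CAP_PROP_FRAME_WIDTH),     # expected: 320
      cap.get(cv2.CAP_PROP_FRAME_HEIGHT),    # expected: 240
      cap.get(cv2.CAP_PROP_FPS))             # roughly 25
ok, frame = cap.read()                       # first frame as a BGR NumPy array
cap.release()
```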

    Real client accesses as well as data collected for the attacks are taken under two different lighting conditions:

    • controlled: The office light was turned on, blinds are down, background is homogeneous;
    • adverse: Blinds up, more complex background, office lights are out.

    To produce the attacks, high-resolution photos and videos from each client were taken under the same conditions as in their authentication sessions, using a Canon PowerShot SX150 IS camera, which records both 12.1 Mpixel photographs and 720p high-definition video clips. The way to perform the attacks can be divided into two subsets: the first subset is composed of videos generated using a stand to hold the client biometry ("fixed"). For the second set, the attacker holds the device used for the attack with their own hands. In total, 20 attack videos were registered for each client, 10 for each of the attacking modes just described:

    4 x mobile attacks using an iPhone 3GS screen (with resolution 480x320 pixels) displaying:

    1 x mobile photo/controlled

    1 x mobile photo/adverse

    1 x mobile video/controlled

    1 x mobile video/adverse

    4 x high-resolution screen attacks using an iPad (first generation, with a screen resolution of 1024x768 pixels) displaying:

    1 x high-resolution photo/controlled

    1 x high-resolution photo/adverse

    1 x high-resolution video/controlled

    1 x high-resolution video/adverse

    2 x hard-copy print attacks (produced on a Triumph-Adler DCC 2520 color laser printer) occupying the whole available printing surface on A4 paper for the following samples:

    1 x high-resolution print of photo/controlled

    1 x high-resolution print of photo/adverse

    The 1300 real-accesses and attacks videos were then divided in the following way:

    Training set: contains 60 real-accesses and 300 attacks under different lighting conditions;

    Development set: contains 60 real-accesses and 300 attacks under different lighting conditions;

    Test set: contains 80 real-accesses and 400 attacks under different lighting conditions;

    Face Locations

    We also provide face locations automatically annotated by a cascade of classifiers based on a variant of Local Binary Patterns (LBP) referred to as the Modified Census Transform (MCT) [Face Detection with the Modified Census Transform, Froba, B. and Ernst, A., 2004, IEEE International Conference on Automatic Face and Gesture Recognition, pp. 91-96]. The automatic face localisation procedure works in more than 99% of the total number of frames acquired. This means that less than 1% of the total set of frames for all videos do not possess annotated faces. User algorithms must account for this fact.

    Protocol for Licit Biometric Transactions

    It is possible to measure the performance of baseline face recognition systems on the 2D face spoofing database and evaluate how well the attacks pass such systems, or how robust the systems otherwise are to attacks. Here we describe how to use the data in the enrolment set to create a background model and client models, and how to perform scoring using the available data.

    Universal Background Model (UBM): To generate the UBM, subselect the training-set client videos from the enrollment videos. There should be 2 per client, which means you get 30 videos, each with 375 frames to create the model;

    Client models: To generate client models, use the enrollment data for clients at the development and test groups. There should be 2 videos per client (one for each light condition) once more. At the end of the enrollment procedure, the development set must have 1 model for each of the 15 clients available in that set. Similarly, for the test set, 1 model for each of the 20 clients available;

    For a simple baseline verification, generate scores exhaustively for all videos from the development and test real-accesses respectively, but without intermixing across development and test sets. The scores generated against matched client videos and models (within the subset, i.e. development or test) should be considered true client accesses, while all others are impostor accesses;

    If you are looking for a single number to report on the performance, do the following: using only the scores from the development set, tune your baseline face recognition system at the EER of the development set, then use this threshold to compute the HTER on the test set scores.
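    The development/test procedure just described (tune a threshold at the EER on the development scores, report the HTER on the test scores) can be sketched in a few lines. This is a generic illustration using the standard definitions of EER and HTER, not the database's official evaluation scripts; the score arrays are random stand-ins.

```python
import numpy as np

def eer_threshold(genuine, impostor):
    """Threshold where the false acceptance and false rejection rates are closest."""
    best_t, best_gap = None, np.inf
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine accesses wrongly rejected
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t

def hter(genuine, impostor, t):
    """Half total error rate at threshold t."""
    return (np.mean(impostor >= t) + np.mean(genuine < t)) / 2.0

rng = np.random.default_rng(0)                       # stand-in scores, not real system output
dev_gen, dev_imp = rng.normal(2, 1, 200), rng.normal(0, 1, 1000)
test_gen, test_imp = rng.normal(2, 1, 200), rng.normal(0, 1, 1000)

threshold = eer_threshold(dev_gen, dev_imp)          # tuned on the development set only
print("HTER on the test set:", hter(test_gen, test_imp, threshold))
```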

    Protocols for Spoofing Attacks

    Attack protocols are used to evaluate the (binary classification) performance of counter-measures to spoof attacks. The database can be split into 6 different protocols according to the type of device used to generate the attack: print, mobile (phone), high-definition (tablet), photo, video, or grand test (all types). Furthermore, subsetting can be applied on top of the previous 6 groups by classifying attacks as performed with the attacker's bare hands or using a fixed support. This classification scheme makes up a total of 18 protocols that can be used for studying the performance of counter-measures to 2D face spoofing attacks. The table below details the number of video clips in each protocol.

    Acknowledgements

    If you use this database, please cite the following publication:

    I. Chingovska, A. Anjos, S. Marcel,"On the Effectiveness of Local Binary Patterns in Face Anti-spoofing"; IEEE BIOSIG, 2012. https://ieeexplore.ieee.org/document/6313548 http://publications.idiap.ch/index.php/publications/show/2447

  14. hispanic-people-liveness-detection-video-dataset

    • huggingface.co
    Updated Apr 24, 2024
    Cite
    Training Data (2024). hispanic-people-liveness-detection-video-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/hispanic-people-liveness-detection-video-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 24, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Biometric Attack Dataset, Hispanic People

      A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset
    

    The dataset for face anti-spoofing and face recognition includes images and videos of Hispanic people: 32,600+ photos and videos of 16,300 people from 20 countries. The dataset helps enhance model performance by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing faces of genuine individuals… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/hispanic-people-liveness-detection-video-dataset.

  15. Data from: CVB: A Video Dataset of Cattle Visual Behaviors

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Jun 13, 2023
    + more versions
    Cite
    Flavio Pereira Alvarenga; Aaron Ingham; Lars Petersson; Brano Kusy; Vivien Rolland; Brendan Do; Neil Bagnall; Jody McNally; Greg Bishop-Hurley; Reza Arablouei; Ali Zia; Renuka Sharma (2023). CVB: A Video Dataset of Cattle Visual Behaviors [Dataset]. http://doi.org/10.25919/3G3T-P068
    Explore at:
    Available download formats: datadownload
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Commonwealth Scientific and Industrial Research Organisation
    Authors
    Flavio Pereira Alvarenga; Aaron Ingham; Lars Petersson; Brano Kusy; Vivien Rolland; Brendan Do; Neil Bagnall; Jody McNally; Greg Bishop-Hurley; Reza Arablouei; Ali Zia; Renuka Sharma
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Aug 1, 2022 - Apr 28, 2023
    Description

    Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is organised into the following three sub-directories: 1. raw_frames: contains 450 frames in each sub-folder, representing a 15-second video taken at a frame rate of 30 FPS; 2. annotations: contains the JSON files corresponding to the raw_frames folder, one JSON file per video, holding the bounding-box annotations for each animal and its associated behaviors; and 3. CVB_in_AVA_format: contains the CVB data in the standard AVA dataset format, which we have used to apply the SlowFast model. Lineage: We use the Computer Vision Annotation Tool (CVAT) to collect our annotations. To make the procedure more efficient, we perform an initial detection and tracking of cattle in the videos using appropriate pre-trained models. The results are corrected by domain experts along with cattle behavior labeling in CVAT. The pre-hoc detection and tracking step significantly reduces the manual annotation time and effort.
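    A quick way to sanity-check the layout described above (one sub-folder of 450 frames per clip, one JSON annotation file per clip) is sketched below; the clip name and file extensions are placeholders, and the JSON schema should be inspected before writing a parser.

```python
import json
from pathlib import Path

clip = "some_clip"                                   # placeholder clip name
frames = sorted(Path("raw_frames", clip).glob("*"))
print(f"{len(frames)} frames -> {len(frames) / 30:.1f} s at 30 FPS")  # expect 450 -> 15 s

with open(Path("annotations", f"{clip}.json")) as f:
    annotation = json.load(f)                        # bounding boxes + behaviour labels per animal
print(type(annotation))                              # inspect the structure before parsing further
```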

  16. sleep dataset.zip

    • figshare.com
    • osapublishing.org
    zip
    Updated Jul 15, 2019
    Cite
    Menghan Hu (2019). sleep dataset.zip [Dataset]. http://doi.org/10.6084/m9.figshare.5518996.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 15, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Menghan Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dual-mode sleep video database. Please cite the following paper if you wish to use our dataset: Hu M, Zhai G, Li D, et al. Combination of near-infrared and thermal imaging techniques for the remote and simultaneous measurements of breathing and heart rates under sleep situation[J]. PLoS ONE, 2018, 13(1): e0190466. If you have any questions, you can send a request to: humenghan89@163.com

  17. Global Ai Powered Video Generator Market Research Report: By Deployment Mode...

    • wiseguyreports.com
    Updated Jul 18, 2024
    + more versions
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Ai Powered Video Generator Market Research Report: By Deployment Mode (Cloud-based, On-premises), By Video Type (Marketing Videos, Product Videos, Training Videos, Social Media Videos), By Vertical (Education, Healthcare, Retail, E-commerce, Media and Entertainment), By End-User (Small and Medium Enterprises (SMEs), Large Enterprises, Content Creators, Marketing Agencies) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/ai-powered-video-generator-market
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 0.04 (USD Billion)
    MARKET SIZE 2024: 0.05 (USD Billion)
    MARKET SIZE 2032: 0.5 (USD Billion)
    SEGMENTS COVERED: Deployment Mode, Video Type, Vertical, End-User, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: 1. Rising Demand for Personalized Content; 2. Increase in Video Marketing Adoption; 3. Growing Popularity of AI-Powered Solutions; 4. Emergence of Cloud-Based Platforms; 5. Integration with Social Media Platforms
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Invideo.io, Magisto, Canva, Videoa, Kapwing, Lumen5, VEED.IO, Kizoa, Synthesia, Flixier, Biteable, Runway ML, Animaker, Adobe Spark, Hippo Video
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: Content creation automation, Real-time video generation, Personalized marketing videos, Video production cost reduction, Accessibility for non-video experts
    COMPOUND ANNUAL GROWTH RATE (CAGR): 33.23% (2024 - 2032)
  18. VRiV (Vehicle Recognition in Videos) Dataset

    • kaggle.com
    zip
    Updated Dec 5, 2021
    + more versions
    Cite
    Landry KEZEBOU (2021). VRiV (Vehicle Recognition in Videos) Dataset [Dataset]. https://www.kaggle.com/landrykezebou/vriv-vehicle-recognition-in-videos-dataset
    Explore at:
    Available download formats: zip (2,383,870,377 bytes)
    Dataset updated
    Dec 5, 2021
    Authors
    Landry KEZEBOU
    Description

    Context

    The concept of searching and localizing vehicles from live traffic videos based on descriptive textual input has yet to be explored in the scholarly literature. Endowing Intelligent Transportation Systems (ITS) with such a capability could help solve crimes on roadways. While artificial intelligence (AI) can be a powerful tool for this data-intensive application, existing state-of-the-art AI models struggle with fine-grain vehicle recognition, typically reporting model performance only on still input image data, often captured at high resolution and at pristine quality. These settings are not reflective of real-world operating conditions, and thus recognition accuracies typically cannot be replicated on video data. One major impediment to the advancement of fine-grain vehicle recognition models is the lack of video testbench datasets with annotated ground-truth data. Additionally, to the best of our knowledge, no metrics currently exist for evaluating the robustness and performance efficiency of a vehicle recognition model on live videos, and even less so for vehicle search and localization models. In this paper, we address these challenges by proposing V-Localize, a novel artificial intelligence framework for vehicle search and continuous localization in live traffic videos based on input textual descriptions. An efficient hashgraph algorithm is introduced to process input text (such as a sentence, paragraph, or report) to extract detailed target information used to query the recognition and localization model. This work further introduces two novel datasets that will help advance AI research in these challenging areas. These datasets include: a) the most diverse and large-scale Vehicle Color Recognition (VCoR) dataset with 15 color classes -- twice as many as the number of color classes in the largest existing such dataset -- to facilitate finer-grain recognition with color information; and b) a Vehicle Recognition in Video (VRiV) dataset, which is a first-of-its-kind video test-bench dataset for evaluating the performance of vehicle recognition models in live videos rather than still image data. The VRiV dataset will open new avenues for AI researchers to investigate innovative approaches that were previously intractable due to the lack of an annotated test-bench video dataset for traffic vehicle recognition. Finally, to address the gap in the field, 5 novel metrics are introduced in this paper for adequately assessing the performance of vehicle recognition models in live videos. Ultimately, the proposed metrics could also prove intuitively effective at quantitative model evaluation in other video recognition applications. The novel metrics and VRiV test-bench dataset introduced in this paper are specifically aimed at advancing state-of-the-art research for vehicle recognition in videos. Likewise, the proposed novel vehicle search and continuous localization framework could prove assistive in cases such as Amber Alerts or hit-and-run incidents. One major advantage of the proposed system is that it can be integrated into intelligent transportation system software to aid law enforcement.

    Image Acquisition

    The proposed Vehicle Recognition in Video (VRiV) dataset is the first of its kind and is aimed at developing, improving, and analyzing the performance of vehicle search and recognition models on live videos. The lack of such a dataset has limited performance analysis of modern fine-grain vehicle recognition systems to still-image input data, making them less suitable for video applications. The VRiV dataset is introduced to help bridge this gap and foster research in this direction. The proposed VRiV dataset consists of 47 video sequences averaging about 38.5 seconds per video. The videos are recorded in a traffic setting focusing on vehicles of volunteer candidates whose ground-truth make, model, year, and color information are known. For security reasons and the safety of participants, experiments are conducted on streets/roads with low traffic density. For each video, there is a target vehicle with known ground-truth information, and there are other vehicles either moving in traffic or parked on side streets, to simulate a real-world traffic scenario. The goal is for the algorithm to search, recognize, and continuously localize the specific target vehicle of interest for the corresponding video based on the search query. It is worth noting that the ground-truth information about other vehicles in the videos is not known. The 47 videos in the testbench dataset are distributed across 7 distinct makes and 17 model designs, as shown in Figure 10. The videos are also annotated to include ground-truth bounding boxes for the specific target vehicles in corresponding videos. The dataset includes more than 46k annotated frames, averaging about 920 frames per video. This dataset will be made available on Kaggle, and new videos will be added as they become available.

    Content

    There is one main zip file available for download. The zip file contains 94 files: 1) 47 video files and 2) 47 ground-truth annotation files, which identify where the vehicle of interest is located in each frame. Each video file is labelled with the corresponding vehicle brand name, model, year, and color information.

    Terms and Conditions

    • Videos provided in this dataset are freely available for research and education purposes only. Please be sure to properly credit the authors by citing the article below.
    • Be sure to upvote this dataset if you find it useful by scrolling up and clicking the ^ sign at the top-right corner of the cover image of this page.
    • Be sure to blur out all plate numbers before publishing any of the contents available in this dataset.

    Acknowledgements

    Any publication using this database must cite the following journal manuscript:

    Note: if the link is broken, please use http instead of https.

    In Chrome, use the steps recommended in the following website to view the webpage if it appears to be broken https://www.technipages.com/chrome-enabledisable-not-secure-warning

    VCoR dataset: https://www.kaggle.com/landrykezebou/vcor-vehicle-color-recognition-dataset VRiV dataset: https://www.kaggle.com/landrykezebou/vriv-vehicle-recognition-in-videos-dataset

    For any enquiries regarding the VCoR dataset, contact: landrykezebou@gmail.com

  19. Eight machine learning classifiers used for video analysis and autism...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Qandeel Tariq; Jena Daniels; Jessey Nicole Schwartz; Peter Washington; Haik Kalantarian; Dennis Paul Wall (2023). Eight machine learning classifiers used for video analysis and autism detection. [Dataset]. http://doi.org/10.1371/journal.pmed.1002705.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Medicine
    Authors
    Qandeel Tariq; Jena Daniels; Jessey Nicole Schwartz; Peter Washington; Haik Kalantarian; Dennis Paul Wall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The models were constructed from an analysis of archived medical records from the use of standard instruments, including the ADOS and the ADI-R. All 8 models identified a small, stable subset of features in cross-validation experiments. The total numbers of affected and unaffected control participants for training and testing are provided together with measures of accuracy on the test set. Four models were tested on independent datasets and have been mentioned in a separate “Test” category. The remaining 4, indicated with “Train/test,” used the given dataset with an 80%:20% train:test split to calculate test accuracy on the 20% held-out test set. The naming convention of the classifiers is “model type”-“number of features”.

  20. Data from: A game-theoretic analysis for creative mid-roll ads on a content...

    • tandf.figshare.com
    text/x-tex
    Updated Jun 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shaofu Du; Zaichen Luo; Li Hu (2025). A game-theoretic analysis for creative mid-roll ads on a content platform [Dataset]. http://doi.org/10.6084/m9.figshare.28016248.v1
    Explore at:
    Available download formats: text/x-tex
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Shaofu Du; Zaichen Luo; Li Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The accessibility of pre-roll ads (PAs) has significantly diminished due to the privilege of skipping ads for paid members. Leading video platforms have introduced a new type of streaming ad, the creative mid-roll ad (CMA), which reduces consumer aversion to ads by tightly integrating them with video. Moreover, paid members are also a target group for CMAs, which may conflict with their interests and potentially reduce their viewing volume. We develop a game-theoretical model to examine the platform's advertising pricing and the advertiser's mode selection process. Counterintuitively, the CMA fee rate at equilibrium decreases as video attractiveness and the CMA's ad conversion ability increase because higher video attractiveness and ad conversion ability enhance the advantages of lowering the CMA fee rate. We find that the advertiser will balance conversion efficiency and price gaps between the two ad modes when making decisions. As increased conversion ability prompts advertisers to invest more, a significant gap in conversion efficiency results in higher equilibrium investment in PA mode compared to CMA mode. A low proportion of paid members results in a better consumer surplus in CMA mode than in PA mode, which is because the CMA mode targets paid members with its advertisements.
