Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for PLM-Video Human
PLM-Video-Human is a collection of human-annotated resources for training Vision Language Models, focused on detailed video understanding. Training tasks include: fine-grained open-ended question answering (FGQA), Region-based Video Captioning (RCap), Region-based Dense Video Captioning (RDCap) and Region-based Temporal Localization (RTLoc). [Tech Report] [GitHub]
Dataset Structure
Fine-Grained Question Answering… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-Video-Human.
Dataset Card for "PPLM-PQA"
More Information needed
Llama 3.2 License: https://choosealicense.com/licenses/llama3.2/
Dataset Card for PLM-Image Auto
[Tech Report] [GitHub] Synthetic image captions and QAs used in PLM; please refer to Section 3 of the paper for more details. The synthetic annotations cover: SA1B, OpenImages, Objects365, ArxivQA, UCSF, PDFAcc.
Dataset Structure
Image Captions (SA1B, OpenImages, Objects365)
Data fields are:
image_id: a string feature, unique identifier for the image. image: a string feature, the actual image path in the corresponding data… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-Image-Auto.
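As a rough illustration, the sketch below shows how these fields might be read with the Hugging Face datasets library; the config name "sa1b" and the "train" split are assumptions based on the sources listed above, so check the dataset page for the exact names. Loading the other PLM auto-annotation sets (e.g. PLM-Video-Auto) follows the same pattern.

```python
# Minimal sketch, assuming a config named "sa1b" and a "train" split exist
# (both are assumptions; see the dataset page for the actual names).
from datasets import load_dataset

ds = load_dataset("facebook/PLM-Image-Auto", name="sa1b", split="train")

example = ds[0]
print(example["image_id"])  # unique identifier for the image
print(example["image"])     # image path within the corresponding source data
```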
Llama 3.2 License: https://choosealicense.com/licenses/llama3.2/
Dataset Card for PLM-Video Auto
[Tech Report] [GitHub] Synthetic video captions and MCQs used in PLM; please refer to Section 3 of the paper for more details. The synthetic annotations cover: YT-1B with captions, Ego4d with captions, YT-1B with MCQAs, and Ego4d with QAs.
Dataset Structure
YT-1B Captions (yt1b_cap)
Data fields are:
video_id: a string feature, unique identifier for the YouTube video. scene_id: a string feature, unique identifier for the scene… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-Video-Auto.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Summary
PLM-VideoBench is a collection of human-annotated resources for evaluating Vision Language Models, focused on detailed video understanding. [Tech Report] [GitHub]
Supported Tasks
PLM-VideoBench includes evaluation data for the following tasks:
FGQA
In this task, a model must answer a multiple-choice question (MCQ) that probes fine-grained activity understanding. Given a question and multiple options that differ in a… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-VideoBench.
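The official FGQA metric is defined in the tech report and evaluation code linked above. Purely as an illustration, a plain multiple-choice accuracy, written over hypothetical field names (question, options, answer) and a hypothetical model_answer callable, could look like this:

```python
# Illustrative only: plain MCQ accuracy. Field names and model_answer() are
# hypothetical; the official PLM-VideoBench metric lives in the linked repo.
from typing import Callable, Dict, List

def mcq_accuracy(examples: List[Dict],
                 model_answer: Callable[[str, List[str]], str]) -> float:
    """Fraction of questions where the predicted option matches the ground truth."""
    if not examples:
        return 0.0
    correct = sum(
        int(model_answer(ex["question"], ex["options"]) == ex["answer"])
        for ex in examples
    )
    return correct / len(examples)
```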
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
xiaokangz/plm-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
YanSong97/plm-env dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Battam/PLM-Finetune dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is the pretraining dataset for PLM. Due to the upload limit, we split the original dataset into parts smaller than 50 GB. We provide the merge and split scripts in the scripts folder.
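The provided scripts are the authoritative way to reassemble the data. As a minimal sketch, assuming the parts are byte-split files sharing a common (hypothetical) name prefix, merging could be done like this:

```python
# Minimal sketch, assuming byte-split parts with a common prefix (hypothetical
# names); prefer the merge script shipped in the repository's scripts/ folder.
import glob
import shutil

parts = sorted(glob.glob("plm_pretrain.part_*"))
with open("plm_pretrain_merged", "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            shutil.copyfileobj(src, out)  # stream each <50 GB part instead of loading it into memory
```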
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Spatial Visualization Benchmark
This repository contains the Spatial Visualization Benchmark. The evaluation code is released at wangst0181/Spatial-Visualization-Benchmark.
Dataset Description
SpatialViz-Bench aims to evaluate the spatial visualization capabilities of multimodal large language models, a key component of spatial abilities. It targets 4 sub-abilities of spatial visualization, including mental rotation, mental folding, visual penetration, and… See the full description on the dataset page: https://huggingface.co/datasets/PLM-Team/Spatial-Visualization-Benchmark.