Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for PLM-Video Human
PLM-Video-Human is a collection of human-annotated resources for training Vision Language Models, focused on detailed video understanding. Training tasks include: fine-grained open-ended question answering (FGQA), Region-based Video Captioning (RCap), Region-based Dense Video Captioning (RDCap) and Region-based Temporal Localization (RTLoc). [Tech Report] [GitHub]
Dataset Structure
Fine-Grained Question Answering… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-Video-Human.
Dataset Card for "PPLM-PQA"
More Information needed
Llama 3.2 License: https://choosealicense.com/licenses/llama3.2/
Dataset Card for PLM-Image Auto
[Tech Report] [GitHub] Synthetic image captions and QAs used in PLM; please refer to Section 3 of the paper for more details. The synthetic annotations cover: SA1B, OpenImages, Objects365, ArxivQA, UCSF, PDFAcc.
Dataset Structure
Image Captions (SA1B, OpenImages, Objects365)
Data fields are:
image_id: a string feature, unique identifier for the image. image: a string feature, the actual image path in the corresponding data… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-Image-Auto.
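As a rough illustration, the sketch below shows how these fields might be read with the Hugging Face datasets library; the config name "sa1b" and the "train" split are assumptions based on the sources listed above, so check the dataset page for the exact names. Loading the other PLM auto-annotation sets (e.g. PLM-Video-Auto) follows the same pattern.

```python
# Minimal sketch, assuming a config named "sa1b" and a "train" split exist
# (both are assumptions; see the dataset page for the actual names).
from datasets import load_dataset

ds = load_dataset("facebook/PLM-Image-Auto", name="sa1b", split="train")

example = ds[0]
print(example["image_id"])  # unique identifier for the image
print(example["image"])     # image path within the corresponding source data
```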
Llama 3.2 License: https://choosealicense.com/licenses/llama3.2/
Dataset Card for PLM-Video Auto
[Tech Report] [GitHub] Synthetic video captions and MCQs used in PLM; please refer to Section 3 of the paper for more details. The synthetic annotations cover: YT-1B with captions, Ego4d with captions, YT-1B with MCQAs, and Ego4d with QAs.
Dataset Structure
YT-1B Captions (yt1b_cap)
Data fields are:
video_id: a string feature, unique identifier for the YouTube video. scene_id: a string feature, unique identifier for the scene… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-Video-Auto.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Summary
PLM-VideoBench is a collection of human-annotated resources for evaluating Vision Language Models, focused on detailed video understanding. [Tech Report] [GitHub]
Supported Tasks
PLM-VideoBench includes evaluation data for the following tasks:
FGQA
In this task, a model must answer a multiple-choice question (MCQ) that probes fine-grained activity understanding. Given a question and multiple options that differ in a… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PLM-VideoBench.
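The official FGQA metric is defined in the tech report and evaluation code linked above. Purely as an illustration, a plain multiple-choice accuracy, written over hypothetical field names (question, options, answer) and a hypothetical model_answer callable, could look like this:

```python
# Illustrative only: plain MCQ accuracy. Field names and model_answer() are
# hypothetical; the official PLM-VideoBench metric lives in the linked repo.
from typing import Callable, Dict, List

def mcq_accuracy(examples: List[Dict],
                 model_answer: Callable[[str, List[str]], str]) -> float:
    """Fraction of questions where the predicted option matches the ground truth."""
    if not examples:
        return 0.0
    correct = sum(
        int(model_answer(ex["question"], ex["options"]) == ex["answer"])
        for ex in examples
    )
    return correct / len(examples)
```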
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
xiaokangz/plm-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
YanSong97/plm-env dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Battam/PLM-Finetune dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is the pretraining dataset for PLM. Due to the upload limit, we split the original dataset into parts smaller than 50 GB. We provide the merge and split scripts in the scripts folder.
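The provided scripts are the authoritative way to reassemble the data. As a minimal sketch, assuming the parts are byte-split files sharing a common (hypothetical) name prefix, merging could be done like this:

```python
# Minimal sketch, assuming byte-split parts with a common prefix (hypothetical
# names); prefer the merge script shipped in the repository's scripts/ folder.
import glob
import shutil

parts = sorted(glob.glob("plm_pretrain.part_*"))
with open("plm_pretrain_merged", "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            shutil.copyfileobj(src, out)  # stream each <50 GB part instead of loading it into memory
```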
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Spatial Visualization Benchmark
This repository contains the Spatial Visualization Benchmark. The evaluation code is released at wangst0181/Spatial-Visualization-Benchmark.
Dataset Description
SpatialViz-Bench aims to evaluate the spatial visualization capabilities of multimodal large language models, a key component of spatial abilities. It targets 4 sub-abilities of spatial visualization, including mental rotation, mental folding, visual penetration, and… See the full description on the dataset page: https://huggingface.co/datasets/PLM-Team/Spatial-Visualization-Benchmark.