2 datasets found
  1. BLIP3o-Pretrain-Long-Caption

    • huggingface.co
    Updated May 17, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-Long-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption
    Explore at:
    Dataset updated
    May 17, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain Long-Caption Dataset

    This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset")

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 Datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.
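    To see why no extraction step is needed, here is a minimal stdlib sketch of the WebDataset layout those .tar shards use: each sample is a group of files sharing a basename (for example an image plus a same-named caption file), and a reader can stream sample groups straight out of the archive. The keys and caption text below are illustrative, not the dataset's actual contents.

    ```python
    import io
    import tarfile

    def write_sample(tar, key, caption):
        # One caption file per sample key; a real shard would also hold
        # a matching image file (e.g. f"{key}.jpg") under the same basename.
        data = caption.encode("utf-8")
        info = tarfile.TarInfo(name=f"{key}.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    # Build a tiny in-memory shard with two samples.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        write_sample(tar, "0000", "a long descriptive caption")
        write_sample(tar, "0001", "another caption")

    # Stream samples back without extracting anything to disk,
    # grouping members by their shared basename.
    buf.seek(0)
    samples = {}
    with tarfile.open(fileobj=buf, mode="r") as tar:
        for member in tar:
            key, _, ext = member.name.partition(".")
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read().decode("utf-8")

    print(samples["0000"]["txt"])  # -> a long descriptive caption
    ```

    The 🤗 Datasets WebDataset loader does this grouping for you; a streaming call along the lines of load_dataset("BLIP3o/BLIP3o-Pretrain-Long-Caption", split="train", streaming=True) is the usual pattern, but consult the dataset page for the exact invocation and field names.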

  2. BLIP3o-Pretrain-Short-Caption

    • huggingface.co
    Cite
    BLIP3o, BLIP3o-Pretrain-Short-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Short-Caption
    Explore at:
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain Short-Caption Dataset

    This collection contains 5 million images, each paired with a short (~20 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Short-Caption", repo_type="dataset")

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 Datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Short-Caption.
