Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
BLIP3o Pretrain Long-Caption Dataset
This collection contains 27 million images, each paired with a long (~120-token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.
Download
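from huggingface_hub import snapshot_download
# Downloads every file in the dataset repository, including the .tar shards,
# into the local Hugging Face cache and returns the local folder path.
snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset")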
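The call above fetches the entire repository, which for 27 million images is a large download. To inspect only a few shards first, `snapshot_download` also accepts an `allow_patterns` argument (glob patterns matched against repository file paths) and a `local_dir` argument. The sketch below wraps that; `preview_selection` and `download_subset` are hypothetical helper names, and `preview_selection` is only a rough fnmatch-based approximation of the filtering, not the library's own matcher.

```python
import fnmatch

# Repository from the section above; pass repo_id to use the short-caption repo instead.
DEFAULT_REPO_ID = "BLIP3o/BLIP3o-Pretrain-Long-Caption"


def preview_selection(filenames, patterns):
    """Return the filenames matched by at least one glob in `patterns`.

    Hypothetical helper: a rough local approximation (fnmatch-style globs)
    of how `allow_patterns` filters repository files, handy for checking a
    pattern before starting a download.
    """
    return [f for f in filenames if any(fnmatch.fnmatch(f, p) for p in patterns)]


def download_subset(patterns, repo_id=DEFAULT_REPO_ID, local_dir=None):
    """Download only the files in the dataset repo that match `patterns`.

    Hypothetical wrapper around huggingface_hub.snapshot_download; returns
    the local folder path reported by that function.
    """
    # Imported lazily so preview_selection works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
```

To choose patterns that match the real shard names, you could list the repository with `huggingface_hub.list_repo_files("BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset")`, check a candidate pattern with `preview_selection`, and then pass it to `download_subset`.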
Load Dataset without Extracting
You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.
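Loading "without extracting" works because a WebDataset shard is an ordinary tar file in which all files belonging to one sample share a key (the file path up to the first dot in the file name), so samples can be read sequentially straight from the archive. The stdlib-only sketch below illustrates that convention; it is not the 🤗 datasets loader the card recommends. The helper names are hypothetical, and the `.jpg`/`.txt` member names in the demo are an assumption about how image and caption files are stored in these shards.

```python
import io
import tarfile


def iter_webdataset_samples(fileobj):
    """Yield one dict per sample from a WebDataset-style tar stream.

    Files of one sample share a key (directory + file name up to the first
    '.'); the remaining suffix becomes the dict field name. Assumes members
    of a sample are stored next to each other, as in WebDataset shards.
    """
    current_key, sample = None, {}
    # "r|*" reads the archive as a forward-only stream; nothing is unpacked to disk.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            dirname, _, basename = member.name.rpartition("/")
            stem, _, ext = basename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            # In stream mode a member's data must be read before moving to the next one.
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def make_demo_shard(samples):
    """Build an in-memory tar shard from {key: {ext: bytes}} (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples.items():
            for ext, data in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    shard = make_demo_shard({
        "000000": {"jpg": b"<jpeg bytes>", "txt": b"A cat on a sofa."},
        "000001": {"jpg": b"<jpeg bytes>", "txt": b"A red bicycle."},
    })
    for s in iter_webdataset_samples(shard):
        print(s["__key__"], s["txt"].decode())
```

For a downloaded shard you would pass an open binary file instead, e.g. `open(path, "rb")` with a path of your choosing. For training-scale use, the 🤗 datasets WebDataset support is the route the dataset card recommends.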
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
BLIP3o Pretrain Short-Caption Dataset
This collection contains 5 million images, each paired with a short (~20-token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.
Download
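from huggingface_hub import snapshot_download
# Downloads every file in the dataset repository, including the .tar shards,
# into the local Hugging Face cache and returns the local folder path.
snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Short-Caption", repo_type="dataset")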
Load Dataset without Extracting
You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Short-Caption.