Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BLIP3o Pretrain Long-Caption Dataset
This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.
Download
from huggingface_hub import snapshot_download
snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset")
Load Dataset without Extracting
You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BLIP3o Pretrain JourneyDB Dataset
This collection contains 4 million JourneyDB images.
Download
from huggingface_hub import snapshot_download
snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB", repo_type="dataset")
Load Dataset without Extracting
You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:
from datasets import load_dataset
import glob
data_files = glob.glob("/your/data/path/*.tar")
… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
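To illustrate why the archives can be read without unpacking, here is a minimal standard-library sketch of how a WebDataset-style shard can be streamed: files that share a basename are grouped into one sample. The member names (000000.jpg, 000000.txt) and the helper names split_key, iter_samples, and make_demo_shard are assumptions made for this example; the real shards may use different keys and extensions, and in practice the 🤗 datasets loader handles this for you.

```python
import io
import tarfile


def split_key(name):
    """Split 'dir/000123.jpg' into ('dir/000123', 'jpg') at the first dot of the basename."""
    directory, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(fileobj):
    """Yield (key, {extension: bytes}) per sample, reading the tar as a stream."""
    current_key, sample = None, {}
    # "r|*" reads sequentially, so nothing is extracted to disk.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            # A new key means the previous sample is complete.
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield current_key, sample


def make_demo_shard():
    """Build a tiny in-memory shard with hypothetical member names and payloads."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [
            ("000000.jpg", b"fake-image-bytes"),
            ("000000.txt", b"a long caption"),
            ("000001.jpg", b"more-bytes"),
            ("000001.txt", b"another caption"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


samples = list(iter_samples(make_demo_shard()))
```

For a shard on disk, the same generator could be fed an open binary file, for example open(path, "rb") for each path returned by glob.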
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It includes the following categories:
JourneyDB
Human (including MSCOCO with human caption, human gestures, occupations)
Dalle3
Geneval (no overlap with test set)
Common objects
Simple text
The following code downloads the tar files:
from huggingface_hub import snapshot_download
snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset')
You can then use Hugging Face datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
blip3o-caption-mini-arrow
blip3o-caption-mini-arrow is a high-quality, curated image-caption dataset derived and optimized from the original BLIP3o/BLIP3o-Pretrain-Long-Caption. This dataset is specifically filtered and processed for tasks involving long-form image captioning and vision-language understanding.
Overview
Total Samples: 91,600
Modality: Image ↔ Text
Format: Arrow (auto-converted to Parquet)
License: Apache 2.0
Language: English
Size: ~4.5 GB… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow.
redshallot/Bagel-new-BLIP3o-5k dataset hosted on Hugging Face and contributed by the HF Datasets community
sebgao/blip3o-pretrain-short-recaptioned dataset hosted on Hugging Face and contributed by the HF Datasets community
wusize/BLIP3o-60k dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Caption3o-Opt
Caption3o-Opt is a compact, high-quality image-caption dataset derived from the original BLIP3o/BLIP3o-Pretrain-Long-Caption. This refined subset focuses on optimized long-form captioning, curated for real-world and artistic image understanding across vision-language models.
Overview
Total Samples: 10,278
Modality: Image ↔ Text
Format: Arrow (auto-converted to Parquet)
License: Apache 2.0
Language: English
Size: ~500 MB
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Caption3o-Opt-v2
Caption3o-Opt-v2 is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. Derived from the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, this optimized subset emphasizes long-form captions and covers a wide range of real-world and artistic scenes.
Dataset Summary
Size: 10,277 image-caption pairs
Format: Parquet
Image resolution: 512x512
Languages: English
Modality: Image-to-Text
License: Apache-2.0… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt-v2.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Corvus-OCR-Caption-Mini-Mix
Corvus-OCR-Caption-Mini-Mix is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. It is a carefully curated subset of the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, optimized for mixed OCR and long-form captioning tasks.
Dataset Summary
This dataset contains a balanced mix of:
Long-form natural language captions
OCR-heavy samples with scientific, mathematical, and document-style… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mini-Mix.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Geneval-style dataset is sourced from BLIP3o-60k.
This dataset is presented in the paper "UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation". More details can be found in UniWorld-V1.
Data preparation
Download the data from LanguageBind/UniWorld-V1. The dataset consists of two parts: source images and annotation JSON files. Prepare a data.txt file in the following format:
The first column is the root path to the image.
The second… See the full description on the dataset page: https://huggingface.co/datasets/LanguageBind/UniWorld-V1.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Turkish Image Captioning Dataset
This dataset is a subset, translated into Turkish, of the BLIP3o-Pretrain-Long-Caption and BLIP3o-Pretrain-Short-Caption datasets used to pre-train the BLIP3o model. Detailed information on how the dataset was created is available from the original dataset. The dataset can be used for training or fine-tuning image-to-text models. It is shared under Apache 2.0, the license of the original dataset.… See the full description on the dataset page: https://huggingface.co/datasets/ituperceptron/image-captioning-turkish.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Corvus-OCR-Caption-Mix
Corvus-OCR-Caption-Mix is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. This collection is derived and optimized from the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, with a focus on long-form captions and mixed OCR tasks across a variety of image types.
Dataset Summary
The dataset spans over 229,000 image-caption pairs and provides a balanced blend of:
OCR-rich documents featuring… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mix.