13 datasets found

BLIP3o-Pretrain-Long-Caption
huggingface.co
Updated May 17, 2025
Cite
BLIP3o (2025). BLIP3o-Pretrain-Long-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption
Dataset updated
May 17, 2025
Dataset authored and provided by
BLIP3o
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
BLIP3o Pretrain Long-Caption Dataset

This collection contains 27 million images, each paired with a long (~120-token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

Download

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption",
    repo_type="dataset",
)
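With 27 million images in the collection, it may be useful to restrict what is fetched. The following is a minimal sketch, not the authors' method: `allow_patterns` and `local_dir` are existing `snapshot_download` parameters, but the helper names, the `*.tar` pattern, and the target directory are assumptions for illustration, and the actual file names in the repository are not confirmed here.

```python
def build_download_kwargs(repo_id, patterns, local_dir):
    """Collect keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",            # dataset repo, as in the snippet above
        "allow_patterns": list(patterns),  # only files matching these globs are fetched
        "local_dir": local_dir,
    }


def download_subset(repo_id, patterns, local_dir):
    # Imported lazily so build_download_kwargs works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(repo_id, patterns, local_dir))


kwargs = build_download_kwargs(
    "BLIP3o/BLIP3o-Pretrain-Long-Caption",
    ["*.tar"],                # assumption: shards are .tar files; narrow to fetch fewer
    "./blip3o_long_caption",  # hypothetical target directory
)
print(kwargs["allow_patterns"])
```

Calling `download_subset(**kwargs)` would start the actual download; the pattern should be tightened once the real shard names are known.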

Load Dataset without Extracting

You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.
BLIP3o-Pretrain-JourneyDB
huggingface.co
Updated May 27, 2025
Cite
BLIP3o (2025). BLIP3o-Pretrain-JourneyDB [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB
Dataset updated
May 27, 2025
Dataset authored and provided by
BLIP3o
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
BLIP3o Pretrain JourneyDB Dataset

This collection contains 4 million JourneyDB images.

Download

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB",
    repo_type="dataset",
)

Load Dataset without Extracting

You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:

from datasets import load_dataset
import glob

data_files = glob.glob("/your/data/path/*.tar")…

See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
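To illustrate why extraction is unnecessary, here is a minimal, standard-library-only sketch of how a WebDataset-style shard can be read directly: files inside the tar that share a base name are grouped into one sample. The member names, the `.jpg`/`.txt` extensions, and both helper functions are assumptions for illustration; this is not the 🤗 datasets implementation, and the actual layout of the BLIP3o shards is not confirmed here.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def write_toy_shard(path):
    """Write a tiny WebDataset-style shard: files sharing a stem form one sample."""
    members = {
        "000001.jpg": b"fake-jpeg-bytes-1",  # placeholder bytes, not a real image
        "000001.txt": b"a caption for image one",
        "000002.jpg": b"fake-jpeg-bytes-2",
        "000002.txt": b"a caption for image two",
    }
    with tarfile.open(path, "w") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def iter_samples(path):
    """Yield one dict per sample, reading members in order without extracting to disk."""
    current_key, sample = None, {}
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Simplification: split at the last dot (fine for these toy names).
            key, _, ext = member.name.rpartition(".")
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "toy-shard.tar"
    write_toy_shard(shard)
    samples = list(iter_samples(shard))

for s in samples:
    print(s["__key__"], s["txt"].decode())
```

Because the tar is read sequentially, image and caption pairs come out in shard order and nothing is unpacked to disk.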
BLIP3o-60k
huggingface.co
Updated May 13, 2025
Cite
BLIP3o (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-60k
Dataset updated
May 13, 2025
Dataset authored and provided by
BLIP3o
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It includes the following categories:

JourneyDB
Human (including MSCOCO with human caption, human gestures, occupations)
Dalle3
Geneval (no overlap with test set)
Common objects
Simple text

The following code downloads the tar files:

from huggingface_hub import snapshot_download

snapshot_download(repo_id="BLIP3o/BLIP3o-60k", repo_type="dataset")

You can also use Hugging Face datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.
blip3o-caption-mini-arrow
huggingface.co
Updated Jun 27, 2025
Cite
Prithiv Sakthi (2025). blip3o-caption-mini-arrow [Dataset]. https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow
Dataset updated
Jun 27, 2025
Authors
Prithiv Sakthi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
blip3o-caption-mini-arrow

blip3o-caption-mini-arrow is a high-quality, curated image-caption dataset derived and optimized from the original BLIP3o/BLIP3o-Pretrain-Long-Caption. This dataset is specifically filtered and processed for tasks involving long-form image captioning and vision-language understanding.

Overview

Total Samples: 91,600
Modality: Image ↔ Text
Format: Arrow (auto-converted to Parquet)
License: Apache 2.0
Language: English
Size: ~4.5 GB…

See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow.
Bagel-new-BLIP3o-5k
huggingface.co
Updated Jul 1, 2025
Cite
Jun He (2025). Bagel-new-BLIP3o-5k [Dataset]. https://huggingface.co/datasets/redshallot/Bagel-new-BLIP3o-5k
Dataset updated
Jul 1, 2025
Authors
Jun He
Description
The redshallot/Bagel-new-BLIP3o-5k dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
blip3o-pretrain-short-recaptioned
huggingface.co
Updated Jun 21, 2025
Cite
Ziteng Gao (2025). blip3o-pretrain-short-recaptioned [Dataset]. https://huggingface.co/datasets/sebgao/blip3o-pretrain-short-recaptioned
Dataset updated
Jun 21, 2025
Authors
Ziteng Gao
Description
The sebgao/blip3o-pretrain-short-recaptioned dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
BLIP3o-60k
huggingface.co
Updated Jun 1, 2025
Cite
Size Wu (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/wusize/BLIP3o-60k
Dataset updated
Jun 1, 2025
Authors
Size Wu
Description
The wusize/BLIP3o-60k dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
Caption3o-Opt
huggingface.co
Updated Jul 2, 2025
Cite
Prithiv Sakthi (2025). Caption3o-Opt [Dataset]. https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt
Dataset updated
Jul 2, 2025
Authors
Prithiv Sakthi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Caption3o-Opt

Caption3o-Opt is a compact, high-quality image-caption dataset derived from the original BLIP3o/BLIP3o-Pretrain-Long-Caption. This refined subset focuses on optimized long-form captioning, curated for real-world and artistic image understanding across vision-language models.

Overview

Total Samples: 10,278
Modality: Image ↔ Text
Format: Arrow (auto-converted to Parquet)
License: Apache 2.0
Language: English
Size: ~500 MB

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt.
Caption3o-Opt-v2
huggingface.co
Updated Jul 13, 2025
Cite
Prithiv Sakthi (2025). Caption3o-Opt-v2 [Dataset]. https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt-v2
Dataset updated
Jul 13, 2025
Authors
Prithiv Sakthi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Caption3o-Opt-v2

Caption3o-Opt-v2 is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. Derived from the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, this optimized subset emphasizes long-form captions and covers a wide range of real-world and artistic scenes.

Dataset Summary

Size: 10,277 image-caption pairs
Format: Parquet
Image resolution: 512x512
Languages: English
Modality: Image-to-Text
License: Apache-2.0…

See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt-v2.
Corvus-OCR-Caption-Mini-Mix
huggingface.co
Updated Jul 12, 2025
Cite
The citation is currently not available for this dataset.
Dataset updated
Jul 12, 2025
Authors
Prithiv Sakthi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Corvus-OCR-Caption-Mini-Mix

Corvus-OCR-Caption-Mini-Mix is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. It is a carefully curated subset of the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, optimized for mixed OCR and long-form captioning tasks.

Dataset Summary

This dataset contains a balanced mix of:

Long-form natural language captions
OCR-heavy samples with scientific, mathematical, and document-style…

See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mini-Mix.
UniWorld-V1
huggingface.co
Updated Jun 13, 2025
Cite
linbin (2025). UniWorld-V1 [Dataset]. https://huggingface.co/datasets/LanguageBind/UniWorld-V1
Dataset updated
Jun 13, 2025
Authors
linbin
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The Geneval-style dataset is sourced from BLIP3o-60k.

This dataset is presented in the paper "UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation". More details can be found in UniWorld-V1.

Data preparation

Download the data from LanguageBind/UniWorld-V1. The dataset consists of two parts: source images and annotation JSON files. Prepare a data.txt file in the following format:

The first column is the root path to the image.

The second… See the full description on the dataset page: https://huggingface.co/datasets/LanguageBind/UniWorld-V1.
image-captioning-turkish
huggingface.co
Updated Jun 7, 2025
Cite
ITU Perceptron (2025). image-captioning-turkish [Dataset]. https://huggingface.co/datasets/ituperceptron/image-captioning-turkish
Dataset updated
Jun 7, 2025
Dataset authored and provided by
ITU Perceptron
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Turkish Image Captioning Dataset

This dataset is a subset, translated into Turkish, of the BLIP3o-Pretrain-Long-Caption and BLIP3o-Pretrain-Short-Caption datasets used to pre-train the BLIP3o model. Detailed information on how the dataset was created is available from the original dataset. The dataset can be used to train or fine-tune image-to-text models. It is shared under Apache 2.0, the license of the original dataset.… See the full description on the dataset page: https://huggingface.co/datasets/ituperceptron/image-captioning-turkish.
Corvus-OCR-Caption-Mix
huggingface.co
Updated Jul 13, 2025
Cite
Prithiv Sakthi (2025). Corvus-OCR-Caption-Mix [Dataset]. https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mix
Dataset updated
Jul 13, 2025
Authors
Prithiv Sakthi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Corvus-OCR-Caption-Mix

Corvus-OCR-Caption-Mix is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. This collection is derived and optimized from the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, with a focus on long-form captions and mixed OCR tasks across a variety of image types.

Dataset Summary

The dataset contains more than 229,000 image-caption pairs and provides a balanced blend of:

OCR-rich documents featuring… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mix.