13 datasets found
  1. BLIP3o-Pretrain-Long-Caption

    • huggingface.co
    Updated May 17, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-Long-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption
    Dataset updated
    May 17, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain Long-Caption Dataset

    This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset")
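With 27 million images, a full snapshot is large, so it may be useful to fetch only some files. The sketch below assumes the `allow_patterns` argument of `snapshot_download`, which takes glob-style patterns. The shard names in `example_files` are hypothetical and serve only to preview what a pattern would select; check the repository's actual file names before relying on any pattern.

```python
from fnmatch import fnmatch

REPO_ID = "BLIP3o/BLIP3o-Pretrain-Long-Caption"  # repo id from the listing above


def select_files(filenames, patterns):
    """Return the filenames that match at least one glob pattern."""
    return [name for name in filenames if any(fnmatch(name, p) for p in patterns)]


def download_subset(patterns, local_dir=None):
    """Download only the repository files that match `patterns`."""
    # Imported lazily so select_files() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=patterns,
        local_dir=local_dir,
    )


# Hypothetical shard names (assumption): used only to preview a pattern locally.
example_files = ["README.md", "shard-000.tar", "shard-001.tar", "shard-010.tar"]
preview = select_files(example_files, ["shard-00*.tar"])
print(preview)
```

Calling `download_subset(["shard-00*.tar"])` would then request only the matching files; nothing is downloaded by the preview step itself.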

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.

  2. BLIP3o-Pretrain-JourneyDB

    • huggingface.co
    Updated May 27, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-JourneyDB [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain JourneyDB Dataset

    This collection contains 4 million JourneyDB images.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB", repo_type="dataset")

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:

    from datasets import load_dataset
    import glob

    data_files = glob.glob("/your/data/path/*.tar")… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
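To illustrate why the archives need no unpacking, here is a minimal standard-library sketch of the WebDataset convention, in which tar members sharing a basename form one sample. It is not the dataset's own loader: the member names, the `jpg`/`txt` extensions, and the fake payloads are assumptions for the demo, and the key is split at the last dot as a simplification.

```python
import io
import tarfile


def make_demo_shard():
    """Build a tiny in-memory .tar holding two image/caption pairs (fake bytes)."""
    buf = io.BytesIO()
    members = {
        "000001.jpg": b"fake-image-1",
        "000001.txt": b"a long caption for image 1",
        "000002.jpg": b"fake-image-2",
        "000002.txt": b"a long caption for image 2",
    }
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_samples(fileobj):
    """Stream samples from a tar shard, grouping members by shared basename."""
    current_key, sample = None, {}
    # Mode "r|" reads the archive sequentially, so nothing is extracted to disk.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.rpartition(".")
            if key != current_key and sample:
                yield sample  # basename changed: the previous sample is complete
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


samples = list(iter_samples(make_demo_shard()))
for s in samples:
    print(s["__key__"], len(s["jpg"]), s["txt"].decode())
```

Each yielded dictionary holds one image/caption pair keyed by file extension, which is roughly the shape a WebDataset-style loader hands back for every sample.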

  3. BLIP3o-60k

    • huggingface.co
    Updated May 13, 2025
    Cite
    BLIP3o (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-60k
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It includes the following categories:

    JourneyDB
    Human (including MSCOCO with human caption, human gestures, occupations)
    Dalle3
    Geneval (no overlap with test set)
    Common objects
    Simple text

    The following code downloads the tar files:

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset')

    You can use Hugging Face datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.

  4. blip3o-caption-mini-arrow

    • huggingface.co
    Updated Jun 27, 2025
    Cite
    Prithiv Sakthi (2025). blip3o-caption-mini-arrow [Dataset]. https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow
    Dataset updated
    Jun 27, 2025
    Authors
    Prithiv Sakthi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    blip3o-caption-mini-arrow

    blip3o-caption-mini-arrow is a high-quality, curated image-caption dataset derived and optimized from the original BLIP3o/BLIP3o-Pretrain-Long-Caption. This dataset is specifically filtered and processed for tasks involving long-form image captioning and vision-language understanding.

      Overview
    

    Total Samples: 91,600
    Modality: Image ↔ Text
    Format: Arrow (auto-converted to Parquet)
    License: Apache 2.0
    Language: English
    Size: ~4.5 GB… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow.

  5. Bagel-new-BLIP3o-5k

    • huggingface.co
    Updated Jul 1, 2025
    Cite
    Jun He (2025). Bagel-new-BLIP3o-5k [Dataset]. https://huggingface.co/datasets/redshallot/Bagel-new-BLIP3o-5k
    Dataset updated
    Jul 1, 2025
    Authors
    Jun He
    Description

    redshallot/Bagel-new-BLIP3o-5k dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. blip3o-pretrain-short-recaptioned

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    Ziteng Gao (2025). blip3o-pretrain-short-recaptioned [Dataset]. https://huggingface.co/datasets/sebgao/blip3o-pretrain-short-recaptioned
    Dataset updated
    Jun 21, 2025
    Authors
    Ziteng Gao
    Description

    sebgao/blip3o-pretrain-short-recaptioned dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. BLIP3o-60k

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Size Wu (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/wusize/BLIP3o-60k
    Dataset updated
    Jun 1, 2025
    Authors
    Size Wu
    Description

    wusize/BLIP3o-60k dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Caption3o-Opt

    • huggingface.co
    Updated Jul 2, 2025
    Cite
    Prithiv Sakthi (2025). Caption3o-Opt [Dataset]. https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt
    Dataset updated
    Jul 2, 2025
    Authors
    Prithiv Sakthi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Caption3o-Opt

    Caption3o-Opt is a compact, high-quality image-caption dataset derived from the original BLIP3o/BLIP3o-Pretrain-Long-Caption. This refined subset focuses on optimized long-form captioning, curated for real-world and artistic image understanding across vision-language models.

      Overview
    

    Total Samples: 10,278
    Modality: Image ↔ Text
    Format: Arrow (auto-converted to Parquet)
    License: Apache 2.0
    Language: English
    Size: ~500 MB

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt.
    
  9. Caption3o-Opt-v2

    • huggingface.co
    Updated Jul 13, 2025
    Cite
    Prithiv Sakthi (2025). Caption3o-Opt-v2 [Dataset]. https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt-v2
    Dataset updated
    Jul 13, 2025
    Authors
    Prithiv Sakthi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Caption3o-Opt-v2

    Caption3o-Opt-v2 is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. Derived from the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, this optimized subset emphasizes long-form captions and covers a wide range of real-world and artistic scenes.

      Dataset Summary
    

    Size: 10,277 image-caption pairs
    Format: Parquet
    Image resolution: 512x512
    Languages: English
    Modality: Image-to-Text
    License: Apache-2.0… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt-v2.

  10. Corvus-OCR-Caption-Mini-Mix

    • huggingface.co
    Updated Jul 12, 2025
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Jul 12, 2025
    Authors
    Prithiv Sakthi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Corvus-OCR-Caption-Mini-Mix

    Corvus-OCR-Caption-Mini-Mix is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. It is a carefully curated subset of the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, optimized for mixed OCR and long-form captioning tasks.

      Dataset Summary
    

    This dataset contains a balanced mix of:

    Long-form natural language captions
    OCR-heavy samples with scientific, mathematical, and document-style… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mini-Mix.

  11. UniWorld-V1

    • huggingface.co
    Updated Jun 13, 2025
    Cite
    linbin (2025). UniWorld-V1 [Dataset]. https://huggingface.co/datasets/LanguageBind/UniWorld-V1
    Dataset updated
    Jun 13, 2025
    Authors
    linbin
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Geneval-style dataset is sourced from BLIP3o-60k.

    This dataset is presented in the paper "UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation". More details can be found in UniWorld-V1.

      Data preparation
    

    Download the data from LanguageBind/UniWorld-V1. The dataset consists of two parts: source images and annotation JSON files. Prepare a data.txt file in the following format:

    The first column is the root path to the image.

    The second… See the full description on the dataset page: https://huggingface.co/datasets/LanguageBind/UniWorld-V1.

  12. image-captioning-turkish

    • huggingface.co
    Updated Jun 7, 2025
    Cite
    ITU Perceptron (2025). image-captioning-turkish [Dataset]. https://huggingface.co/datasets/ituperceptron/image-captioning-turkish
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    ITU Perceptron
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Turkish Image Captioning Dataset

    This dataset is a subset, translated into Turkish, of the BLIP3o-Pretrain-Long-Caption and BLIP3o-Pretrain-Short-Caption datasets used to pretrain the BLIP3o model. Detailed information on how the dataset was created is available from the original dataset. The dataset can be used to train or fine-tune image-to-text models. It is shared under Apache 2.0, the license of the original dataset.… See the full description on the dataset page: https://huggingface.co/datasets/ituperceptron/image-captioning-turkish.

  13. Corvus-OCR-Caption-Mix

    • huggingface.co
    Updated Jul 13, 2025
    Cite
    Prithiv Sakthi (2025). Corvus-OCR-Caption-Mix [Dataset]. https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mix
    Dataset updated
    Jul 13, 2025
    Authors
    Prithiv Sakthi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Corvus-OCR-Caption-Mix

    Corvus-OCR-Caption-Mix is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. This collection is derived and optimized from the larger BLIP3o/BLIP3o-Pretrain-Long-Caption, with a focus on long-form captions and mixed OCR tasks across a variety of image types.

      Dataset Summary
    

    The dataset spans over 229,000 image-caption pairs and provides a balanced blend of:

    OCR-rich documents featuring… See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Corvus-OCR-Caption-Mix.
