2 datasets found
  1. BLIP3o-Pretrain-Long-Caption

    • huggingface.co
    Updated May 17, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-Long-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption
    Explore at:
    Dataset updated
    May 17, 2025
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain Long-Caption Dataset

    This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Long-Caption", repo_type="dataset")

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 Datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption.
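    To see why no extraction step is needed, here is a minimal stdlib sketch of the WebDataset layout those .tar shards use: each sample is a group of files sharing a basename (for example an image plus a same-named caption file), and a reader can stream sample groups straight out of the archive. The keys and caption text below are illustrative, not the dataset's actual contents.

    ```python
    import io
    import tarfile

    def write_sample(tar, key, caption):
        # One caption file per sample key; a real shard would also hold
        # a matching image file (e.g. f"{key}.jpg") under the same basename.
        data = caption.encode("utf-8")
        info = tarfile.TarInfo(name=f"{key}.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    # Build a tiny in-memory shard with two samples.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        write_sample(tar, "0000", "a long descriptive caption")
        write_sample(tar, "0001", "another caption")

    # Stream samples back without extracting anything to disk,
    # grouping members by their shared basename.
    buf.seek(0)
    samples = {}
    with tarfile.open(fileobj=buf, mode="r") as tar:
        for member in tar:
            key, _, ext = member.name.partition(".")
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read().decode("utf-8")

    print(samples["0000"]["txt"])  # -> a long descriptive caption
    ```

    The 🤗 Datasets WebDataset loader does this grouping for you; a streaming call along the lines of load_dataset("BLIP3o/BLIP3o-Pretrain-Long-Caption", split="train", streaming=True) is the usual pattern, but consult the dataset page for the exact invocation and field names.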

  2. BLIP3o-Pretrain-Short-Caption

    • huggingface.co
    Cite
    BLIP3o, BLIP3o-Pretrain-Short-Caption [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Short-Caption
    Explore at:
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain Short-Caption Dataset

    This collection contains 5 million images, each paired with a short (~20 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-Short-Caption", repo_type="dataset")

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 Datasets instead: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Short-Caption.
