7 datasets found
  1. JourneyDB

    • huggingface.co
    Updated Jun 30, 2023
    Cite
    JourneyDB (2023). JourneyDB [Dataset]. https://huggingface.co/datasets/JourneyDB/JourneyDB
    Authors
    JourneyDB
    Description

    JourneyDB

    [Project Page] [Paper] [Code] [HuggingFace] [OpenDataLab]

      Dataset Description

      Summary

    JourneyDB is a large-scale generated-image understanding dataset containing 4,429,295 high-resolution Midjourney images, each annotated with its text prompt, an image caption, and visual question answering annotations.

      Supported Tasks
    

    JourneyDB supports 4 downstream tasks, i.e. Prompt Inversion, Style Retrieval, Image Caption, and Visual Question… See the full description on the dataset page: https://huggingface.co/datasets/JourneyDB/JourneyDB.

  2. JourneyDB

    • huggingface.co
    Updated Mar 26, 2025
    Cite
    BitMind (2025). JourneyDB [Dataset]. https://huggingface.co/datasets/bitmind/JourneyDB
    Dataset authored and provided by
    BitMind
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for JourneyDB

    This is a mirror of the JourneyDB dataset.

  3. BLIP3o-Pretrain-JourneyDB

    • huggingface.co
    Updated Aug 14, 2025
    Cite
    BLIP3o (2025). BLIP3o-Pretrain-JourneyDB [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    BLIP3o Pretrain JourneyDB Dataset

    This collection contains 4 million JourneyDB images.

      Download
    

    from huggingface_hub import snapshot_download

    snapshot_download( repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB", repo_type="dataset" )

      Load Dataset without Extracting
    

    You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:

    from datasets import load_dataset
    import glob

    data_files = glob.glob("/your/data/path/*.tar")… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
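The card's own loading snippet is truncated in this listing, so it is not completed here. As a library-independent illustration of what reading shards "without extracting" can look like, the following is a minimal sketch using only Python's standard `tarfile` module. It assumes the usual WebDataset layout, in which files that share a basename (for example `000000.jpg` and `000000.txt`) are stored next to each other and form one sample. The helper names `iter_samples`, `write_demo_shard`, and `demo`, and the fake payloads, are hypothetical and do not come from the dataset card.

```python
import io
import os
import posixpath
import tarfile
import tempfile


def iter_samples(tar_path):
    """Yield one dict per sample from a WebDataset-style shard, read in place."""
    with tarfile.open(tar_path, "r") as tar:
        current_key, sample = None, {}
        for member in tar:
            if not member.isfile():
                continue
            # "dir/000001.jpg" -> key "dir/000001", field "jpg"
            directory, filename = posixpath.split(member.name)
            stem, _, field = filename.partition(".")
            key = posixpath.join(directory, stem)
            # A new key means the previous sample is complete.
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[field] = tar.extractfile(member).read()
        if sample:
            yield sample


def write_demo_shard(tar_path):
    """Write a tiny two-sample shard with made-up bytes so the sketch runs offline."""
    files = {
        "000000.jpg": b"fake-jpeg-bytes",
        "000000.txt": b"a castle at dusk",
        "000001.jpg": b"more-fake-bytes",
        "000001.txt": b"a robot gardener",
    }
    with tarfile.open(tar_path, "w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def demo():
    """Build the demo shard in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "demo-000000.tar")
        write_demo_shard(shard)
        return list(iter_samples(shard))
```

Calling `demo()` returns two sample dicts, each holding a `__key__`, a `jpg` field, and a `txt` field as raw bytes. For real shards, the dataset page's own instructions should take precedence over this sketch.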

  4. JourneyDB

    • huggingface.co
    Cite
    Eric Lee, JourneyDB [Dataset]. https://huggingface.co/datasets/Salmonnn/JourneyDB
    Authors
    Eric Lee
    Description

    Salmonnn/JourneyDB dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. ldt-latents

    • huggingface.co
    Cite
    Shreenithi, ldt-latents [Dataset]. https://huggingface.co/datasets/shreenithi20/ldt-latents
    Authors
    Shreenithi
    Description

    1 Million Image Latents Toy Dataset

    A lightweight toy dataset of 1 003 626 image latents paired with CLIP text embeddings.

      Raw sources & extraction
    

    LAION‑aesthetic (laion/laion2B-en-aesthetic):

    Streamed via 🤗 datasets in 50 k-image blocks. Filtered for aesthetic > 7. Skipped PNG/CMYK or images < 32×32 px.

    JourneyDB (MidJourney) (JourneyDB/JourneyDB):

    Downloaded three zip archives per batch from Hugging Face. Unzipped locally and selected the first 50 000 valid… See the full description on the dataset page: https://huggingface.co/datasets/shreenithi20/ldt-latents.
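The LAION filtering rules listed above (aesthetic > 7, skip PNG/CMYK, skip images smaller than 32×32 px) can be read as a single keep-or-skip predicate. The following is a minimal sketch of that reading, not the dataset author's code. The `ImageMeta` record and `keep_image` function are hypothetical names, and two interpretations are assumptions: that "PNG/CMYK" means PNG files or CMYK-mode images, and that "< 32×32 px" means either side is below 32 pixels.

```python
from dataclasses import dataclass


@dataclass
class ImageMeta:
    """Hypothetical per-image metadata record used only for this sketch."""
    aesthetic: float  # predicted aesthetic score
    fmt: str          # file format, e.g. "JPEG" or "PNG"
    mode: str         # color mode, e.g. "RGB" or "CMYK"
    width: int
    height: int


def keep_image(meta, min_aesthetic=7.0, min_side=32):
    """Return True if the image passes all three filters described in the listing."""
    # Aesthetic score must be strictly greater than the threshold.
    if meta.aesthetic <= min_aesthetic:
        return False
    # Assumption: skip PNG files and CMYK-mode images.
    if meta.fmt.upper() == "PNG" or meta.mode.upper() == "CMYK":
        return False
    # Assumption: skip images where either side is below the minimum.
    if meta.width < min_side or meta.height < min_side:
        return False
    return True
```

For example, `keep_image(ImageMeta(7.5, "JPEG", "RGB", 512, 512))` passes, while the same image with a score of exactly 7.0 is skipped because the comparison is strict.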

  6. data_toy

    • huggingface.co
    Updated May 4, 2024
    Cite
    PixArt (2024). data_toy [Dataset]. https://huggingface.co/datasets/PixArt-alpha/data_toy
    Dataset authored and provided by
    PixArt
    License

    https://choosealicense.com/licenses/openrail++/

    Description

    Images are from JourneyDB

  7. BLIP3o-60k

    • huggingface.co
    Updated May 13, 2025
    Cite
    BLIP3o (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-60k
    Dataset authored and provided by
    BLIP3o
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It includes the following categories:

    - JourneyDB
    - Human (including MSCOCO with human caption, human gestures, occupations)
    - Dalle3
    - Geneval (no overlap with test set)
    - Common objects
    - Simple text

    The following code downloads the tar files:

    from huggingface_hub import snapshot_download

    snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset')

    And you can use huggingface datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.


JourneyDB/JourneyDB

94 scholarly articles cite this dataset (View in Google Scholar)
