JourneyDB
[Project Page] [Paper] [Code] [HuggingFace] [OpenDataLab]
Dataset Description
Summary
JourneyDB is a large-scale dataset for generated-image understanding. It contains 4,429,295 high-resolution Midjourney images, each annotated with its corresponding text prompt, an image caption, and visual question answering annotations.
Supported Tasks
JourneyDB supports 4 downstream tasks, i.e. Prompt Inversion, Style Retrieval, Image Caption, and Visual Question… See the full description on the dataset page: https://huggingface.co/datasets/JourneyDB/JourneyDB.
https://choosealicense.com/licenses/unknown/
Dataset Card for JourneyDB
This is a mirror of the JourneyDB dataset.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
BLIP3o Pretrain JourneyDB Dataset
This collection contains 4 million JourneyDB images.
Download
from huggingface_hub import snapshot_download
snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB", repo_type="dataset")
Load Dataset without Extracting
You don't need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:
from datasets import load_dataset
import glob
data_files = glob.glob("/your/data/path/*.tar")… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
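To make the "without extracting" idea concrete, here is a minimal standard-library sketch of how a WebDataset-style shard is organized: files inside the tar that share a basename form one sample, and the extension names the field. This is not the 🤗 datasets loader itself. The `.jpg`/`.txt` field names, the sample keys, and the helpers `group_samples` and `make_demo_shard` are assumptions made for illustration, and the demo shard holds placeholder bytes rather than real images.

```python
import io
import tarfile
from collections import defaultdict


def group_samples(tar_bytes: bytes) -> dict:
    """Group tar members into samples keyed by basename up to the first dot."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            dirname, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            # Read the member straight from the archive; nothing is written to disk.
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


def make_demo_shard() -> bytes:
    """Build a tiny two-sample shard in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [
            ("000000.jpg", b"fake-jpeg-bytes"),
            ("000000.txt", b"a castle at dusk"),
            ("000001.jpg", b"more-fake-bytes"),
            ("000001.txt", b"a robot painting"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


samples = group_samples(make_demo_shard())
print(sorted(samples))                    # ['000000', '000001']
print(samples["000000"]["txt"].decode())  # a castle at dusk
```

Checking a real shard this way before loading it can help confirm which field names the archive actually uses.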
1 Million Image Latents Toy Dataset
A lightweight toy dataset of 1 003 626 image latents paired with CLIP text embeddings.
Raw sources & extraction
LAION‑aesthetic (laion/laion2B-en-aesthetic):
Streamed via 🤗 datasets in 50 k-image blocks. Filtered for aesthetic > 7. Skipped PNG/CMYK or images < 32×32 px.
JourneyDB (MidJourney) (JourneyDB/JourneyDB):
Downloaded three zip archives per batch from Hugging Face. Unzipped locally and selected the first 50 000 valid… See the full description on the dataset page: https://huggingface.co/datasets/shreenithi20/ldt-latents.
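The LAION filtering rules listed above (aesthetic > 7, skip PNG/CMYK, skip images under 32×32 px, process in 50 k-image blocks) can be sketched as a simple predicate plus a batching helper. This is an illustrative reading, not the dataset author's code: the `ImageMeta` record and its field names are hypothetical, "PNG/CMYK" is taken to mean PNG files or CMYK-mode images, and an image counts as too small if either side is below 32 px.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List


@dataclass
class ImageMeta:
    # Hypothetical record; these field names are not the dataset's schema.
    aesthetic: float
    fmt: str    # file format, e.g. "JPEG" or "PNG"
    mode: str   # colour mode, e.g. "RGB" or "CMYK"
    width: int
    height: int


def keep(meta: ImageMeta, min_aesthetic: float = 7.0, min_side: int = 32) -> bool:
    """Return True if the image passes all three filters."""
    if meta.aesthetic <= min_aesthetic:
        return False
    if meta.fmt.upper() == "PNG" or meta.mode.upper() == "CMYK":
        return False
    return meta.width >= min_side and meta.height >= min_side


def blocks(items: Iterable[ImageMeta], size: int = 50_000) -> Iterator[List[ImageMeta]]:
    """Yield consecutive blocks of at most `size` items from a stream."""
    it = iter(items)
    while block := list(islice(it, size)):
        yield block


records = [
    ImageMeta(7.4, "JPEG", "RGB", 512, 512),   # kept
    ImageMeta(6.9, "JPEG", "RGB", 512, 512),   # aesthetic too low
    ImageMeta(8.1, "PNG", "RGBA", 512, 512),   # PNG skipped
    ImageMeta(7.8, "JPEG", "CMYK", 512, 512),  # CMYK skipped
    ImageMeta(7.5, "JPEG", "RGB", 16, 64),     # one side below 32 px
]
# A block size of 2 is used here only to keep the demo small.
kept = [m for block in blocks(records, size=2) for m in block if keep(m)]
print(len(kept))  # 1
```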
https://choosealicense.com/licenses/openrail++/
Images are from JourneyDB
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It includes the following categories:
JourneyDB
Human (including MSCOCO with human caption, human gestures, occupations)
Dalle3
Geneval (no overlap with test set)
Common objects
Simple text
The following code downloads the tar files:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="BLIP3o/BLIP3o-60k", repo_type="dataset")
And you can use huggingface datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.