7 datasets found

JourneyDB
huggingface.co
Updated Jun 30, 2023
Cite
JourneyDB (2023). JourneyDB [Dataset]. https://huggingface.co/datasets/JourneyDB/JourneyDB
Authors
JourneyDB
Description
JourneyDB

[Project Page] [Paper] [Code] [HuggingFace] [OpenDataLab]

Dataset Description

Summary

JourneyDB is a large-scale generated image understanding dataset that contains 4,429,295 high-resolution Midjourney images, annotated with corresponding text prompt, image caption and visual question answering.

Supported Tasks

JourneyDB supports 4 downstream tasks, i.e. Prompt Inversion, Style Retrieval, Image Caption, and Visual Question… See the full description on the dataset page: https://huggingface.co/datasets/JourneyDB/JourneyDB.
JourneyDB
huggingface.co
Updated Mar 26, 2025
Cite
BitMind (2025). JourneyDB [Dataset]. https://huggingface.co/datasets/bitmind/JourneyDB
Dataset authored and provided by
BitMind
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for JourneyDB

This is a mirror of the JourneyDB dataset.
BLIP3o-Pretrain-JourneyDB
huggingface.co
Updated Aug 14, 2025
Cite
BLIP3o (2025). BLIP3o-Pretrain-JourneyDB [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB
Dataset authored and provided by
BLIP3o
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
BLIP3o Pretrain JourneyDB Dataset

This collection contains 4 million JourneyDB images.

Download

from huggingface_hub import snapshot_download

snapshot_download(repo_id="BLIP3o/BLIP3o-Pretrain-JourneyDB", repo_type="dataset")

Load Dataset without Extracting

You don’t need to unpack the .tar archives; use the WebDataset support in 🤗 datasets instead:

from datasets import load_dataset
import glob

data_files = glob.glob("/your/data/path/*.tar")

… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-JourneyDB.
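The dataset card's own snippet is cut off in this listing. As a rough sketch of the approach it describes, the shard paths can be collected with glob and handed to the "webdataset" builder in 🤗 datasets. The helper names list_shards and load_shards, the split="train" argument, and the streaming=True default are assumptions made for illustration, not the authors' code.

```python
import glob
import os


def list_shards(data_dir):
    """Return the sorted paths of all .tar shards directly under data_dir."""
    return sorted(glob.glob(os.path.join(data_dir, "*.tar")))


def load_shards(data_dir, streaming=True):
    """Open the shards with the 'webdataset' builder of the datasets library.

    Assumption: the shards follow the WebDataset layout. The import is kept
    local so that list_shards works without the datasets package installed.
    """
    from datasets import load_dataset

    data_files = list_shards(data_dir)
    if not data_files:
        raise FileNotFoundError(f"no .tar shards found in {data_dir}")
    return load_dataset(
        "webdataset", data_files=data_files, split="train", streaming=streaming
    )
```

Calling load_shards("/your/data/path") would then return an iterable dataset; with streaming enabled, samples are read from the archives lazily instead of being unpacked to disk first.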
JourneyDB
huggingface.co
Cite
Eric Lee, JourneyDB [Dataset]. https://huggingface.co/datasets/Salmonnn/JourneyDB
Authors
Eric Lee
Description
Salmonnn/JourneyDB dataset hosted on Hugging Face and contributed by the HF Datasets community
ldt-latents
huggingface.co
Cite
Shreenithi, ldt-latents [Dataset]. https://huggingface.co/datasets/shreenithi20/ldt-latents
Authors
Shreenithi
Description
1 Million Image Latents Toy Dataset

A lightweight toy dataset of 1 003 626 image latents paired with CLIP text embeddings.

Raw sources & extraction

LAION‑aesthetic (laion/laion2B-en-aesthetic):

Streamed via 🤗 datasets in 50 k-image blocks. Filtered for aesthetic > 7. Skipped PNG/CMYK or images < 32×32 px.

JourneyDB (MidJourney) (JourneyDB/JourneyDB):

Downloaded three zip archives per batch from Hugging Face. Unzipped locally and selected the first 50 000 valid… See the full description on the dataset page: https://huggingface.co/datasets/shreenithi20/ldt-latents.
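The filtering rules listed for the LAION source can be sketched as a small predicate. The record fields (aesthetic, format, mode, width, height) and the function name keep_image are hypothetical; the excerpt does not show the actual extraction code, and it does not say whether the same rules were applied to the JourneyDB portion. The sketch also assumes that "< 32×32 px" means either side being under 32 pixels.

```python
MIN_SIDE = 32          # images below 32x32 px are skipped
MIN_AESTHETIC = 7.0    # keep only aesthetic scores strictly above 7


def keep_image(record):
    """Return True if a (hypothetical) metadata record passes the filters."""
    if record["aesthetic"] <= MIN_AESTHETIC:
        return False
    if record["format"].upper() == "PNG" or record["mode"].upper() == "CMYK":
        return False
    # Assumption: an image is too small if either side is under 32 px.
    if record["width"] < MIN_SIDE or record["height"] < MIN_SIDE:
        return False
    return True
```

A streaming loop would call keep_image on each candidate and only encode the images that pass, which keeps the expensive latent computation off images that would be discarded anyway.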
data_toy
huggingface.co
Updated May 4, 2024
Cite
PixArt (2024). data_toy [Dataset]. https://huggingface.co/datasets/PixArt-alpha/data_toy
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset authored and provided by
PixArt
License
https://choosealicense.com/licenses/openrail++/
Description
Images are from JourneyDB
BLIP3o-60k
huggingface.co
Updated May 13, 2025
Cite
BLIP3o (2025). BLIP3o-60k [Dataset]. https://huggingface.co/datasets/BLIP3o/BLIP3o-60k
Dataset authored and provided by
BLIP3o
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This is the BLIP3o-60k text-to-image instruction-tuning dataset, distilled from GPT-4o. It includes the following categories:

JourneyDB
Human (including MSCOCO with human caption, human gestures, occupations)
Dalle3
Geneval (no overlap with test set)
Common objects
Simple text

The tar files can be downloaded as follows:

from huggingface_hub import snapshot_download

snapshot_download(repo_id='BLIP3o/BLIP3o-60k', repo_type='dataset')

And you can use huggingface datasets to read the tar… See the full description on the dataset page: https://huggingface.co/datasets/BLIP3o/BLIP3o-60k.
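The reading snippet from the dataset card is truncated in this listing. For readers who only want to inspect a downloaded tar shard, here is a minimal sketch that uses Python's standard tarfile module rather than the datasets library. It assumes the WebDataset convention that consecutive files sharing a base name form one sample; the function name iter_samples is illustrative and not taken from the dataset card.

```python
import tarfile


def iter_samples(tar_path):
    """Yield (key, {extension: bytes}) for consecutive members sharing a key."""
    current_key, sample = None, {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Split at the first dot of the file name:
            # "dir/0001.jpg" -> key "dir/0001", extension "jpg".
            dirname, _, filename = member.name.rpartition("/")
            stem, _, ext = filename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            # A new key means the previous sample is complete.
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield current_key, sample
```

Each yielded sample maps file extensions to raw bytes, so an image and its caption arrive together under the same key and can be decoded by whatever library the reader prefers.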
JourneyDB/JourneyDB: 94 scholarly articles cite this dataset (View in Google Scholar)
