17 datasets found
  1. laion-aesthetics-12m-umap

    • huggingface.co
    Updated Apr 7, 2023
    Cite
    David McClure (2023). laion-aesthetics-12m-umap [Dataset]. https://huggingface.co/datasets/dclure/laion-aesthetics-12m-umap
    Dataset updated
    Apr 7, 2023
    Authors
    David McClure
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    LAION-Aesthetics :: CLIP → UMAP

    This dataset is a CLIP (text) → UMAP embedding of the LAION-Aesthetics dataset - specifically the improved_aesthetics_6plus version, which filters the full dataset to images with scores of > 6 under the "aesthetic" filtering model. Thanks LAION for this amazing corpus!

    The dataset here includes coordinates for 3x separate UMAP fits using different values for the n_neighbors parameter - 10, 30, and 60 - which are broken out as separate columns with… See the full description on the dataset page: https://huggingface.co/datasets/dclure/laion-aesthetics-12m-umap.

  2. LAION-Aesthetics V2 6.5+ Dataset

    • paperswithcode.com
    Cite
    LAION-Aesthetics V2 6.5+ Dataset [Dataset]. https://paperswithcode.com/dataset/laion-aesthetics-v2-6-5
    Description

    A subset of the LAION 5B samples with English captions, obtained using LAION-Aesthetics_Predictor V2: 625K image-text pairs with predicted aesthetics scores of 6.5 or higher. Available at https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus

  3. laion2B-en-aesthetic

    • huggingface.co
    Updated May 27, 2025
    Cite
    LAION eV (2025). laion2B-en-aesthetic [Dataset]. http://doi.org/10.57967/hf/5792
    Dataset updated
    May 27, 2025
    Dataset provided by
    LAION (https://laion.ai/)
    Authors
    LAION eV
    Description

    laion/laion2B-en-aesthetic dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. laion-coco-aesthetic

    • huggingface.co
    Updated Feb 15, 2019
    Cite
    Guangyi Liu (2019). laion-coco-aesthetic [Dataset]. https://huggingface.co/datasets/guangyil/laion-coco-aesthetic
    Dataset updated
    Feb 15, 2019
    Authors
    Guangyi Liu
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    LAION COCO with aesthetic score and watermark score

    This dataset contains a 10% sample of the LAION-COCO dataset, filtered by text rules (removing URLs, special tokens, etc.) and image rules (image size > 384x384, aesthetic score > 4.75, and watermark probability < 0.5). There are 8,563,753 data instances in total, and the corresponding aesthetic score and watermark score are included. Note: the watermark score in the table means the probability of the existence of the… See the full description on the dataset page: https://huggingface.co/datasets/guangyil/laion-coco-aesthetic.
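
The image rules in this description can be sketched as a simple predicate. This is a minimal illustration, not the dataset author's code: the `Sample` record and its field names are hypothetical, and reading "image size > 384x384" as both sides exceeding 384 pixels is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record layout; the real column names may differ.
    caption: str
    width: int
    height: int
    aesthetic_score: float
    watermark_prob: float


def passes_image_rules(sample: Sample,
                       min_side: int = 384,
                       min_aesthetic: float = 4.75,
                       max_watermark: float = 0.5) -> bool:
    """Return True if the sample satisfies the three image rules."""
    # Assumption: "size > 384x384" means width AND height both exceed 384.
    big_enough = sample.width > min_side and sample.height > min_side
    pretty_enough = sample.aesthetic_score > min_aesthetic
    likely_clean = sample.watermark_prob < max_watermark
    return big_enough and pretty_enough and likely_clean


def filter_samples(samples: list[Sample]) -> list[Sample]:
    """Keep only the samples that pass every image rule."""
    return [s for s in samples if passes_image_rules(s)]
```

The thresholds are keyword arguments so that the same predicate can be reused with other cut-offs.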

  5. LAION-Aesthetics 9

    • kaggle.com
    Updated Aug 8, 2023
    Cite
    CookieMonsterYum (2023). LAION-Aesthetics 9 [Dataset]. https://www.kaggle.com/datasets/cookiemonsteryum/laion-aesthetics-9
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Kaggle
    Authors
    CookieMonsterYum
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by CookieMonsterYum

    Released under CC0: Public Domain


  6. Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu (2024). Dataset:...

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu (2024). LAION-Aesthetics 6.5+ [Dataset]. https://doi.org/10.57702/zvbnqhl9. https://service.tib.eu/ldmservice/dataset/laion-aesthetics-6-5-
    Dataset updated
    Dec 2, 2024
    Description

    LAION-Aesthetics 6.5+ dataset contains 625K image-text pairs.

  7. LAION-Aesthetic - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). LAION-Aesthetic - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/laion-aesthetic
    Dataset updated
    Dec 2, 2024
    Description

    The dataset used in the paper is LAION-Aesthetic, a large-scale image dataset.

  8. Ko-LAION-Aesthetics-10M

    • huggingface.co
    Cite
    ETRI VILAB(Visual Intelligence Lab), Ko-LAION-Aesthetics-10M [Dataset]. https://huggingface.co/datasets/etri-vilab/Ko-LAION-Aesthetics-10M
    Dataset provided by
    한국전자통신연구원 (ETRI), https://www.etri.re.kr/
    Authors
    ETRI VILAB(Visual Intelligence Lab)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LAION-Aesthetics 10M Dataset Card

    Dataset details

    Dataset type: LAION-Aesthetics is a subset of LAION-5B that has been estimated by a model trained on top of CLIP embeddings to be aesthetic. The intended usage of this dataset is image generation.

    Paper or resources for more information: https://laion.ai/blog/laion-aesthetics/

    Acknowledgements: This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the… See the full description on the dataset page: https://huggingface.co/datasets/etri-vilab/Ko-LAION-Aesthetics-10M.

  9. Audio-aesthetics-score

    • huggingface.co
    Updated May 27, 2025
    Cite
    LAION eV (2025). Audio-aesthetics-score [Dataset]. https://huggingface.co/datasets/laion/Audio-aesthetics-score
    Dataset updated
    May 27, 2025
    Dataset provided by
    LAION (https://laion.ai/)
    Authors
    LAION eV
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    laion/Audio-aesthetics-score dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. spright_coco

    • huggingface.co
    Updated Apr 2, 2024
    Cite
    SPRIGHT (2024). spright_coco [Dataset]. https://huggingface.co/datasets/SPRIGHT-T2I/spright_coco
    Dataset updated
    Apr 2, 2024
    Dataset authored and provided by
    SPRIGHT
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Description

    SPRIGHT (SPatially RIGHT) is the first spatially focused, large scale vision-language dataset. It was built by re-captioning ∼6 million images from 4 widely-used datasets: CC12M, Segment Anything, COCO Validation, and LAION Aesthetics.

    This repository contains the re-captioned data from COCO-Validation Set, while the data from CC12 and Segment Anything is present here. We do not release images from LAION, as the parent images are currently private.

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/SPRIGHT-T2I/spright_coco.
    
  11. 18_obj_444

    • huggingface.co
    Updated Apr 12, 2024
    Cite
    SPRIGHT (2024). 18_obj_444 [Dataset]. https://huggingface.co/datasets/SPRIGHT-T2I/18_obj_444
    Dataset updated
    Apr 12, 2024
    Dataset authored and provided by
    SPRIGHT
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Description

    This dataset contains the 444 images that we used for training our model (https://huggingface.co/SPRIGHT-T2I/spright-t2i-sd2). It contains the samples of this subset related to the Segment Anything images. We will release the LAION images when the parent images are made public again. Our training and validation sets are subsets of the SPRIGHT dataset and consist of 444 and 50 images respectively, randomly sampled in a 50:50 split between LAION-Aesthetics and… See the full description on the dataset page: https://huggingface.co/datasets/SPRIGHT-T2I/18_obj_444.

  12. t2i-diversity-captions

    • huggingface.co
    Updated Jun 23, 2025
    Cite
    Artificial Intelligence & Machine Learning Lab at TU Darmstadt (2025). t2i-diversity-captions [Dataset]. https://huggingface.co/datasets/AIML-TUDA/t2i-diversity-captions
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Artificial Intelligence & Machine Learning Lab at TU Darmstadt
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    📄 Synthetic Captioned LAION Subset

      🧾 Dataset Summary
    

    This dataset consists of over 39 Million synthetic image captions generated for 1 Million curated images from LAION Aesthetics. Images have an aesthetics score >6, at minimum resolution of 512p, and have been screened for NSFW, CSAM and watermarks. We also removed exact duplicates.

    📦 Data Structure

    strID - Unique string identifier for the sample
    intID - Unique integer identifier for the sample
    imageURL -… See the full description on the dataset page: https://huggingface.co/datasets/AIML-TUDA/t2i-diversity-captions.

  13. MMFR-Dataset

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    AnnaGao (2025). MMFR-Dataset [Dataset]. https://huggingface.co/datasets/AnnaGao/MMFR-Dataset
    Dataset updated
    Jun 21, 2025
    Authors
    AnnaGao
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    📦 Data Sources

    The training set of MMFR-Dataset contains:

    Fake images sourced from DiffusionDB, released under the CC0 1.0 Public Domain Dedication.
    Real images drawn from LAION-Aesthetics, a subset of LAION-5B, licensed under the CC BY 4.0 License.

    Licenses of evaluation sets are:

    BigGAN: Provided by the GenImage dataset, licensed under CC BY-NC-SA 4.0.
    GauGAN: Obtained from CNNDetection, released under the CC BY-NC-SA 4.0.
    StyleGAN-XL: Collected from AntifakePrompts, under the… See the full description on the dataset page: https://huggingface.co/datasets/AnnaGao/MMFR-Dataset.

  14. nobodies

    • huggingface.co
    Updated Mar 12, 2023
    Cite
    Plat (2023). nobodies [Dataset]. https://huggingface.co/datasets/p1atdev/nobodies
    Dataset updated
    Mar 12, 2023
    Authors
    Plat
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Nobodies

    AI-generated human image dataset.

    Contents

    Face

    vol1: 32 photos of women's faces. Generated with WD1.5 beta 2.

    Portrait

    vol1: 31 photos of women's portraits. Generated with WD1.5 beta 2 and the fashion LoCon.

    vol2: 165 photos of women's portraits. Generated with WD1.5 beta 2 and the fashion LoCon. Classified with LAION Aesthetic v2. 75 hair bun photos, 90 medium hair photos.

  15. LAION-art-EN-improved-captions

    • huggingface.co
    Cite
    Re:cast AI, LAION-art-EN-improved-captions [Dataset]. https://huggingface.co/datasets/recastai/LAION-art-EN-improved-captions
    Dataset authored and provided by
    Re:cast AI
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for LAION-art-EN-improved-captions

      Dataset Summary
    

    This dataset has been created by Re:cast AI for improving the semantic relationship of image-caption pairs. generated_captions were created in a semi-supervised fashion using the Salesforce/blip2-flan-t5-xxl model.

      Supported Tasks
    

    Fine-tuning text-to-image generators (e.g. stable-diffusion), or a searchable prompt database (requires faiss-index).

      Dataset Structure
    
    
    
    
    
      Data Fields… See the full description on the dataset page: https://huggingface.co/datasets/recastai/LAION-art-EN-improved-captions.
    
  16. laion2b-23ish-1216px

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    Open Diffusion AI, laion2b-23ish-1216px [Dataset]. https://huggingface.co/datasets/opendiffusionai/laion2b-23ish-1216px
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    Open Diffusion AI
    Description

    Overview

    This is a subset of https://huggingface.co/datasets/laion/laion2B-en-aesthetic, selected for aspect ratio, and with better captioning. Approximate image count is around 250k.

    23ish, 1216px

    I picked out the images with a portrait aspect ratio of 2:3, or a little wider (because images that are a little too wide can be safely cropped narrower). I also picked a minimum height of 1216 pixels, because that is what a 1024x1024 pixel count converted to 2:3 looks like.… See the full description on the dataset page: https://huggingface.co/datasets/opendiffusionai/laion2b-23ish-1216px.

  17. sd-4.4M

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Javier Martín (2023). sd-4.4M [Dataset]. https://huggingface.co/datasets/jamarju/sd-4.4M
    Dataset updated
    Aug 8, 2023
    Authors
    Javier Martín
    License

    https://choosealicense.com/licenses/openrail/

    Description

    This is a dataset of 4.4M images generated with Stable Diffusion 2 for Kaggle's stable diffusion image to prompt competition. Prompts were extracted from public databases:

    mp: Magic Prompt - 1M
    db: DiffusionDB
    op: Open Prompts
    co: COCO
    cc: Conceptual Captions
    l0: LAION-2B-en-aesthetic

    The following prompts were filtered out:

    those with token length >77 CLIP tokens
    those whose all-MiniLM-L6-v2 embedding has a cosine similarity >0.9 to any other prompt

    Samples were clustered by their… See the full description on the dataset page: https://huggingface.co/datasets/jamarju/sd-4.4M.
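
The two prompt filters listed in this description can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset author's pipeline: `filter_prompts` is a hypothetical helper, the token counts and embeddings are passed in precomputed (in practice they would come from a CLIP tokenizer and the all-MiniLM-L6-v2 model), and the greedy "keep the first prompt seen, drop later near-duplicates" policy is one possible reading of "similarity >0.9 to any other prompt".

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_prompts(prompts: list[str],
                   embeddings: list[list[float]],
                   token_counts: list[int],
                   max_tokens: int = 77,
                   max_similarity: float = 0.9) -> list[str]:
    """Drop over-long prompts and near-duplicates of already kept prompts."""
    kept: list[int] = []  # indices of prompts that survive both filters
    for i in range(len(prompts)):
        if token_counts[i] > max_tokens:
            continue  # rule 1: too many tokens
        # Rule 2 (greedy assumption): compare only against kept prompts.
        if any(cosine(embeddings[i], embeddings[j]) > max_similarity
               for j in kept):
            continue
        kept.append(i)
    return [prompts[i] for i in kept]
```

The pairwise comparison is quadratic in the number of prompts, so at the scale of millions of prompts an approximate nearest-neighbour index would likely be needed instead.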


dclure/laion-aesthetics-12m-umap

6 scholarly articles cite this dataset (View in Google Scholar)