2 datasets found
  1. LLaVAR

    • huggingface.co
    Updated Jan 27, 2021
    Cite
    Social And Language Technology Lab (2021). LLaVAR [Dataset]. https://huggingface.co/datasets/SALT-NLP/LLaVAR
    Dataset updated
    Jan 27, 2021
    Dataset authored and provided by
    Social And Language Technology Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    LLaVAR Data: Enhanced Visual Instruction Data with Text-Rich Images

    More info is available on the LLaVAR project page, GitHub repo, and paper.

    Training Data

    From the LAION dataset, we collect 422K pretraining examples based on OCR results. For finetuning, we collect 16K high-quality instruction-following examples by interacting with language-only GPT-4. Note that we also release a larger and more diverse finetuning dataset below (20K), which contains the 16K we used for the paper. The… See the full description on the dataset page: https://huggingface.co/datasets/SALT-NLP/LLaVAR.

  2. llavar

    • huggingface.co
    Cite
    sionic-ai, llavar [Dataset]. https://huggingface.co/datasets/sionic-ai/llavar
    Dataset provided by
    Sionic AI Inc
    Authors
    sionic-ai
    Description

    sionic-ai/llavar dataset hosted on Hugging Face and contributed by the HF Datasets community

