2 datasets found

LLaVAR
huggingface.co
Updated Jan 27, 2021
Cite
Social And Language Technology Lab (2021). LLaVAR [Dataset]. https://huggingface.co/datasets/SALT-NLP/LLaVAR
Dataset authored and provided by
Social And Language Technology Lab
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
LLaVAR Data: Enhanced Visual Instruction Data with Text-Rich Images

More info at the LLaVAR project page, GitHub repo, and paper.

Training Data

Based on the LAION dataset, we collect 422K pretraining examples based on OCR results. For finetuning data, we collect 16K high-quality instruction-following examples by interacting with language-only GPT-4. Note that we also release a larger and more diverse finetuning dataset below (20K), which contains the 16K we used for the paper. The… See the full description on the dataset page: https://huggingface.co/datasets/SALT-NLP/LLaVAR.
llavar
huggingface.co
Cite
sionic-ai, llavar [Dataset]. https://huggingface.co/datasets/sionic-ai/llavar
Dataset provided by
Sionic AI Inc
Authors
sionic-ai
Description
sionic-ai/llavar dataset hosted on Hugging Face and contributed by the HF Datasets community
