MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Geneval-style dataset is sourced from BLIP3o-60k.
This dataset is presented in the paper "UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation". More details can be found in UniWorld-V1.
Data preparation
Download the data from LanguageBind/UniWorld-V1. The dataset consists of two parts: source images and annotation JSON files. Prepare a data.txt file in the following format:
The first column is the root path to the image.
The second… See the full description on the dataset page: https://huggingface.co/datasets/LanguageBind/UniWorld-V1.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LanguageBind/Video-Bench dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
If you like our project, please give us a star ⭐ on GitHub for the latest updates.
📰 News
[2024.01.30] The paper is released.
[2024.01.27] 🤗 Hugging Face demo and all code and datasets are available now! Watch 👀 this repository for the latest updates.
😮 Highlights
MoE-LLaVA shows excellent performance in multi-modal learning.
🔥 High performance, but with fewer… See the full description on the dataset page: https://huggingface.co/datasets/LanguageBind/MoE-LLaVA.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
【ICLR 2024 🔥】LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
If you like our project, please give us a star ⭐ on GitHub for the latest updates.
📰 News
[2024.01.27] 👀👀👀 Our MoE-LLaVA is released! A sparse model with 3B parameters outperformed the dense model with 7B parameters.
[2024.01.16] 🔥🔥🔥 Our LanguageBind has been accepted at ICLR 2024! We earned the score of 6(3)8(6)6(6)6(6) here.
[2023.12.15] 💪💪💪 We… See the full description on the dataset page: https://huggingface.co/datasets/LanguageBind/VIDAL-Depth-Thermal.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LanguageBind/Cambrian737k dataset hosted on Hugging Face and contributed by the HF Datasets community
LanguageBind/StyleVideoDataSet dataset hosted on Hugging Face and contributed by the HF Datasets community
LanguageBind/LLMBind dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TinyLLaVA-Video
This dataset combines data from multiple sources for pre-training and fine-tuning. Pretrain Data: Four subsets of LLaVA-Video-178K (0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, 30_60_s_youtube_v0_1), supplemented with filtered Video-LLaVA data (https://huggingface.co/datasets/LanguageBind/Video-LLaVA) and data from Valley (https://github.com/RupertLuo/Valley). The video data can be downloaded from the linked datasets, and cleaned annotations are provided… See the full description on the dataset page: https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-v1-training-data.