MMStar (Are We on the Right Way for Evaluating Large Vision-Language Models?)
🌐 Homepage | 🤗 Dataset | 🤗 Paper | 📖 arXiv | GitHub
Dataset Details
As shown in the figure below, existing benchmarks lack consideration of the vision dependency of evaluation samples and potential data leakage from LLMs' and LVLMs' training data.
Therefore, we introduce MMStar: an elite vision-indispensable multi-modal benchmark, aiming to ensure each curated sample exhibits… See the full description on the dataset page: https://huggingface.co/datasets/Lin-Chen/MMStar.
MMStar is an elite vision-indispensable multi-modal benchmark comprising 1,500 meticulously selected samples. These samples are carefully balanced and purified, ensuring they exhibit visual dependency, minimal data leakage, and require advanced multi-modal capabilities. MMStar evaluates LVLMs across 6 core capabilities and 18 detailed axes.
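For orientation, below is a minimal sketch of loading MMStar with the Hugging Face `datasets` library and inspecting its splits and fields; it assumes only that the dataset is accessible from the Hub under the ID above, and reads the split and column names from whatever the repository exposes rather than hard-coding them.

```python
from datasets import load_dataset

# Load MMStar from the Hugging Face Hub (downloads on first use).
ds = load_dataset("Lin-Chen/MMStar")

# Show which splits exist and how many samples each holds.
print(ds)

# List the column names of the first available split without decoding any images.
first_split = next(iter(ds.values()))
print(first_split.column_names)
```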
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
K-MMStar
We introduce K-MMStar, a Korean adaptation of MMStar [1] designed for evaluating vision-language models. By translating the val subset of MMStar into Korean and carefully reviewing its naturalness through human inspection, we developed a robust evaluation benchmark specifically for the Korean language. (We observe that there are unanswerable cases (e.g., questions that require multiple images but provide only a single one, or vague questions or options) in the… See the full description on the dataset page: https://huggingface.co/datasets/NCSOFT/K-MMStar.
jpark677/mmstar dataset hosted on Hugging Face and contributed by the HF Datasets community
macabdul9/MMStar dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit report of Mm Star Pte Th contains unique and detailed export-import market intelligence, including its phone, email, LinkedIn, and details of each import and export shipment, such as product, quantity, price, buyer and supplier names, country, and date of shipment.
ko-vlm/K-MMStar dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the IDs of a subset of questions used in the project: Where do Large Vision-Language Models Look at when Answering Questions? [paper] [code] The project presents a heatmap visualization method for interpreting Large Vision-Language Models (LVLMs) when generating open-ended answers. The original datasets can be obtained from CV-Bench, MMVP, and MMStar; we sincerely appreciate the authors of these datasets for their contributions. The selected subset is based on the relevance of the… See the full description on the dataset page: https://huggingface.co/datasets/xiaoying0505/LVLM_Interpretation.
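A hypothetical sketch of how such an ID subset could be applied to MMStar is shown below. The split names ("train", "val") and column names ("id", "index") are illustrative assumptions, not the documented schema of either repository; consult the respective dataset cards before use.

```python
from datasets import load_dataset

# Load the released question-ID subset (split and "id" column are assumptions).
id_ds = load_dataset("xiaoying0505/LVLM_Interpretation", split="train")
subset_ids = set(id_ds["id"])

# Load MMStar and keep only the rows whose index appears in the ID subset
# (the "val" split and "index" column are likewise assumptions).
mmstar = load_dataset("Lin-Chen/MMStar", split="val")
selected = mmstar.filter(lambda row: row["index"] in subset_ids)

print(f"Selected {len(selected)} of {len(mmstar)} MMStar samples")
```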