MathVista is a consolidated Mathematical reasoning benchmark within Visual contexts. It consists of three newly created datasets, IQTest, FunctionQA, and PaperQA, which address the missing visual domains and are tailored to evaluate logical reasoning on puzzle test figures, algebraic reasoning over functional plots, and scientific reasoning with academic paper figures, respectively. It also incorporates 9 MathQA datasets and 19 VQA datasets from the literature, which significantly enrich the diversity and complexity of visual perception and mathematical reasoning challenges within our benchmark. In total, MathVista includes 6,141 examples collected from 31 different datasets.
Project: https://mathvista.github.io/
Visualization: https://mathvista.github.io/#visualization
Leaderboard: https://mathvista.github.io/#leaderboard
Paper: https://arxiv.org/abs/2310.02255
Data: https://huggingface.co/datasets/AI4Math/MathVista
Code: https://github.com/lupantech/MathVista
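As a quick sanity check on the composition described above, the following minimal sketch tallies the stated source-dataset counts and shows one way the data might be loaded from the Hugging Face repository named in the Data link. The helper names (`total_source_datasets`, `load_mathvista`) are hypothetical and introduced only for illustration. The loader assumes the Hugging Face `datasets` library is installed and that network access is available; split names are not given in the description, so the caller has to supply one.

```python
# Source-dataset counts exactly as stated in the description.
SOURCE_DATASETS = {
    "newly created (IQTest, FunctionQA, PaperQA)": 3,
    "MathQA": 9,
    "VQA": 19,
}

# Total number of examples stated in the description.
TOTAL_EXAMPLES = 6141


def total_source_datasets(counts=SOURCE_DATASETS):
    """Sum the per-category counts of source datasets."""
    return sum(counts.values())


def load_mathvista(split):
    """Load one split of MathVista from the Hugging Face Hub.

    Assumption: the `datasets` library is installed and the Hub is reachable.
    The split name is not specified in the description, so it must be passed in.
    """
    # Imported lazily so the tally above works without the library installed.
    from datasets import load_dataset

    return load_dataset("AI4Math/MathVista", split=split)


if __name__ == "__main__":
    # 3 new + 9 MathQA + 19 VQA source datasets
    print(total_source_datasets())
```

The tally agrees with the 31 source datasets reported in the description. The loader is kept separate so that the arithmetic check runs offline.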
VIM-Bench/VIM-MathVista dataset hosted on Hugging Face and contributed by the HF Datasets community
macabdul9/MathVista dataset hosted on Hugging Face and contributed by the HF Datasets community
yuanshengni/MathVista-CoT-num10 dataset hosted on Hugging Face and contributed by the HF Datasets community