GNU GPLv3: https://choosealicense.com/licenses/gpl-3.0/
Citation
If you find our work helpful, please cite it as:

@article{zhuang2024math,
  title   = {Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning},
  author  = {Zhuang, Wenwen and Huang, Xin and Zhang, Xiantao and Zeng, Jin},
  journal = {arXiv preprint arXiv:2408.08640},
  year    = {2024}
}
GSM8K is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems created by human problem writers. The dataset is segmented into 7.5K training problems and 1K test problems. These problems take between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer. A bright middle school student should be able to solve every problem. The dataset can be used for multi-step mathematical reasoning.
MIT License: https://opensource.org/licenses/MIT
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
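As a quick, hedged illustration of how the card's numbers play out in practice, the snippet below loads GSM8K with the Hugging Face datasets library (the "main" config is the standard one; a "socratic" variant also exists) and pulls the final numeric answer out of a reference solution, which GSM8K marks with a "####" delimiter.

```python
from datasets import load_dataset

# Load the standard GSM8K config from the Hugging Face Hub.
gsm8k = load_dataset("openai/gsm8k", "main")
print(gsm8k)  # 7,473 training problems / 1,319 test problems

example = gsm8k["train"][0]
print(example["question"])

# Reference solutions are chain-of-thought text; the final numeric
# answer follows the "####" delimiter, so a split recovers it.
final_answer = example["answer"].split("####")[-1].strip()
print(final_answer)
```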
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparative experiments of multimodal sentiment analysis models on the CMU-MOSEI dataset.
Current visual question answering (VQA) tasks mainly consider answering human-annotated questions about natural images in daily-life contexts. Icon question answering (IconQA) is a benchmark that aims to highlight the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. For this benchmark, a large-scale IconQA dataset was built that consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. Compared to existing VQA benchmarks, IconQA requires not only perception skills like object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning.
Description from: IconQA
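To make the multi-text-choice sub-task concrete, here is a minimal scoring sketch; the question/choices/answer field names are hypothetical stand-ins for illustration, not IconQA's documented schema.

```python
def score_multi_text_choice(predicted_index: int, example: dict) -> bool:
    """Return True when the predicted choice index matches the gold index.

    The schema here (question text, list of textual choices, gold choice
    index) is assumed for illustration only.
    """
    return predicted_index == example["answer"]

# A toy item in the assumed schema: geometric reasoning over icons.
item = {
    "question": "Which shape has more sides?",
    "choices": ["triangle", "hexagon"],
    "answer": 1,
}
print(score_multi_text_choice(1, item))  # True
```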
MMLU (Massive Multitask Language Understanding) is a benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem-solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas like law and ethics. The granularity and breadth of the subjects make the benchmark ideal for identifying a model's blind spots.
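To make the few-shot protocol concrete, here is a minimal sketch that builds a 5-shot prompt from the Hub copy of MMLU; the cais/mmlu dataset ID, its per-subject configs, and the question/choices/answer fields follow the Hub card but should be treated as assumptions to verify.

```python
from datasets import load_dataset

LETTERS = ["A", "B", "C", "D"]

def format_example(ex: dict, include_answer: bool = True) -> str:
    """Render one MMLU item as a question, lettered choices, and answer line."""
    lines = [ex["question"]]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, ex["choices"])]
    answer = LETTERS[ex["answer"]] if include_answer else ""
    lines.append(f"Answer: {answer}".rstrip())
    return "\n".join(lines)

# "abstract_algebra" is one of the 57 subject configs; the dev split holds
# the five examples conventionally used as few-shot demonstrations.
mmlu = load_dataset("cais/mmlu", "abstract_algebra")
shots = [format_example(ex) for ex in mmlu["dev"]]
query = format_example(mmlu["test"][0], include_answer=False)
prompt = "\n\n".join(shots + [query])
print(prompt)  # a model completes the text after the trailing "Answer:"
```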
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
CMU-MOSI dataset information.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
R1-Onevision
[📂 GitHub] [📝 Paper] [🤗 Reasoning Benchmark] [🤗 HF Demo]
R1-Onevision Dataset
Dataset Overview
The R1-Onevision dataset is designed to equip models with multimodal reasoning capabilities. It bridges visual and textual understanding by providing rich, context-aware reasoning tasks across diverse domains, including natural scenes, science, mathematical problems… See the full description on the dataset page: https://huggingface.co/datasets/Fancy-MLLM/R1-Onevision.
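Since the card gives the Hub ID, a minimal loading sketch follows; nothing in this excerpt documents the splits or fields, so the snippet streams the data and inspects the first record instead of assuming a schema (the "train" split name is itself an assumption).

```python
from datasets import load_dataset

# Stream to avoid downloading the full image corpus up front; the
# "train" split name is an assumption, not documented in this excerpt.
ds = load_dataset("Fancy-MLLM/R1-Onevision", split="train", streaming=True)

# Inspect the first record's fields rather than assuming a schema.
first = next(iter(ds))
print(sorted(first.keys()))
```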