MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
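As a rough illustration of what "a sequence of elementary calculations" looks like in practice, the sketch below parses a made-up solution written in the style GSM8K answers use: intermediate calculations annotated as <<expression=value>> and the final answer on a last line beginning with "####". The sample text and the helper names extract_final_answer and check_annotations are hypothetical, not part of the dataset or any library.

```python
import re

# Hypothetical solution text in the GSM8K answer style (not taken from the dataset).
solution = (
    "Tom has 3 boxes with 4 pens each, so 3*4 = <<3*4=12>>12 pens.\n"
    "He gives away 5 pens, leaving 12-5 = <<12-5=7>>7 pens.\n"
    "#### 7"
)

def extract_final_answer(text: str) -> str:
    """Return the answer that follows the '####' marker on the final line."""
    match = re.search(r"####\s*(.+)$", text.strip())
    if match is None:
        raise ValueError("no '####' final-answer marker found")
    return match.group(1).replace(",", "").strip()

def check_annotations(text: str) -> bool:
    """Recompute every <<expression=value>> annotation using basic arithmetic only."""
    for expr, value in re.findall(r"<<([^=<>]+)=([^<>]+)>>", text):
        # Only digits, whitespace, parentheses, and + - * / . are accepted.
        if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
            return False
        if abs(eval(expr, {"__builtins__": {}}) - float(value)) > 1e-9:
            return False
    return True

print(extract_final_answer(solution))
print(check_annotations(solution))
```

Comparing extracted final answers as strings is the simplest option; a stricter evaluator might normalize numbers (for example "7.0" versus "7") before comparing.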
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
gsm8k-ja-test_250-1319
This dataset contains 1069 Japanese math problems and their solutions. It was used for optimizing LLMs in the paper "Evolutionary Optimization of Model Merging Recipes".
Dataset Details
This dataset contains Japanese translations of 1069 math problems and solutions from the GSM8K test set, starting from the 251st example out of 1319. The translation was done using gpt-4-0125-preview. We did not use the first 250 examples because they are… See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319.
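The counts above can be checked with a little index arithmetic. This is only a sketch of the slicing described in the text; the constant names are made up for illustration.

```python
TOTAL_TEST_EXAMPLES = 1319  # size of the GSM8K test set, per the text
SKIPPED = 250               # the first 250 examples are not used

# Zero-based indices of the kept examples: the 251st example has index 250.
kept_indices = range(SKIPPED, TOTAL_TEST_EXAMPLES)

print(len(kept_indices))     # number of translated problems
print(kept_indices[0] + 1)   # one-based position of the first kept example
```

Under these assumptions the slice holds 1319 − 250 = 1069 problems, matching the figure quoted in the description.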
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset is the data pool synthesized from the query set of the GSM8K training set, containing all answer-correct samples and other metadata produced during the work. DART-Math-* datasets are extracted from dart-math-pool-* data pools.
🎯 DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Datasets:… See the full description on the dataset page: https://huggingface.co/datasets/hkust-nlp/dart-math-pool-gsm8k.
This dataset was derived automatically using self-verification during the model finetuning process. With Llama3.1-8b-instruct as base and the gsm8k-prolog train set, we finetuned the model and decoded on the gsm8k test set. The final model achieved 95% accuracy using this automated self-training process, i.e. no human intervention. We allowed at most two different solutions per question in the original gsm8k test set to boost diversity. See this paper for further detail… See the full description on the dataset page: https://huggingface.co/datasets/wilsontam/gsm8k-prolog-test.
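The "at most two different solutions per question" rule can be pictured as a simple cap applied while collecting decoded outputs. The sketch below is an assumption about how such a cap might look, not the authors' code: the function name cap_solutions is hypothetical, and treating two solutions as "different" when they differ after whitespace normalization is a guess, since the description does not define it.

```python
from collections import defaultdict

def cap_solutions(samples, max_per_question=2):
    """Keep at most `max_per_question` distinct solutions per question.

    `samples` is an iterable of (question_id, solution_text) pairs.
    Solutions are compared after collapsing whitespace (an assumption).
    """
    kept = defaultdict(list)
    for question_id, solution in samples:
        normalized = " ".join(solution.split())
        if normalized in kept[question_id]:
            continue  # exact duplicate of a solution already kept
        if len(kept[question_id]) < max_per_question:
            kept[question_id].append(normalized)
    return dict(kept)

# Hypothetical decoded outputs for two questions.
decoded = [
    ("q1", "solve(X) :- X is 3 * 4."),
    ("q1", "solve(X)  :-  X is 3 * 4."),   # same solution, different spacing
    ("q1", "solve(X) :- X is 4 + 4 + 4."),
    ("q1", "solve(X) :- X is 12."),        # third distinct solution, dropped
    ("q2", "solve(X) :- X is 10 - 3."),
]
capped = cap_solutions(decoded)
print(capped)
```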
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper "Language models are multilingual chain-of-thought reasoners".
The same 250 problems from GSM8K are each translated by human annotators into 10 languages. The 10 languages are:
- Spanish
- French
- German
- Russian
- Chinese
- Japanese
- Thai
- Swahili
- Bengali
- Telugu
You can find the input and targets for each of the ten languages (and English) as .tsv files.
We also include few-shot exemplars that are also manually translated from each language in exemplars.py.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
AceMath-RewardBench Evaluation Dataset Card
The AceMath-RewardBench evaluation dataset evaluates capabilities of a math reward model using the best-of-N (N=8) setting for 7 datasets:
- GSM8K: 1319 questions
- Math500: 500 questions
- Minerva Math: 272 questions
- Gaokao 2023 en: 385 questions
- OlympiadBench: 675 questions
- College Math: 2818 questions
- MMLU STEM: 3018 questions
Each example in the dataset contains:
- A mathematical question
- 64 solution attempts with varying…
See the full description on the dataset page: https://huggingface.co/datasets/nvidia/AceMath-RewardBench.
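To make the best-of-N (N=8) setting concrete, here is a minimal sketch of how such a metric could be computed from per-attempt reward scores. It is not the benchmark's official evaluation script: the function name best_of_n_accuracy, the (score, is_correct) data layout, and the choice to draw the 8 candidates at random from the 64 attempts are all assumptions, since the excerpt does not say how candidates are selected.

```python
import random

def best_of_n_accuracy(examples, n=8, seed=0):
    """Fraction of questions where the top-scored of n sampled attempts is correct.

    `examples` is a list with one entry per question; each entry is a list of
    (reward_score, is_correct) tuples, one per solution attempt.
    """
    rng = random.Random(seed)
    hits = 0
    for attempts in examples:
        candidates = rng.sample(attempts, n)          # assumed: random subset of size n
        best = max(candidates, key=lambda a: a[0])    # reward model picks the highest score
        hits += int(best[1])
    return hits / len(examples)

# Hypothetical scores for two questions with 64 attempts each.
mostly_right = [(1.0, True)] * 60 + [(0.0, False)] * 4   # any 8 drawn include a correct one
all_wrong = [(0.5, False)] * 64
accuracy = best_of_n_accuracy([mostly_right, all_wrong])
print(accuracy)
```

A reward model that ranks correct attempts above incorrect ones scores higher on this metric; averaging over several seeds would reduce the variance introduced by the random draw.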