6 datasets found

h
gsm8k
huggingface.co
Updated Aug 11, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 11, 2022
Dataset authored and provided by
OpenAIhttp://openai.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for GSM8K

Dataset Summary

GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − ×÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
gsm8k-ja-test_250-1319
huggingface.co
Updated May 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
gsm8k-ja-test_250-1319 [Dataset]. https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 19, 2024
Dataset authored and provided by
Sakana AIhttps://sakana.ai/
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
gsm8k-ja-test_250-1319

This dataset contains 1069 Japanese math problems and their solutions. It was used for optimizing LLMs in the paper "Evolutionary Optimization of Model Merging Recipes".

Dataset Details

This dataset contains Japanese translations of 1069 math problems and solutions from the GSM8K test set, starting from the 251st example out of 1319. The translation was done using gpt-4-0125-preview. We did not use the first 250 examples because they are… See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319.
h
dart-math-pool-gsm8k
huggingface.co
Updated Feb 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HKUST NLP Group (2025). dart-math-pool-gsm8k [Dataset]. https://huggingface.co/datasets/hkust-nlp/dart-math-pool-gsm8k
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 19, 2025
Dataset authored and provided by
HKUST NLP Group
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset is the data pool synthesized from the query set of the GSM8K training set, containing all answer-correct samples and other metadata produced during the work. DART-Math-* datasets are extracted from dart-math-pool-* data pools.

🎯 DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

📝 Paper@arXiv | 🤗 Datasets&Models@HF | 🐱 Code@GitHub 🐦 Thread@X(Twitter) | 🐶 中文博客@知乎 | 📊 Leaderboard@PapersWithCode | 📑 BibTeX

Datasets:… See the full description on the dataset page: https://huggingface.co/datasets/hkust-nlp/dart-math-pool-gsm8k.
h
gsm8k-prolog-test
huggingface.co
Updated Mar 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
gsm8k-prolog-test [Dataset]. https://huggingface.co/datasets/wilsontam/gsm8k-prolog-test
Explore at:
Dataset updated
Mar 27, 2025
Authors
Wilson Tam
Description
This dataset was derived automatically using self-verification during the model finetuning process. With Llama3.1-8b-instruct as base and the gsm8k-prolog train set, we finetuned the model and decoded on the gsm8k test set. The final model achieved 95% accuracy using this automated self-training process, i.e. no human intervention. We allowed at most two different solutions per question in the original gsm8k test set to boost diversity. See this paper for further detail… See the full description on the dataset page: https://huggingface.co/datasets/wilsontam/gsm8k-prolog-test.
h
Data from: mgsm
huggingface.co
Updated Jun 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
mgsm [Dataset]. https://huggingface.co/datasets/juletxara/mgsm
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 12, 2024
Authors
Julen Etxaniz
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper Language models are multilingual chain-of-thought reasoners.

The same 250 problems from GSM8K are each translated via human annotators in 10 languages. The 10 languages are: - Spanish - French - German - Russian - Chinese - Japanese - Thai - Swahili - Bengali - Telugu

You can find the input and targets for each of the ten languages (and English) as .tsv files. We also include few-shot exemplars that are also manually translated from each language in exemplars.py.
AceMath-RewardBench
huggingface.co
Updated Mar 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NVIDIA (2025). AceMath-RewardBench [Dataset]. https://huggingface.co/datasets/nvidia/AceMath-RewardBench
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 18, 2025
Dataset provided by
Nvidiahttp://nvidia.com/
Authors
NVIDIA
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
website | paper

AceMath-RewardBench Evaluation Dataset Card

The AceMath-RewardBench evaluation dataset evaluates capabilities of a math reward model using the best-of-N (N=8) setting for 7 datasets:

GSM8K: 1319 questions Math500: 500 questions Minerva Math: 272 questions Gaokao 2023 en: 385 questions OlympiadBench: 675 questions College Math: 2818 questions MMLU STEM: 3018 questions

Each example in the dataset contains:

A mathematical question 64 solution attempts with varying… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/AceMath-RewardBench.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k

gsm8k

openai/gsm8k

Grade School Math 8K

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Aug 11, 2022

Dataset authored and provided by

OpenAIhttp://openai.com/

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

Dataset Card for GSM8K

  Dataset Summary

GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − ×÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.

Clear search

Close search

Google apps

Main menu

gsm8k

gsm8k-ja-test_250-1319

dart-math-pool-gsm8k

gsm8k-prolog-test

Data from: mgsm

AceMath-RewardBench

gsm8kSee More Versions

openai/gsm8k

Grade School Math 8K

gsm8k