A challenge set for elementary-level Math Word Problems (MWP). An MWP consists of a short natural-language narrative that describes a state of the world and poses a question about some unknown quantities.
The examples in SVAMP test a model on different aspects of solving MWPs: 1) Is the model question-sensitive? 2) Does the model have robust reasoning ability? 3) Is it invariant to structural alterations?
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for Calc-SVAMP
Summary
The dataset is a collection of simple math word problems focused on arithmetic. It is derived from https://github.com/arkilpatel/SVAMP/. The main addition in this dataset variant is the chain column, which was created by converting the solution to a simple HTML-like language that can be easily parsed (e.g. by BeautifulSoup). The data contains 3 types of tags:
gadget: A tag whose content is intended to be evaluated by calling an external… See the full description on the dataset page: https://huggingface.co/datasets/MU-NLPC/Calc-svamp.
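The parsing step described for the Calc-SVAMP chain column can be sketched as follows. This is a minimal illustration, not the dataset authors' code: it uses Python's standard-library `html.parser` instead of BeautifulSoup to stay dependency-free, and the example chain string, the `id="calculator"` attribute, and the names `GadgetExtractor` and `extract_gadget_calls` are all assumptions made for the example. Only the `gadget` tag name comes from the description above.

```python
from html.parser import HTMLParser


class GadgetExtractor(HTMLParser):
    """Collect the text content of every <gadget> tag in a chain string."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are between <gadget> and </gadget>
        self._buffer = []      # text fragments of the gadget currently being read
        self.calls = []        # finished gadget contents, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "gadget":
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "gadget" and self._inside:
            self.calls.append("".join(self._buffer).strip())
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)


def extract_gadget_calls(chain):
    """Return the contents of all <gadget> tags found in `chain`."""
    parser = GadgetExtractor()
    parser.feed(chain)
    parser.close()
    return parser.calls


# Hypothetical chain value; real rows in the dataset may look different.
example_chain = (
    "Each pack holds 8 pencils. "
    '<gadget id="calculator">3 * 8</gadget> '
    "So there are 24 pencils."
)

if __name__ == "__main__":
    print(extract_gadget_calls(example_chain))
```

Each extracted string is what would be handed to the external tool; how that tool is invoked is outside the scope of this sketch.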
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
The provided files contain outputs generated by various Large Language Models (LLMs) when solving problems from the SVAMP dataset. They also include tagged statements of the problems that the LLMs solved incorrectly.
This repository includes the following two files:
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
X-SVAMP
Dataset Description
X-SVAMP is an evaluation benchmark for multilingual large language models (LLMs), comprising questions and answers in 5 languages (English, Chinese, Korean, Italian and Spanish). It is intended to evaluate the math reasoning abilities of LLMs. The dataset was translated by GPT-4-turbo from the original English version of SVAMP. In our paper, we evaluate LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with… See the full description on the dataset page: https://huggingface.co/datasets/zhihz0535/X-SVAMP_en_zh_ko_it_es.
Dry rot (mérule) is a fungus that attacks the wood in buildings, particularly the frames and joinery of damp and poorly ventilated buildings. The law of 24 March 2014 on access to housing and renovated urban planning, known as the ALUR law, establishes a mechanism to combat the spread of dry rot in housing. These provisions are inserted in articles L 133.7–L 133.9 of the Construction and Housing Code. The mechanism is based on: the obligation to declare, at the town hall, buildings infested by the fungus; the delimitation, at the département level, of areas at risk of dry rot; and obligations at the time of sale within areas delimited by prefectoral decree.