MATH-500 Multilingual Problem Set 🌍➗
A multilingual subset of OpenAI's MATH benchmark. Perfect for testing math skills across languages, this dataset includes the same problems in English, French, Italian, Turkish, and Spanish.
🌐 Available Languages
English 🇬🇧
French 🇫🇷
Italian 🇮🇹
Turkish 🇹🇷
Spanish 🇪🇸
📂 Source & Attribution
Original Dataset: Sourced from HuggingFaceH4/MATH-500.
🚀 Quick Start
Load the dataset… See the full description on the dataset page: https://huggingface.co/datasets/bezir/MATH-500-multilingual.
alperengozeten/MATH-500-SUMMARY dataset hosted on Hugging Face and contributed by the HF Datasets community
MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
AceMath-RewardBench Evaluation Dataset Card
The AceMath-RewardBench evaluation dataset evaluates the capabilities of a math reward model in the best-of-N (N=8) setting on 7 datasets:
GSM8K: 1319 questions
Math500: 500 questions
Minerva Math: 272 questions
Gaokao 2023 en: 385 questions
OlympiadBench: 675 questions
College Math: 2818 questions
MMLU STEM: 3018 questions
Each example in the dataset contains:
A mathematical question
64 solution attempts with varying…
See the full description on the dataset page: https://huggingface.co/datasets/nvidia/AceMath-RewardBench.
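As a rough illustration of the best-of-N (N=8) setting mentioned above, the sketch below draws 8 of the 64 solution attempts per question, keeps the attempt the reward model scores highest, and checks whether it is correct. The function name, the array layout, the repeated random draws, and the synthetic data are all assumptions made for illustration; the card excerpt does not specify the field names or the exact sampling protocol.

```python
import numpy as np

def best_of_n_accuracy(scores, correct, n=8, n_trials=100, seed=0):
    """Estimate best-of-n accuracy for a reward model.

    scores  : (num_questions, num_candidates) reward-model scores (hypothetical layout)
    correct : (num_questions, num_candidates) booleans, True if the attempt is right
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    num_questions, num_candidates = scores.shape
    rng = np.random.default_rng(seed)

    accuracies = []
    for _ in range(n_trials):
        hits = 0
        for q in range(num_questions):
            # Draw n distinct candidates for this question.
            subset = rng.choice(num_candidates, size=n, replace=False)
            # Keep the candidate with the highest reward-model score.
            pick = subset[np.argmax(scores[q, subset])]
            hits += int(correct[q, pick])
        accuracies.append(hits / num_questions)
    # Average over the random draws.
    return float(np.mean(accuracies))

# Synthetic demo data (not taken from the dataset): 5 questions, 64 attempts each.
demo_rng = np.random.default_rng(1)
demo_correct = demo_rng.random((5, 64)) < 0.5
demo_scores = demo_correct + 0.5 * demo_rng.standard_normal((5, 64))
print(best_of_n_accuracy(demo_scores, demo_correct))
```

A reward model that ranks correct attempts above incorrect ones pushes this number toward the fraction of questions that have at least one correct attempt among the sampled 8.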
In 2024, the Artificial Analysis math index ranked AI models on their mathematical reasoning using benchmarks such as AIME 2024 and Math-500. o1, QwQ-32B, and DeepSeek R1 led the rankings, showing the highest proficiency in mathematical problem solving.
import numpy as np
import torch
from tqdm import tqdm
from datasets import load_dataset, DatasetDict, Dataset
import datasets

def get_top_n_docs(scores, n):
    """Return top-n document indices for a query, ignoring negative scores."""
    valid_docs = np.where(scores >= 0)[0]  # Filter out negative scores
    sorted_indices = np.argsort(-scores[valid_docs])  # Descending order
    top_n_indices = valid_docs[sorted_indices][:n]  # Take top n
    return set(top_n_indices)
def… See the full description on the dataset page: https://huggingface.co/datasets/pxyyy/NuminaMath-CoT-smp20k-removed-top500-by-logix-for-MATH-Correct-2k.
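The `get_top_n_docs` helper shown above can be tried on a small score vector. The snippet below repeats the function so that it runs on its own; the score values are made up for illustration and do not come from the dataset.

```python
import numpy as np

def get_top_n_docs(scores, n):
    """Return top-n document indices for a query, ignoring negative scores."""
    valid_docs = np.where(scores >= 0)[0]             # Keep non-negative scores only
    sorted_indices = np.argsort(-scores[valid_docs])  # Descending order
    top_n_indices = valid_docs[sorted_indices][:n]    # Take top n
    return set(top_n_indices)

# Hypothetical scores for four documents; index 1 is negative and is skipped.
example_scores = np.array([0.2, -1.0, 0.9, 0.5])
top_two = get_top_n_docs(example_scores, 2)
print(sorted(int(i) for i in top_two))  # prints [2, 3]
```

Because the result is a set, the ranking order among the selected documents is not preserved; only membership in the top n is.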
MATH-500-Russian Dataset Card
The HuggingFaceH4/MATH-500 dataset was translated into Russian by the qwen2.5:32b model using the EvilFreelancer/datasets-translator scripts. This dataset contains the subset of 500 problems from the MATH benchmark that OpenAI created for the paper Let's Verify Step by Step, translated into Russian. See their GitHub repository for details.
Comparison by model of the math index, which represents the average of the math benchmarks in the Artificial Analysis Intelligence Index (AIME 2024 and Math-500).
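The averaging described above can be sketched as a simple mean of the two benchmark scores. The equal weighting, the function name, and the placeholder scores are assumptions for illustration; the listing does not state how the index is weighted and the numbers are not real model results.

```python
def math_index(aime_2024, math_500):
    """Unweighted mean of two benchmark scores given in percent (assumed weighting)."""
    return (aime_2024 + math_500) / 2

# Placeholder scores in percent, not taken from any published ranking.
print(math_index(50.0, 90.0))  # prints 70.0
```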
alucchi/MATH-500_n100_e200_oadam1e-05_b6_8_a0.01_MATH-500_s1 dataset hosted on Hugging Face and contributed by the HF Datasets community
violetxi/MATH-500_v0llama_star_iter4 dataset hosted on Hugging Face and contributed by the HF Datasets community
violetxi/MATH-500_Llama3b_GRPO dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Historical price and volatility data for US Dollar in Math-e-MATIC across different time periods.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traditional neural networks use gradient descent methods to train the network structure, and these methods cannot handle complex optimization problems. We propose an improved grey wolf optimizer (SGWO) to explore a better network structure. GWO is improved with circle population initialization, an information interaction mechanism, and an adaptive position update to enhance the search performance of the algorithm. SGWO is applied to optimize the Elman network structure, yielding a new prediction method (SGWO-Elman). The convergence of SGWO is analyzed mathematically, and the optimization ability of SGWO and the prediction performance of SGWO-Elman are examined in comparative experiments. The results show that: (1) the global convergence probability of SGWO is 1, and its process is a finite homogeneous Markov chain with an absorbing state; (2) SGWO achieves better optimization performance on complex functions of different dimensions, and when applied to Elman parameter optimization it significantly improves the network structure, so that SGWO-Elman predicts accurately.
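For orientation, the baseline grey wolf optimizer that SGWO builds on can be sketched as follows. This is the standard GWO update only, not the SGWO variant described above: the circle population initialization, the information interaction mechanism, and the adaptive position update are not reproduced here. The function name, its parameters, and the sphere test objective are illustrative assumptions.

```python
import numpy as np

def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iters=200, seed=0):
    """Minimize `objective` with a plain grey wolf optimizer (baseline GWO)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    # Uniform random initialization (SGWO reportedly uses a different scheme).
    wolves = rng.uniform(low, high, size=(n_wolves, dim))
    best_pos, best_val = None, np.inf

    for t in range(n_iters + 1):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        leaders = wolves[order[:3]].copy()  # alpha, beta, delta wolves
        if fitness[order[0]] < best_val:
            best_val = float(fitness[order[0]])
            best_pos = leaders[0].copy()
        if t == n_iters:
            break  # final population evaluated, no further move

        a = 2.0 * (1.0 - t / n_iters)  # decreases linearly from 2 to 0
        candidates = np.zeros((3, n_wolves, dim))
        for k, leader in enumerate(leaders):
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)  # distance to this leader
            candidates[k] = leader - A * D
        # Each wolf moves to the mean of the three leader-guided positions.
        wolves = np.clip(candidates.mean(axis=0), low, high)

    return best_pos, best_val

def sphere(x):
    """Simple convex test objective with its minimum at the origin."""
    return float(np.sum(x ** 2))

best_pos, best_val = gwo_minimize(sphere, dim=5, bounds=(-10.0, 10.0))
print(best_pos, best_val)
```

In an SGWO-Elman style setup, the objective would be the prediction error of an Elman network as a function of its parameters rather than the sphere function used here.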
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Historical price and volatility data for Euro in Math-e-MATIC across different time periods.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Historical price and volatility data for Russian Rubles in Math-e-MATIC across different time periods.