Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OpenMathInstruct-2
OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:
Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
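The two construction routes can be told apart through the `problem_source` field that later entries on this page filter on. As a minimal sketch, assuming the four labels are `gsm8k`, `math`, `augmented_gsm8k`, and `augmented_math` (only `math` and `augmented_math` appear in the snippets on this page), records can be bucketed by route like this; `AUGMENTATION_KIND`, `augmentation_kind`, and `count_by_kind` are hypothetical names.

```python
# Hypothetical mapping: assumes original training-set problems are tagged
# "gsm8k"/"math" and newly generated problems "augmented_gsm8k"/"augmented_math".
AUGMENTATION_KIND = {
    "gsm8k": "solution augmentation",
    "math": "solution augmentation",
    "augmented_gsm8k": "problem-solution augmentation",
    "augmented_math": "problem-solution augmentation",
}


def augmentation_kind(record: dict) -> str:
    """Return which construction route produced a record, based on problem_source."""
    return AUGMENTATION_KIND[record["problem_source"]]


def count_by_kind(records) -> dict:
    """Tally records per construction route."""
    counts = {}
    for rec in records:
        kind = augmentation_kind(rec)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


sample = [
    {"problem_source": "math"},
    {"problem_source": "augmented_math"},
    {"problem_source": "augmented_gsm8k"},
]
print(count_by_kind(sample))
# {'solution augmentation': 1, 'problem-solution augmentation': 2}
```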
RyanYr/nvidia-OpenMathInstruct-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Arthur-LAGACHERIE/OpenMathInstruct-2-10k dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains only the augmented_math portion of nvidia/OpenMathInstruct-2. The license is the same as that of the original data.
zaydzuhri/OpenMathInstruct-2-Text dataset hosted on Hugging Face and contributed by the HF Datasets community
benjamin/OpenMathInstruct-2-2M-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community
```python
import datasets

dataset = datasets.load_from_disk('/fast/rolmedo/datasets/OpenMathInstruct-2')

# Keep only problems that come from MATH or were augmented from MATH
problem_sources = dataset['problem_source']
keep_ids = [i for i, source in enumerate(problem_sources) if source in ['augmented_math', 'math']]
dataset = dataset.select(keep_ids)

to_keep = []
problems_seen = set()
for i, problem in enumerate(dataset['problem']):
    if problem not in problems_seen:
```
… See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/OpenMathInstruct-2-MATH-Questions.
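The loop above is cut off in the listing. As a hedged sketch of the general technique it appears to begin (keep the first occurrence of each problem text), and not the author's actual code, an order-preserving de-duplication over a list of problem strings looks like this; `first_occurrence_indices` is a hypothetical helper.

```python
def first_occurrence_indices(problems):
    """Return the indices of the first occurrence of each distinct problem
    string, preserving the original order."""
    seen = set()
    keep = []
    for i, problem in enumerate(problems):
        if problem not in seen:
            seen.add(problem)
            keep.append(i)
    return keep


problems = ["What is 2+2?", "Solve x^2=4.", "What is 2+2?", "Find 3!."]
print(first_occurrence_indices(problems))  # [0, 1, 3]
```

With a Hugging Face `Dataset`, the resulting index list can be passed to `dataset.select(...)`, the same call the snippet uses for source filtering.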
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
OpenMathInstruct-2-QwQ
Solutions generated by Qwen/QwQ-32B-Preview for 107k augmented_math problems from nvidia/OpenMathInstruct-2. All solutions have been validated and agree with the expected_answer field of OpenMathInstruct-2.
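A minimal sketch of this kind of answer check, assuming each solution states its final answer in a LaTeX `\boxed{...}` and that "agree" means exact string equality after removing whitespace. The actual validation may use a more lenient math-equivalence check; `last_boxed`, `normalize`, and `agrees` are hypothetical helpers.

```python
def last_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, or None.
    Nested braces are handled by tracking depth."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # unbalanced braces: no complete answer


def normalize(ans: str) -> str:
    """Assumed normalization: drop all whitespace."""
    return "".join(ans.split())


def agrees(solution: str, expected_answer: str) -> bool:
    pred = last_boxed(solution)
    return pred is not None and normalize(pred) == normalize(expected_answer)


solutions = [
    {"generated_solution": r"... so the answer is \boxed{42}.", "expected_answer": "42"},
    {"generated_solution": r"... giving \boxed{41}.", "expected_answer": "42"},
]
validated = [s for s in solutions if agrees(s["generated_solution"], s["expected_answer"])]
print(len(validated))  # 1
```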
monology/OpenMathInstruct-2-1M dataset hosted on Hugging Face and contributed by the HF Datasets community
tlrm/OpenMathInstruct-2-filtered-shard3 dataset hosted on Hugging Face and contributed by the HF Datasets community
HachiML/OpenMathInstruct-2-CoT-JA dataset hosted on Hugging Face and contributed by the HF Datasets community
```python
import datasets
import pandas as pd

dataset = datasets.load_dataset('nvidia/OpenMathInstruct-2', split='train_1M')
dataset = dataset.filter(lambda x: x["problem_source"] == "augmented_math")
dataset = dataset.remove_columns(["generated_solution", "problem_source"])

# Deduplicate on the problem text via pandas
df = dataset.to_pandas()
df = df.drop_duplicates(subset=["problem"])
# preserve_index=False keeps the gapped pandas index from becoming an extra __index_level_0__ column
dataset = datasets.Dataset.from_pandas(df, preserve_index=False)

dataset = dataset.rename_column("expected_answer", "answer")
```
… See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/OpenMAthInstruct-2-AUGMATH-Deduped.
samoline/OpenMathInstruct-2-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
Other license https://choosealicense.com/licenses/other/
OpenMathInstruct-1
OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.
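To make the "text reasoning plus executed code blocks" format concrete, here is a minimal sketch of running the code segments of one solution. It assumes code is wrapped in `<llm-code>...</llm-code>` markers (an assumption; check the dataset card for the actual delimiters), and, unlike a real generation pipeline, it runs each block in a fresh process rather than keeping interpreter state between blocks. `extract_code_blocks` and `run_python` are hypothetical helpers.

```python
import re
import subprocess
import sys

# Assumed delimiters; adjust to the markers the dataset actually uses.
CODE_RE = re.compile(r"<llm-code>(.*?)</llm-code>", re.DOTALL)


def extract_code_blocks(solution: str):
    """Return the stripped source of every delimited code block, in order."""
    return [block.strip() for block in CODE_RE.findall(solution)]


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run code in a fresh Python process and return its stripped stdout.
    This is NOT a sandbox: only run trusted code."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()


solution = "Let's compute.\n<llm-code>\nprint(sum(range(1, 11)))\n</llm-code>\nSo the sum is 55."
for block in extract_code_blocks(solution):
    print(run_python(block))  # 55
```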
License: apache-2.0
Tags: math
OpenMathInstruct2 10000
This is a sample of 10,000 examples drawn from nvidia/OpenMathInstruct-2.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Kendamarron/OpenMathInstruct-2-ja-CoT dataset hosted on Hugging Face and contributed by the HF Datasets community
giovannidemuri/openmathinstruct2-ex25000-seed5 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
📐 OpenMath-Difficulty-Annotated
🚀 Overview
OpenMath-Difficulty-Annotated is a curated subset of OpenMathInstruct-2 containing 10,176 math problems, each annotated with difficulty metadata. While the original solutions from NVIDIA's dataset are preserved, we used a 120B-parameter model as a judge (LLM-as-a-Judge) to analyze every problem and grade its difficulty on a scale of 1 to 5. This allows developers of small language models (1B-3B parameters) to filter out "Olympiad-level" noise… See the full description on the dataset page: https://huggingface.co/datasets/HAD653/OpenMath-Difficulty-Annotated.
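Filtering by the 1-5 grade is the intended use. A minimal sketch, assuming the grade is stored as an integer in a column named `difficulty` (a hypothetical name; check the dataset card for the real one), operating on plain records; `filter_by_difficulty` is likewise hypothetical.

```python
def filter_by_difficulty(records, max_level=3, key="difficulty"):
    """Keep records whose difficulty grade is at most max_level.
    'difficulty' is an assumed column name holding an int from 1 to 5."""
    if not 1 <= max_level <= 5:
        raise ValueError("max_level must be between 1 and 5")
    return [r for r in records if r[key] <= max_level]


records = [
    {"problem": "2 + 3 = ?", "difficulty": 1},
    {"problem": "An Olympiad-style inequality", "difficulty": 5},
    {"problem": "Solve x^2 - 5x + 6 = 0", "difficulty": 2},
]
easy = filter_by_difficulty(records, max_level=3)
print([r["difficulty"] for r in easy])  # [1, 2]
```

On a Hugging Face `Dataset`, the same predicate can be passed to `dataset.filter`, e.g. `dataset.filter(lambda r: r["difficulty"] <= 3)`, again assuming that column name.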
Dataset information
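This dataset was built from Kendamarron/OpenMathInstruct-2-ja-CoT by translating its Japanese response data into English so that it contains English and Japanese pairs. The translation quality of the resulting data was rated on a 1-5 scale using LLM-as-a-judge, and only examples with the top rating of 5 were kept. The data were then split into training and test sets at a 9:1 ratio.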
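The filter-then-split procedure can be sketched as follows, assuming each example carries its judge rating in a field named `score` (a hypothetical name; the real rating may live in another feature) and using a seeded shuffle. `keep_top_rated` and `split_9_to_1` are hypothetical helpers; with a Hugging Face `Dataset`, `dataset.train_test_split(test_size=0.1, seed=...)` performs a comparable split.

```python
import random


def keep_top_rated(records, key="score", top=5):
    """Keep only examples whose judge rating equals the top score.
    'score' is an assumed field name for the 1-5 translation-quality rating."""
    return [r for r in records if r[key] == top]


def split_9_to_1(records, seed=42):
    """Shuffle deterministically, then split into ~90% train / ~10% test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * 0.1)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: every fourth example gets a rating of 3, the rest a rating of 5.
records = [{"id": i, "score": 5 if i % 4 else 3} for i in range(40)]
kept = keep_top_rated(records)
train, test = split_9_to_1(kept)
print(len(kept), len(train), len(test))  # 30 27 3
```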
Source dataset information
"Kendamarron/OpenMathInstruct-2-ja-CoT": training split only, 15,282 examples
Dataset:
Dataset({
features: ['problem', 'generated_solution', 'expected_answer', 'problem_source', 'problem_ja', 'thought', 'output', 'evaluation', 'cot_output', 'system']… See the full description on the dataset page: https://huggingface.co/datasets/yasutoshi-lab/open-math-instruct-2-en-ja-cot.
Concatenation of: OpenMathInstruct-2, OpenMathReasoning, AceMath, OpenR1-Math, NuminaMath-CoT, NuminaMath 1.5, OpenThoughts2-1M, MetaMathQA, Maths-College