https://choosealicense.com/licenses/other/
OpenMathInstruct-1
OpenMathInstruct-1 is a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. The problems are from the GSM8K and MATH training subsets, and the solutions are synthetically generated by allowing the Mixtral model to use a mix of text reasoning and code blocks executed by a Python interpreter. The dataset is split into train and validation subsets that we used in the ablation experiments. These two subsets… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-1.
https://choosealicense.com/licenses/other/
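This is a commercially usable instruction tuning dataset of 1.8 million examples, produced by machine-translating OpenMathInstruct-1 into Japanese.
OpenMathInstruct-1 is a mathematics dataset made up of pairs of questions from the GSM8K and MATH benchmark training sets and solutions generated using the Mixtral-8x7B model. The solutions are synthetic, but incorrect solutions are excluded by checking that the value derived from each solution equals the GSM8K or MATH reference answer. See the paper for details of the dataset.
Use of this dataset is governed by the NVIDIA license, which permits commercial use. If you redistribute this dataset, you must carry that license forward. Our understanding is that a model trained on this data does not itself need to follow that license (Nvidia's OpenMath-Mistral and OpenMath-CodeLlama are also apache 2.0… See the full description on the dataset page: https://huggingface.co/datasets/kunishou/OpenMathInstruct-1-1.8m-ja.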
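The answer-matching filter described in this entry (keep a synthetic solution only when the value derived from it equals the reference answer) might look roughly like the sketch below. The field names `predicted_answer` and `expected_answer` and the `normalize_answer` helper are assumptions made for illustration; the dataset authors' actual comparison logic is not given in this listing and may differ.

```python
from fractions import Fraction


def normalize_answer(text):
    """Return a comparable form of an answer string.

    Hypothetical normalization: strips whitespace, thousands separators,
    and a trailing period, then tries to read the result as an exact number.
    """
    cleaned = text.strip().replace(",", "").rstrip(".")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers are compared as case-insensitive strings.
        return cleaned.lower()


def keep_correct(samples):
    """Keep only samples whose predicted answer matches the reference answer."""
    return [
        sample
        for sample in samples
        if normalize_answer(sample["predicted_answer"])
        == normalize_answer(sample["expected_answer"])
    ]


# Illustrative records, not taken from the dataset.
samples = [
    {"predicted_answer": "18.0", "expected_answer": "18"},
    {"predicted_answer": "17", "expected_answer": "18"},
    {"predicted_answer": "1,800", "expected_answer": "1800"},
]
correct = keep_correct(samples)
```

Comparing exact fractions rather than floats avoids rounding mismatches for answers such as `0.1` versus `1/10`; symbolic or multi-part answers would need a more careful comparison than this sketch provides.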
5CD-AI/Vietnamese-nvidia-OpenMathInstruct-1-50k-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community
This is a parsed subset of the OpenMathInstruct-1 dataset. Each sample in the dataset is formatted in the following way:
{"messages": [
  {"role": "user", "content": "user prompt"},
  {"role": "assistant", "content": "assistant response", "tool_call": "python code"},
  {"role": "tool", "content": "code execution result"},
  {"role": "assistant","content": "assistant response"},
]}
The dataset is split into two files:
open-math-short.jsonl - the assistant only makes a single function call.… See the full description on the dataset page: https://huggingface.co/datasets/kaleinaNyan/OpenMathInstruct-1-Parsed.
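As a rough illustration of the sample format above, the sketch below walks one sample's `messages` list and pairs each assistant `tool_call` with the `tool` message that follows it. The example record is invented for illustration, and `extract_tool_calls` is a hypothetical helper, not part of the dataset's tooling.

```python
import json


def extract_tool_calls(sample):
    """Collect (code, execution result) pairs from one parsed sample."""
    pairs = []
    pending_code = None
    for message in sample["messages"]:
        if message["role"] == "assistant" and "tool_call" in message:
            # Remember the code until its execution result arrives.
            pending_code = message["tool_call"]
        elif message["role"] == "tool" and pending_code is not None:
            pairs.append((pending_code, message["content"]))
            pending_code = None
    return pairs


# One JSONL line shaped like the documented format (contents are made up).
line = json.dumps({
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "Let me compute it.",
         "tool_call": "print(2 + 2)"},
        {"role": "tool", "content": "4"},
        {"role": "assistant", "content": "The answer is 4."},
    ]
})
sample = json.loads(line)
calls = extract_tool_calls(sample)
```

Reading a whole file would amount to applying `json.loads` to each line of the `.jsonl` file and passing the result to the same helper.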
vwxyzjn/OpenMathInstruct-1-correct dataset hosted on Hugging Face and contributed by the HF Datasets community
samoline/OpenMathInstruct-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OpenMathInstruct-2
OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model. The training set problems of GSM8K and MATH are used for constructing the dataset in the following ways:
Solution augmentation: Generating chain-of-thought solutions for training set problems in GSM8K and MATH.
Problem-Solution augmentation: Generating new problems, followed by solutions for these new problems.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMathInstruct-2.
marcuscedricridia/OpenMathInstruct-1-1000-processed dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/other/
OpenMathInstruct-1-1.8m-ja-askllm-v1
This dataset scores the kunishou/OpenMathInstruct-1-1.8m-ja dataset using the Ask-LLM method. In addition to the columns of the original dataset, it adds a column named askllm_score, which stores the Ask-LLM score. The LLM used for Ask-LLM scoring is Rakuten/RakutenAI-7B-instruct, and the prompt is as follows: ### {data} ###
Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model? An informative datapoint should be well-formatted, contain some usable knowledge of the… See the full description on the dataset page: https://huggingface.co/datasets/geniacllm/OpenMathInstruct-1-1.8m-ja-askllm-v1.
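One plausible use of the `askllm_score` column is to keep only the highest-scoring share of the rows. The sketch below assumes each row is a dictionary with a numeric `askllm_score` where higher means more informative; the `question` field, the `top_fraction_by_score` helper, and the cutoff are illustrative assumptions, not something this entry specifies.

```python
def top_fraction_by_score(rows, fraction):
    """Return the highest-scoring `fraction` of rows, best first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    ranked = sorted(rows, key=lambda row: row["askllm_score"], reverse=True)
    # Keep at least one row so small inputs do not collapse to nothing.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Invented rows for illustration only.
rows = [
    {"question": "q1", "askllm_score": 0.9},
    {"question": "q2", "askllm_score": 0.2},
    {"question": "q3", "askllm_score": 0.7},
    {"question": "q4", "askllm_score": 0.4},
]
selected = top_fraction_by_score(rows, 0.5)
```

Selecting by rank rather than by a fixed score threshold keeps the output size predictable even if the score distribution differs between scoring models.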
gabrielmbmb/OpenMathInstruct-2-trl dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/other/
OpenMath MATH Masked
We release a masked version of the MATH solutions. This data can be used to aid synthetic generation of additional solutions for the MATH dataset, as it is much less likely to lead to inconsistent reasoning compared to using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-MATH-masked.
https://choosealicense.com/licenses/other/
OpenMath GSM8K Masked
We release a masked version of the GSM8K solutions. This data can be used to aid synthetic generation of additional solutions for the GSM8K dataset, as it is much less likely to lead to inconsistent reasoning compared to using the original solutions directly. This dataset was used to construct OpenMathInstruct-1: a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model. For details of how the masked… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/OpenMath-GSM8K-masked.
GENIAC-Team-Ozaki/OpenMathInstruct-1-1.8m-ja_10k dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/other/
This dataset re-solves kunishou/OpenMathInstruct-1-1.8m-ja using Phi-3. The Python scripts were actually executed, and only the samples whose answers were correct were kept and published.
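The execute-and-filter step described here could be sketched as follows. This is not the dataset author's generation code: the `script` and `expected_answer` field names, the helper functions, and the rule of comparing the script's printed output to the reference answer are all assumptions for illustration.

```python
import subprocess
import sys


def run_script(code, timeout=10):
    """Run a Python snippet in a child process and return its stripped stdout.

    Returns None if the script fails or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def filter_by_execution(samples):
    """Keep samples whose script output equals the expected answer."""
    kept = []
    for sample in samples:
        output = run_script(sample["script"])
        if output is not None and output == sample["expected_answer"].strip():
            kept.append(sample)
    return kept


# Invented samples for illustration only.
samples = [
    {"script": "print(6 * 7)", "expected_answer": "42"},
    {"script": "print(1 + 1)", "expected_answer": "3"},
    {"script": "raise SystemExit(1)", "expected_answer": "0"},
]
kept = filter_by_execution(samples)
```

Model-generated code is untrusted, so a real pipeline would likely run it inside a sandbox or container rather than a plain child process.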
Generation code