Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Infinity Instruct
Beijing Academy of Artificial Intelligence (BAAI) [Paper][Code][🤗]
The quality and scale of instruction data are crucial for model performance. Recently, open-source models have increasingly relied on fine-tuning datasets comprising millions of instances, necessitating both high quality and large scale. However, the open-source community has long been constrained by the high costs associated with building such extensive and high-quality instruction… See the full description on the dataset page: https://huggingface.co/datasets/BAAI/Infinity-Instruct.
Arcee.ai Modifications
The original dataset (https://huggingface.co/datasets/BAAI/Infinity-Instruct) contained 383,697 samples that used "gpt" tags for system instructions instead of "system" tags. Additionally, 56 samples had empty values for either the human or gpt fields. We have addressed these issues by renaming the tags in the affected samples and removing those with empty values. The remainder of the dataset is unchanged.
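The cleanup described above can be sketched as follows. This is a hypothetical reconstruction, not Arcee.ai's actual script: the field names ("from", "value") and the leading-turn heuristic assume the common ShareGPT-style conversation schema, in which a system instruction appears as the first turn of the conversation list.

```python
# Hypothetical sketch of the described cleanup, assuming a ShareGPT-style
# schema: each sample is a list of turns [{"from": ..., "value": ...}, ...].

def fix_system_tag(conversations):
    """Rename a leading 'gpt' turn (used as a system prompt) to 'system'.

    Heuristic assumption: a conversation whose first turn is tagged 'gpt'
    is carrying a system instruction in that turn.
    """
    if conversations and conversations[0]["from"] == "gpt":
        conversations[0]["from"] = "system"
    return conversations

def is_valid(conversations):
    """Keep only samples where every human/gpt turn has a non-empty value."""
    return all(
        turn["value"].strip()
        for turn in conversations
        if turn["from"] in ("human", "gpt")
    )

def clean(samples):
    """Drop samples with empty human/gpt values, then fix system tags."""
    return [fix_system_tag(s) for s in samples if is_valid(s)]
```

Applied to the full dataset, a pass like this would rename the tags in the affected samples and drop the 56 samples with empty fields while leaving everything else untouched.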
Infinity Instruct
Beijing Academy… See the full description on the dataset page: https://huggingface.co/datasets/arcee-ai/BAAI-Infinity-Instruct-System.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is built upon the Infinity Instruct project, aiming to match the multi-round dialogue fine-tuning format of MindSpeed-LLM.
Infinity Instruct
Beijing Academy of Artificial Intelligence (BAAI) [Paper][Code][🤗]
The quality and scale of instruction data are crucial for model performance. Recently, open-source models have increasingly relied on fine-tuning datasets comprising millions of instances, necessitating both high quality and large… See the full description on the dataset page: https://huggingface.co/datasets/uukuguy/MindSpeed-Infinity-Instruct-7M.
friendshipkim/Infinity-Instruct-7M-en-old dataset hosted on Hugging Face and contributed by the HF Datasets community
semran1/BAAI-Infinity-Instruct-7M-core-en dataset hosted on Hugging Face and contributed by the HF Datasets community
sgp-bench/infinity-instruct-7M dataset hosted on Hugging Face and contributed by the HF Datasets community
qdr91/infinity-instruct-100k dataset hosted on Hugging Face and contributed by the HF Datasets community
sgp-bench/infinity-instruct-3M dataset hosted on Hugging Face and contributed by the HF Datasets community
extraordinarylab/infinity-instruct-inverse dataset hosted on Hugging Face and contributed by the HF Datasets community
friendshipkim/Infinity-Instruct-3M dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Evaluation run of BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B
Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. It has been created from 1 run; each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-7M-Gen-Llama3_1-70B-details.
Dataset Card for Evaluation run of BAAI/Infinity-Instruct-7M-Gen-mistral-7B
Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-7M-Gen-mistral-7B. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. It has been created from 1 run; each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-7M-Gen-mistral-7B-details.
ShinoharaHare/Infinity-Instruct-Reformatted dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Evaluation run of BAAI/Infinity-Instruct-3M-0625-Yi-1.5-9B
Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-3M-0625-Yi-1.5-9B. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. It has been created from 1 run; each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Yi-1.5-9B-details.
ChavyvAkvar/Infinity-Instruct-0625-Converted dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Evaluation run of BAAI/Infinity-Instruct-3M-0625-Qwen2-7B
Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-3M-0625-Qwen2-7B. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. It has been created from 1 run; each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Qwen2-7B-details.
Dataset Card for Evaluation run of BAAI/Infinity-Instruct-3M-0625-Llama3-8B
Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-3M-0625-Llama3-8B. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. It has been created from 1 run; each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Llama3-8B-details.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Soumil30/Infinity-Instruct-0625-Qwen dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Evaluation run of jlzhou/Qwen2.5-3B-Infinity-Instruct-0625
Dataset automatically created during the evaluation run of model jlzhou/Qwen2.5-3B-Infinity-Instruct-0625. The dataset is composed of 38 configurations, each corresponding to one of the evaluated tasks. It has been created from 1 run; each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/jlzhou_Qwen2.5-3B-Infinity-Instruct-0625-details.