26 datasets found
  1. Infinity-Instruct

    • huggingface.co
    Updated Jun 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Beijing Academy of Artificial Intelligence (2024). Infinity-Instruct [Dataset]. https://huggingface.co/datasets/BAAI/Infinity-Instruct
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset authored and provided by
    Beijing Academy of Artificial Intelligence
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Infinity Instruct

    Beijing Academy of Artificial Intelligence (BAAI) [Paper][Code][🤗]

    The quality and scale of instruction data are crucial for model performance. Recently, open-source models have increasingly relied on fine-tuning datasets comprising millions of instances, necessitating both high quality and large scale. However, the open-source community has long been constrained by the high costs associated with building such extensive and high-quality instruction… See the full description on the dataset page: https://huggingface.co/datasets/BAAI/Infinity-Instruct.

  2. BAAI-Infinity-Instruct-System

    • huggingface.co
    Updated Jun 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Arcee AI (2024). BAAI-Infinity-Instruct-System [Dataset]. https://huggingface.co/datasets/arcee-ai/BAAI-Infinity-Instruct-System
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Arcee AI, Inc.
    Authors
    Arcee AI
    Description

    Arcee.ai Modifications

    The original dataset (https://huggingface.co/datasets/BAAI/Infinity-Instruct) contained 383,697 samples that used "gpt" tags for system instructions instead of "system" tags. Additionally, 56 samples had empty values for either the human or gpt fields. We have addressed these issues by renaming the tags in the affected samples and removing those with empty values. The remainder of the dataset is unchanged.

      Infinity Instruct
    

    Beijing Academy… See the full description on the dataset page: https://huggingface.co/datasets/arcee-ai/BAAI-Infinity-Instruct-System.

  3. h

    MindSpeed-Infinity-Instruct-7M

    • huggingface.co
    Updated Jan 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jiangwen Su (2025). MindSpeed-Infinity-Instruct-7M [Dataset]. https://huggingface.co/datasets/uukuguy/MindSpeed-Infinity-Instruct-7M
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 6, 2025
    Authors
    Jiangwen Su
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset is built appond the Infinity Instruct project, aiming to match the multi-rounds dialogue finetune format of the MindSpeed-LLM.

      Infinity Instruct
    

    Beijing Academy of Artificial Intelligence (BAAI) [Paper][Code]🤗

    The quality and scale of instruction data are crucial for model performance. Recently, open-source models have increasingly relied on fine-tuning datasets comprising millions of instances, necessitating both high quality and large… See the full description on the dataset page: https://huggingface.co/datasets/uukuguy/MindSpeed-Infinity-Instruct-7M.

  4. h

    Infinity-Instruct-7M-en-old

    • huggingface.co
    Updated Feb 20, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Woojeong Kim (2025). Infinity-Instruct-7M-en-old [Dataset]. https://huggingface.co/datasets/friendshipkim/Infinity-Instruct-7M-en-old
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 20, 2025
    Authors
    Woojeong Kim
    Description

    friendshipkim/Infinity-Instruct-7M-en-old dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. h

    BAAI-Infinity-Instruct-7M-core-en

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    s, BAAI-Infinity-Instruct-7M-core-en [Dataset]. https://huggingface.co/datasets/semran1/BAAI-Infinity-Instruct-7M-core-en
    Explore at:
    Authors
    s
    Description

    semran1/BAAI-Infinity-Instruct-7M-core-en dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. h

    Infinity-Instruct

    • huggingface.co
    Updated Apr 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Max Zuo (2025). Infinity-Instruct [Dataset]. https://huggingface.co/datasets/zuom/Infinity-Instruct
    Explore at:
    Dataset updated
    Apr 28, 2025
    Authors
    Max Zuo
    Description

    Built by stripping BAAI/Infinity-Instruct and reformatting.

  7. h

    infinity-instruct-7M

    • huggingface.co
    Updated Aug 19, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sgp-bench (2024). infinity-instruct-7M [Dataset]. https://huggingface.co/datasets/sgp-bench/infinity-instruct-7M
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 19, 2024
    Dataset authored and provided by
    sgp-bench
    Description

    sgp-bench/infinity-instruct-7M dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. h

    infinity-instruct-100k

    • huggingface.co
    Updated Jun 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulkader Abdulrazzak (2024). infinity-instruct-100k [Dataset]. https://huggingface.co/datasets/qdr91/infinity-instruct-100k
    Explore at:
    Dataset updated
    Jun 19, 2024
    Authors
    Abdulkader Abdulrazzak
    Description

    qdr91/infinity-instruct-100k dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. h

    infinity-instruct-3M

    • huggingface.co
    Updated Aug 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sgp-bench (2024). infinity-instruct-3M [Dataset]. https://huggingface.co/datasets/sgp-bench/infinity-instruct-3M
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 16, 2024
    Dataset authored and provided by
    sgp-bench
    Description

    sgp-bench/infinity-instruct-3M dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. h

    infinity-instruct-inverse

    • huggingface.co
    Updated Dec 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ExtraOrdinaryLab (2024). infinity-instruct-inverse [Dataset]. https://huggingface.co/datasets/extraordinarylab/infinity-instruct-inverse
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 26, 2024
    Dataset authored and provided by
    ExtraOrdinaryLab
    Description

    extraordinarylab/infinity-instruct-inverse dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. h

    Infinity-Instruct-3M

    • huggingface.co
    Updated Jan 22, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Woojeong Kim (2025). Infinity-Instruct-3M [Dataset]. https://huggingface.co/datasets/friendshipkim/Infinity-Instruct-3M
    Explore at:
    Dataset updated
    Jan 22, 2025
    Authors
    Woojeong Kim
    Description

    friendshipkim/Infinity-Instruct-3M dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. h

    BAAI_Infinity-Instruct-7M-Gen-Llama3_1-70B-details

    • huggingface.co
    Updated Jul 30, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Open LLM Leaderboard (2025). BAAI_Infinity-Instruct-7M-Gen-Llama3_1-70B-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-7M-Gen-Llama3_1-70B-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B

    Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-7M-Gen-Llama3_1-70B-details.

  13. h

    BAAI_Infinity-Instruct-7M-Gen-mistral-7B-details

    • huggingface.co
    Updated Jul 30, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Open LLM Leaderboard (2025). BAAI_Infinity-Instruct-7M-Gen-mistral-7B-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-7M-Gen-mistral-7B-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of BAAI/Infinity-Instruct-7M-Gen-mistral-7B

    Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-7M-Gen-mistral-7B The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-7M-Gen-mistral-7B-details.

  14. h

    Infinity-Instruct-Reformatted

    • huggingface.co
    Updated Jul 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shih-Kai Hsiao (2024). Infinity-Instruct-Reformatted [Dataset]. https://huggingface.co/datasets/ShinoharaHare/Infinity-Instruct-Reformatted
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 16, 2024
    Authors
    Shih-Kai Hsiao
    Description

    ShinoharaHare/Infinity-Instruct-Reformatted dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. h

    BAAI_Infinity-Instruct-3M-0625-Yi-1.5-9B-details

    • huggingface.co
    Updated Jul 30, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Open LLM Leaderboard (2025). BAAI_Infinity-Instruct-3M-0625-Yi-1.5-9B-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Yi-1.5-9B-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of BAAI/Infinity-Instruct-3M-0625-Yi-1.5-9B

    Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-3M-0625-Yi-1.5-9B The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Yi-1.5-9B-details.

  16. h

    Infinity-Instruct-0625-Converted

    • huggingface.co
    Updated Jul 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Habibullah Akbar (2024). Infinity-Instruct-0625-Converted [Dataset]. https://huggingface.co/datasets/ChavyvAkvar/Infinity-Instruct-0625-Converted
    Explore at:
    Dataset updated
    Jul 9, 2024
    Authors
    Habibullah Akbar
    Description

    ChavyvAkvar/Infinity-Instruct-0625-Converted dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. h

    BAAI_Infinity-Instruct-3M-0625-Qwen2-7B-details

    • huggingface.co
    Updated Jul 30, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Open LLM Leaderboard (2025). BAAI_Infinity-Instruct-3M-0625-Qwen2-7B-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Qwen2-7B-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of BAAI/Infinity-Instruct-3M-0625-Qwen2-7B

    Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-3M-0625-Qwen2-7B The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Qwen2-7B-details.

  18. h

    BAAI_Infinity-Instruct-3M-0625-Llama3-8B-details

    • huggingface.co
    Updated Jul 30, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Open LLM Leaderboard (2025). BAAI_Infinity-Instruct-3M-0625-Llama3-8B-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Llama3-8B-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of BAAI/Infinity-Instruct-3M-0625-Llama3-8B

    Dataset automatically created during the evaluation run of model BAAI/Infinity-Instruct-3M-0625-Llama3-8B The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/BAAI_Infinity-Instruct-3M-0625-Llama3-8B-details.

  19. h

    Infinity-Instruct-0625-Qwen

    • huggingface.co
    Updated Jul 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Soumil Gupta (2024). Infinity-Instruct-0625-Qwen [Dataset]. https://huggingface.co/datasets/Soumil30/Infinity-Instruct-0625-Qwen
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 9, 2024
    Authors
    Soumil Gupta
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Soumil30/Infinity-Instruct-0625-Qwen dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. h

    jlzhou_Qwen2.5-3B-Infinity-Instruct-0625-details

    • huggingface.co
    Updated Jul 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Open LLM Leaderboard (2025). jlzhou_Qwen2.5-3B-Infinity-Instruct-0625-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/jlzhou_Qwen2.5-3B-Infinity-Instruct-0625-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of jlzhou/Qwen2.5-3B-Infinity-Instruct-0625

    Dataset automatically created during the evaluation run of model jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/jlzhou_Qwen2.5-3B-Infinity-Instruct-0625-details.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Beijing Academy of Artificial Intelligence (2024). Infinity-Instruct [Dataset]. https://huggingface.co/datasets/BAAI/Infinity-Instruct
Organization logo

Infinity-Instruct

BAAI/Infinity-Instruct

Explore at:
30 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 13, 2024
Dataset authored and provided by
Beijing Academy of Artificial Intelligence
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Infinity Instruct

Beijing Academy of Artificial Intelligence (BAAI) [Paper][Code][🤗]

The quality and scale of instruction data are crucial for model performance. Recently, open-source models have increasingly relied on fine-tuning datasets comprising millions of instances, necessitating both high quality and large scale. However, the open-source community has long been constrained by the high costs associated with building such extensive and high-quality instruction… See the full description on the dataset page: https://huggingface.co/datasets/BAAI/Infinity-Instruct.

Search
Clear search
Close search
Google apps
Main menu