17 datasets found
  1. h

    apigen-function-calling

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Argilla, apigen-function-calling [Dataset]. https://huggingface.co/datasets/argilla/apigen-function-calling
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Argilla
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset card for argilla/apigen-function-calling

    This dataset is a merge of argilla/Synth-APIGen-v0.1 and Salesforce/xlam-function-calling-60k, making over 100K function calling examples following the APIGen recipe.

      Prepare for training
    

    This version is not ready to do fine tuning, but you can run a script like prepare_for_sft.py to prepare it, and run the same recipe that can be found in argilla/Llama-3.2-1B-Instruct-APIGen-FC-v0.1#training-procedure. Modify the prompt… See the full description on the dataset page: https://huggingface.co/datasets/argilla/apigen-function-calling.

  2. APIGen-MT-5k

    • huggingface.co
    Updated May 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Salesforce (2025). APIGen-MT-5k [Dataset]. https://huggingface.co/datasets/Salesforce/APIGen-MT-5k
    Explore at:
    Dataset updated
    May 5, 2025
    Dataset provided by
    Salesforce Inchttp://salesforce.com/
    Authors
    Salesforce
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Summary

    APIGen-MT is an automated agentic data generation pipeline designed to synthesize verifiable, high-quality, realistic datasets for agentic applications This dataset was released as part of APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay The repo contains 5000 multi-turn trajectories collected by APIGen-MT This dataset is a subset of the data used to train the xLAM-2 model series

      Overview
    

    Agentic data consists of… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/APIGen-MT-5k.

  3. h

    Synth-APIGen-v0.1

    • huggingface.co
    Updated Apr 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Argilla (2023). Synth-APIGen-v0.1 [Dataset]. https://huggingface.co/datasets/argilla/Synth-APIGen-v0.1
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 15, 2023
    Dataset authored and provided by
    Argilla
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset card for Synth-APIGen-v0.1

    This dataset has been created with distilabel. Pipeline script: pipeline_apigen_train.py.

      Dataset creation
    

    It has been created with distilabel==1.4.0 version. This dataset is an implementation of APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets in distilabel, generated from synthetic functions. The process can be summarized as follows:

    Generate (or in this case modify) python… See the full description on the dataset page: https://huggingface.co/datasets/argilla/Synth-APIGen-v0.1.

  4. h

    synth-apigen-llama

    • huggingface.co
    Updated Oct 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Argilla Warehouse (2024). synth-apigen-llama [Dataset]. https://huggingface.co/datasets/argilla-warehouse/synth-apigen-llama
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Argilla Warehouse
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for argilla-warehouse/synth-apigen-llama

    This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: synth_apigen.py.

      Dataset creation
    

    This dataset is a replica in distilabel of the framework defined in: APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets. Using the seed dataset of synthetic python functions in argilla-warehouse/python-seed-tools, the… See the full description on the dataset page: https://huggingface.co/datasets/argilla-warehouse/synth-apigen-llama.

  5. h

    synth-apigen-qwen

    • huggingface.co
    Updated Oct 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Argilla Warehouse (2024). synth-apigen-qwen [Dataset]. https://huggingface.co/datasets/argilla-warehouse/synth-apigen-qwen
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 11, 2024
    Dataset authored and provided by
    Argilla Warehouse
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for argilla-warehouse/synth-apigen-qwen

    This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: synth_apigen.py.

      Dataset creation
    

    This dataset is a replica in distilabel of the framework defined in: APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets. Using the seed dataset of synthetic python functions in argilla-warehouse/python-seed-tools, the… See the full description on the dataset page: https://huggingface.co/datasets/argilla-warehouse/synth-apigen-qwen.

  6. h

    apigen-synth-trl

    • huggingface.co
    Updated Oct 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    apigen-synth-trl [Dataset]. https://huggingface.co/datasets/argilla-warehouse/apigen-synth-trl
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    Argilla Warehouse
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset card

    This dataset is a version of argilla/Synth-APIGen-v0.1 prepared for fine-tuning using trl. To generate it, the following script was run: from datasets import load_dataset from jinja2 import Template

    SYSTEM_PROMPT = """ You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose. If none of the functions can be used, point it out… See the full description on the dataset page: https://huggingface.co/datasets/argilla-warehouse/apigen-synth-trl.

  7. h

    apigen-mt-5k-parsed

    • huggingface.co
    Updated May 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    minpeter (2025). apigen-mt-5k-parsed [Dataset]. https://huggingface.co/datasets/minpeter/apigen-mt-5k-parsed
    Explore at:
    Dataset updated
    May 30, 2025
    Authors
    minpeter
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    [PARSED] APIGen-MT-5k

    The data in this dataset is a full of the original Salesforce/APIGen-MT-5k

    Subset name multi-turn parallel multiple definition Last turn type number of dataset

    apigen-mt-5k yes no yes complex 5k

    This is a re-parsing formatting dataset for the APIGen-MT-5k official dataset.

      Load the dataset
    

    from datasets import load_dataset

    ds = load_dataset("minpeter/apigen-mt-5k-parsed") print(ds)

    DatasetDict({

    train: Dataset({

    … See the full description on the dataset page: https://huggingface.co/datasets/minpeter/apigen-mt-5k-parsed.

  8. h

    APIGen-MT-5k-sharegpt

    • huggingface.co
    Updated May 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    APIGen-MT-5k-sharegpt [Dataset]. https://huggingface.co/datasets/Beryex/APIGen-MT-5k-sharegpt
    Explore at:
    Dataset updated
    May 15, 2024
    Authors
    Boyao Wang
    Description

    Dataset Card for APIGen-MT-5k-sharegpt

    This dataset is the sharegpt format of the original Salesforce/APIGen-MT-5k dataset. It is primarily designed for fine-tuning large language models (LLMs) for function calling and multi-turn conversations.

      Dataset Description
    

    The original Salesforce/APIGen-MT-5k dataset contains conversations between users and a language model, focusing on API usage and tool invocation scenarios. We have converted this dataset into the ShareGPT… See the full description on the dataset page: https://huggingface.co/datasets/Beryex/APIGen-MT-5k-sharegpt.

  9. h

    APIGen-MT-46k

    • huggingface.co
    Updated Jun 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    minyi (2025). APIGen-MT-46k [Dataset]. https://huggingface.co/datasets/minyichen/APIGen-MT-46k
    Explore at:
    Dataset updated
    Jun 24, 2025
    Authors
    minyi
    Description

    minyichen/APIGen-MT-46k dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. h

    Dans-Toolmaxx-Functions-apigen-subset

    • huggingface.co
    Updated Jan 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    PocketDoc (2025). Dans-Toolmaxx-Functions-apigen-subset [Dataset]. https://huggingface.co/datasets/PocketDoc/Dans-Toolmaxx-Functions-apigen-subset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 9, 2025
    Authors
    PocketDoc
    Description

    PocketDoc/Dans-Toolmaxx-Functions-apigen-subset dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. h

    APIGen-MT-5k-with-cot

    • huggingface.co
    Updated Jul 12, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shubhashis Roy Dipta (2025). APIGen-MT-5k-with-cot [Dataset]. https://huggingface.co/datasets/dipta007/APIGen-MT-5k-with-cot
    Explore at:
    Dataset updated
    Jul 12, 2025
    Authors
    Shubhashis Roy Dipta
    Description

    dipta007/APIGen-MT-5k-with-cot dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. h

    APIGen-MT-5k-with-cot-v1-deepseek_deepseek

    • huggingface.co
    Updated Jul 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shubhashis Roy Dipta (2025). APIGen-MT-5k-with-cot-v1-deepseek_deepseek [Dataset]. https://huggingface.co/datasets/dipta007/APIGen-MT-5k-with-cot-v1-deepseek_deepseek
    Explore at:
    Dataset updated
    Jul 12, 2025
    Authors
    Shubhashis Roy Dipta
    Description

    dipta007/APIGen-MT-5k-with-cot-v1-deepseek_deepseek dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. h

    APIGen-MT-5k-modified

    • huggingface.co
    Updated May 16, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    qbx (2025). APIGen-MT-5k-modified [Dataset]. https://huggingface.co/datasets/qiukingballball/APIGen-MT-5k-modified
    Explore at:
    Dataset updated
    May 16, 2025
    Authors
    qbx
    Description

    qiukingballball/APIGen-MT-5k-modified dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. h

    APIGen-MT-5k-with-think

    • huggingface.co
    Updated May 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shubhashis Roy Dipta (2024). APIGen-MT-5k-with-think [Dataset]. https://huggingface.co/datasets/dipta007/APIGen-MT-5k-with-think
    Explore at:
    Dataset updated
    May 15, 2024
    Authors
    Shubhashis Roy Dipta
    Description

    dipta007/APIGen-MT-5k-with-think dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. h

    apigen-mt-5k-friendli

    • huggingface.co
    Updated May 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    minpeter (2025). apigen-mt-5k-friendli [Dataset]. https://huggingface.co/datasets/minpeter/apigen-mt-5k-friendli
    Explore at:
    Dataset updated
    May 19, 2025
    Authors
    minpeter
    Description

    Used in axolotl

    datasets: - path: minpeter/apigen-mt-5k-friendli data_files: - train.jsonl - test.jsonl type: chat_template roles_to_train: ["assistant"] field_messages: messages message_property_mappings: role: role content: contentchat_template: tokenizer_default

  16. h

    xlam-function-calling-60k-parsed

    • huggingface.co
    Updated May 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    minpeter (2025). xlam-function-calling-60k-parsed [Dataset]. https://huggingface.co/datasets/minpeter/xlam-function-calling-60k-parsed
    Explore at:
    Dataset updated
    May 30, 2025
    Authors
    minpeter
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    [PARSED] APIGen Function-Calling Datasets (xLAM)

    This dataset contains the full data from the original Salesforce/xlam-function-calling-60k

    Subset name multi-turn parallel multiple definition Last turn type number of dataset

    xlam-function-calling-60k no yes yes tool_calls 60000

    This is a re-parsing formatting dataset for the xLAM official dataset.

      Load the dataset
    

    from datasets import load_dataset

    ds =… See the full description on the dataset page: https://huggingface.co/datasets/minpeter/xlam-function-calling-60k-parsed.

  17. h

    ToolACE-sharegpt

    • huggingface.co
    Updated Aug 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Boyao Wang (2024). ToolACE-sharegpt [Dataset]. https://huggingface.co/datasets/Beryex/ToolACE-sharegpt
    Explore at:
    Dataset updated
    Aug 23, 2024
    Authors
    Boyao Wang
    Description

    Dataset Card for APIGen-MT-5k-sharegpt

    This dataset is the sharegpt format of the original Team-ACE/ToolACE dataset. It is primarily designed for fine-tuning large language models (LLMs) for function calling and multi-turn conversations.

      Dataset Description
    

    The original Team-ACE/ToolACE dataset contains conversations between users and a language model, focusing on API usage and tool invocation scenarios. We have converted this dataset into the ShareGPT format, which is a… See the full description on the dataset page: https://huggingface.co/datasets/Beryex/ToolACE-sharegpt.

  18. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Argilla, apigen-function-calling [Dataset]. https://huggingface.co/datasets/argilla/apigen-function-calling

apigen-function-calling

argilla/apigen-function-calling

Explore at:
5 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset authored and provided by
Argilla
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Dataset card for argilla/apigen-function-calling

This dataset is a merge of argilla/Synth-APIGen-v0.1 and Salesforce/xlam-function-calling-60k, making over 100K function calling examples following the APIGen recipe.

  Prepare for training

This version is not ready to do fine tuning, but you can run a script like prepare_for_sft.py to prepare it, and run the same recipe that can be found in argilla/Llama-3.2-1B-Instruct-APIGen-FC-v0.1#training-procedure. Modify the prompt… See the full description on the dataset page: https://huggingface.co/datasets/argilla/apigen-function-calling.

Search
Clear search
Close search
Google apps
Main menu