Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset card for argilla/apigen-function-calling
This dataset is a merge of argilla/Synth-APIGen-v0.1 and Salesforce/xlam-function-calling-60k, making over 100K function calling examples following the APIGen recipe.
Prepare for training
This version is not ready to do fine tuning, but you can run a script like prepare_for_sft.py to prepare it, and run the same recipe that can be found in argilla/Llama-3.2-1B-Instruct-APIGen-FC-v0.1#training-procedure. Modify the prompt… See the full description on the dataset page: https://huggingface.co/datasets/argilla/apigen-function-calling.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Summary
APIGen-MT is an automated agentic data generation pipeline designed to synthesize verifiable, high-quality, realistic datasets for agentic applications This dataset was released as part of APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay The repo contains 5000 multi-turn trajectories collected by APIGen-MT This dataset is a subset of the data used to train the xLAM-2 model series
Overview
Agentic data consists of… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/APIGen-MT-5k.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset card for Synth-APIGen-v0.1
This dataset has been created with distilabel. Pipeline script: pipeline_apigen_train.py.
Dataset creation
It has been created with distilabel==1.4.0 version. This dataset is an implementation of APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets in distilabel, generated from synthetic functions. The process can be summarized as follows:
Generate (or in this case modify) python… See the full description on the dataset page: https://huggingface.co/datasets/argilla/Synth-APIGen-v0.1.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for argilla-warehouse/synth-apigen-llama
This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: synth_apigen.py.
Dataset creation
This dataset is a replica in distilabel of the framework defined in: APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets. Using the seed dataset of synthetic python functions in argilla-warehouse/python-seed-tools, the… See the full description on the dataset page: https://huggingface.co/datasets/argilla-warehouse/synth-apigen-llama.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for argilla-warehouse/synth-apigen-qwen
This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: synth_apigen.py.
Dataset creation
This dataset is a replica in distilabel of the framework defined in: APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets. Using the seed dataset of synthetic python functions in argilla-warehouse/python-seed-tools, the… See the full description on the dataset page: https://huggingface.co/datasets/argilla-warehouse/synth-apigen-qwen.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset card
This dataset is a version of argilla/Synth-APIGen-v0.1 prepared for fine-tuning using trl. To generate it, the following script was run: from datasets import load_dataset from jinja2 import Template
SYSTEM_PROMPT = """ You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose. If none of the functions can be used, point it out… See the full description on the dataset page: https://huggingface.co/datasets/argilla-warehouse/apigen-synth-trl.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
[PARSED] APIGen-MT-5k
The data in this dataset is a full of the original Salesforce/APIGen-MT-5k
Subset name multi-turn parallel multiple definition Last turn type number of dataset
apigen-mt-5k yes no yes complex 5k
This is a re-parsing formatting dataset for the APIGen-MT-5k official dataset.
Load the dataset
from datasets import load_dataset
ds = load_dataset("minpeter/apigen-mt-5k-parsed") print(ds)
Dataset Card for APIGen-MT-5k-sharegpt
This dataset is the sharegpt format of the original Salesforce/APIGen-MT-5k dataset. It is primarily designed for fine-tuning large language models (LLMs) for function calling and multi-turn conversations.
Dataset Description
The original Salesforce/APIGen-MT-5k dataset contains conversations between users and a language model, focusing on API usage and tool invocation scenarios. We have converted this dataset into the ShareGPT… See the full description on the dataset page: https://huggingface.co/datasets/Beryex/APIGen-MT-5k-sharegpt.
minyichen/APIGen-MT-46k dataset hosted on Hugging Face and contributed by the HF Datasets community
PocketDoc/Dans-Toolmaxx-Functions-apigen-subset dataset hosted on Hugging Face and contributed by the HF Datasets community
dipta007/APIGen-MT-5k-with-cot dataset hosted on Hugging Face and contributed by the HF Datasets community
dipta007/APIGen-MT-5k-with-cot-v1-deepseek_deepseek dataset hosted on Hugging Face and contributed by the HF Datasets community
qiukingballball/APIGen-MT-5k-modified dataset hosted on Hugging Face and contributed by the HF Datasets community
dipta007/APIGen-MT-5k-with-think dataset hosted on Hugging Face and contributed by the HF Datasets community
Used in axolotl
datasets: - path: minpeter/apigen-mt-5k-friendli data_files: - train.jsonl - test.jsonl type: chat_template roles_to_train: ["assistant"] field_messages: messages message_property_mappings: role: role content: contentchat_template: tokenizer_default
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[PARSED] APIGen Function-Calling Datasets (xLAM)
This dataset contains the full data from the original Salesforce/xlam-function-calling-60k
Subset name multi-turn parallel multiple definition Last turn type number of dataset
xlam-function-calling-60k no yes yes tool_calls 60000
This is a re-parsing formatting dataset for the xLAM official dataset.
Load the dataset
from datasets import load_dataset
ds =… See the full description on the dataset page: https://huggingface.co/datasets/minpeter/xlam-function-calling-60k-parsed.
Dataset Card for APIGen-MT-5k-sharegpt
This dataset is the sharegpt format of the original Team-ACE/ToolACE dataset. It is primarily designed for fine-tuning large language models (LLMs) for function calling and multi-turn conversations.
Dataset Description
The original Team-ACE/ToolACE dataset contains conversations between users and a language model, focusing on API usage and tool invocation scenarios. We have converted this dataset into the ShareGPT format, which is a… See the full description on the dataset page: https://huggingface.co/datasets/Beryex/ToolACE-sharegpt.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset card for argilla/apigen-function-calling
This dataset is a merge of argilla/Synth-APIGen-v0.1 and Salesforce/xlam-function-calling-60k, making over 100K function calling examples following the APIGen recipe.
Prepare for training
This version is not ready to do fine tuning, but you can run a script like prepare_for_sft.py to prepare it, and run the same recipe that can be found in argilla/Llama-3.2-1B-Instruct-APIGen-FC-v0.1#training-procedure. Modify the prompt… See the full description on the dataset page: https://huggingface.co/datasets/argilla/apigen-function-calling.