Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is mirrored on Hugging Face https://huggingface.co/datasets/aspear/saferdecoding-fine-tuning/blob/main/README.md
This dataset is intended for fine-tuning models to defend against jailbreak attacks. It is an extension of SafeDecoding.
This dataset includes 252 original human-generated adversarial seed prompts, covering 18 harmful categories.
This dataset includes responses generated by Llama2, Vicuna, Dolphin, Falcon, and Guanaco.
Responses were generated by passing the adversarial seed prompts to each model. Only responses that reject the request were recorded.
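The filtering step described above can be sketched roughly as follows; this is an illustrative reconstruction, not the authors' actual pipeline, and the model name, generation settings, and keyword-based refusal check are all assumptions.

```python
# Hypothetical sketch of the described generation/filtering step:
# pass each adversarial seed prompt to a model and keep only refusals.
from transformers import pipeline

generator = pipeline("text-generation", model="lmsys/vicuna-7b-v1.5")  # assumed model

REFUSAL_MARKERS = ("I cannot", "I can't", "I'm sorry", "I am sorry")  # assumed heuristic

def collect_refusals(seed_prompts):
    records = []
    for prompt in seed_prompts:
        out = generator(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"]
        response = out[len(prompt):].strip()  # drop the echoed prompt
        if any(marker in response for marker in REFUSAL_MARKERS):
            records.append({"prompt": prompt, "response": response})
    return records
```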
Dataset Card for "llama2-sst2-finetuning"
Dataset Description
The Llama2-sst2-fine-tuning dataset is designed for supervised fine-tuning of LLaMA V2 on the GLUE SST2 sentiment analysis classification task. We provide two subsets: training and validation. To ensure the effectiveness of fine-tuning, we convert the data into the prompt template for LLaMA V2 supervised fine-tuning, where the data follows this format:
[INST] <
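The template line above is truncated in the source. As a rough sketch (not the card's exact template), an SST2 example wrapped in the standard LLaMA V2 chat format might look like the following; the system prompt and label wording are assumptions.

```python
# Illustrative conversion of a GLUE SST2 example into the LLaMA V2 chat template.
# The system prompt and label strings are assumptions, not the card's exact wording.
LLAMA2_TEMPLATE = "[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST] {response}"

def format_sst2_example(sentence: str, label: int) -> str:
    return LLAMA2_TEMPLATE.format(
        system="You are a sentiment classifier.",
        instruction=f"Classify the sentiment of the following sentence as positive or negative: {sentence}",
        response="positive" if label == 1 else "negative",
    )

# Illustrative sentence, not necessarily drawn from SST2 itself.
print(format_sst2_example("a gripping and well-acted thriller.", 1))
```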
Dataset for fine-tuning gemma-3-1b-it for function calling. The code and other resources for this project are linked below. Resources:
YouTube Video | Blog Post | GitHub Repo | Fine-tuned Model | Original Model
Citation
If you find this dataset helpful, please cite: @dataset{talebi2025, author = {Shaw Talebi}, title = {tool-use-finetuning}, year = {2025}, publisher = {Hugging Face}, howpublished =… See the full description on the dataset page: https://huggingface.co/datasets/shawhin/tool-use-finetuning.
Container dataset for demonstration of Hugging Face models on Redivis. Currently just contains a single BERT model, but may expand in the future.
Dataset Card for llama-2-banking-fine-tune
This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.
Dataset Summary
This dataset contains:
A dataset configuration file conforming to the Argilla dataset format named argilla.yaml. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/argilla/llama-2-banking-fine-tune.
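As a rough illustration of the two loading paths mentioned above (the exact calls depend on your Argilla version, so treat this as a sketch rather than the card's documented snippet):

```python
# Option 1: use the datasets library directly.
from datasets import load_dataset

ds = load_dataset("argilla/llama-2-banking-fine-tune")

# Option 2: pull the records into Argilla (Argilla 1.x style client; newer versions differ).
import argilla as rg

feedback_ds = rg.FeedbackDataset.from_huggingface("argilla/llama-2-banking-fine-tune")
```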
pavelmarcolian/wayfinder-fine-tuning-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset contains conversation summaries, topics, and dialogues used to build a pipeline for fine-tuning an LLM with Parameter-Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA), a popular and lightweight training technique that significantly reduces the number of trainable parameters.
The dataset is also available on Hugging Face: https://huggingface.co/datasets/knkarthick/dialogsum
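A minimal sketch of such a PEFT/LoRA setup on this dataset is shown below, using the Hugging Face peft library; the base model and LoRA hyperparameters are illustrative assumptions, not the exact pipeline described by the author.

```python
# Minimal LoRA/PEFT sketch for dialogue summarization (illustrative choices throughout).
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

dataset = load_dataset("knkarthick/dialogsum")

model_name = "google/flan-t5-base"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # attention projections in T5-style models
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of weights are trainable
```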
MihirsinhChauhan/LLAVA-fine-tune-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
Alpaca is a dataset for fine-tuning your language models to better understand and follow instructions, taking you beyond standard Natural Language Processing (NLP) abilities. This curated, cleaned dataset provides over 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine, all in English (BCP-47 en). The instruction, output, and input fields are designed to improve every aspect of a model's comprehension. The data has been cleaned to remove the errors and biases found in the original release, so you can expect improved performance from any language model fine-tuned on it. Get ready to see what Alpaca can do for your NLP needs.
This dataset provides a unique and valuable resource for anyone who wishes to create, develop and train language models. Alpaca provides users with 52,000 instruction-demonstration pairs generated by OpenAI's text-davinci-003 engine.
The data included in this dataset is formatted into 3 columns: “instruction”, “output” and “input.” All the data is written in English (BCP-47 en).
To make the most of this dataset, it is recommended to:
- Familiarize yourself with the instructions in the instruction column, as these provide guidance on how to use the other two columns: input and output.
- Once comfortable with the instruction column, explore each of the sets of triplets (instruction, output, and input) included in this cleaned version of Alpaca.
- Read through many examples, paying attention to any areas where you feel more clarification could be added or improved for a better understanding of language models; bear in mind that these examples have already been cleaned of the errors and biases found in the original dataset.
- Get inspired! There are more than 52k sets provided, giving you plenty of flexibility for varying training strategies or unique approaches when creating your own language model.
- Finally, while not essential, it may be helpful to be familiar with OpenAI's text-davinci engine and to experiment with different parameters/options depending on the outcomes you wish to achieve.
- Developing natural language processing (NLP) tasks that aim to better automate and interpret instructions given by humans.
- Training machine learning models of robotic agents to be able to understand natural language commands, as well as understand the correct action that needs to be taken in response.
- Creating a system that can generate personalized instructions and feedback in real time based on language models, catering specifically to each individual user's preferences or needs
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:------------|
| instruction | This column contains the instructions for the language model. (Text) |
| output | This column contains the expected output from the language model. (Text) |
| input | This column contains the input given to the language model. (Text) |
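As a rough illustration of how these three columns are typically combined into a single training prompt (the wrapper text below is the commonly used Alpaca-style template and is an assumption on our part, not something specified by this card):

```python
# Illustrative Alpaca-style prompt construction from the instruction/input/output columns.
import csv

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

def build_prompt(row: dict) -> str:
    template = PROMPT_WITH_INPUT if row.get("input") else PROMPT_NO_INPUT
    return template.format(**row)

with open("train.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(build_prompt(row))
        break  # show only the first example
```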
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
kielerrr/fine-tuning-docs dataset hosted on Hugging Face and contributed by the HF Datasets community
ch-min/Fine-tuning-data dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset is taken from code_instructions_120k, which adds a prompt column in Alpaca style. Refer to the original source: https://huggingface.co/datasets/iamtarun/code_instructions_120k_alpaca
This install package bundles the essential libraries for LLM RAG and fine-tuning (e.g., huggingface_hub, transformers, langchain, evaluate, sentence-transformers), suitable for Kaggle competitions with offline requirements; the packages were downloaded from the Kaggle development environment.
Supported package list:
transformers
datasets
accelerate
bitsandbytes
langchain
langchain-community
sentence-transformers
chromadb
faiss-cpu
huggingface_hub
langchain-text-splitters
peft
trl
umap-learn
evaluate
deepeval
weave
Suggested install commands in Kaggle:
!pip install transformers --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/tranformers
!pip install -U datasets --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/datasets
!pip install -U accelerate --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/accelerate
!pip install build --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/build-1.2.1-py3-none-any.whl
!pip install -U bitsandbytes --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl
!pip install langchain --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/langchain-0.2.5-py3-none-any.whl
!pip install langchain-core --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/langchain_core-0.2.9-py3-none-any.whl
!pip install langsmith --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/langsmith-0.1.81-py3-none-any.whl
!pip install langchain-community --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/langchain_community-0.2.5-py3-none-any.whl
!pip install sentence-transformers --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/sentence_transformers-3.0.1-py3-none-any.whl
!pip install chromadb --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/chromadb-0.5.3-py3-none-any.whl
!pip install faiss-cpu --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
!pip install -U huggingface_hub --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/huggingface_hub
!pip install -qU langchain-text-splitters --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/langchain_text_splitters-0.2.1-py3-none-any.whl
!pip install -U peft --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/peft-0.11.1-py3-none-any.whl
!pip install -U trl --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/trl-0.9.4-py3-none-any.whl
!pip install umap-learn --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/umap-learn
!pip install evaluate --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/evaluate-0.4.2-py3-none-any.whl
!pip install deepeval --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/deepeval-0.21.59-py3-none-any.whl
!pip install weave --no-index --no-deps --find-links=file:///kaggle/input/ai-math-llm-package/download-package/weave-0.50.2-py3-none-any.whl
According to our latest research, the global parameter-efficient fine-tuning tools market size reached USD 1.42 billion in 2024, reflecting the rapid adoption of advanced AI model customization techniques. The market is poised for robust expansion, with a projected CAGR of 24.7% during the forecast period. By 2033, the market is expected to attain a value of USD 11.6 billion, driven by the increasing demand for scalable, cost-effective, and resource-efficient AI solutions across diverse industries. This impressive growth trajectory is underpinned by the proliferation of large language models, the need for efficient model adaptation, and the surging adoption of AI-driven automation in enterprise applications.
The most significant growth factor for the parameter-efficient fine-tuning tools market is the exponential rise in the deployment of large-scale AI models, particularly in natural language processing (NLP), computer vision, and speech recognition. These models, such as GPT, BERT, and their derivatives, require substantial computational resources for training and fine-tuning. Traditional full-model fine-tuning methods are often resource-intensive, expensive, and impractical for organizations with limited computational infrastructure. Parameter-efficient fine-tuning techniques—such as adapters, LoRA, and prompt tuning—address these challenges by enabling targeted updates to a small subset of model parameters, drastically reducing hardware requirements and operational costs. This efficiency has democratized access to advanced AI capabilities, empowering a broader spectrum of businesses and research institutes to leverage state-of-the-art machine learning without incurring prohibitive expenses.
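As a concrete illustration of the kind of targeted update these techniques make, here is a minimal prompt-tuning sketch using the open-source peft library; the base model and the number of virtual tokens are arbitrary choices for illustration.

```python
# Minimal prompt-tuning sketch: only a small set of virtual prompt embeddings is
# trained while the frozen base model's weights stay untouched.
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small model chosen only for illustration

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=20,  # these embeddings are the only new trainable parameters
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # a few thousand trainable parameters vs. ~124M total
```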
Another major driver fueling the parameter-efficient fine-tuning tools market is the growing need for rapid model customization in dynamic and regulated industries. Enterprises in sectors such as healthcare, finance, and government are increasingly seeking AI solutions that can be quickly adapted to evolving data, compliance requirements, and domain-specific nuances. Parameter-efficient fine-tuning tools allow organizations to update models in a fraction of the time compared to traditional methods, accelerating time-to-market for AI-powered applications. Furthermore, these tools support the preservation of model privacy and security, as they enable fine-tuning on-premises or within secure cloud environments without exposing sensitive data to external parties. This capability is particularly crucial for organizations handling confidential or regulated information, making parameter-efficient fine-tuning an indispensable component of modern AI workflows.
The surge in open-source innovation and collaboration is also catalyzing market growth. The AI research community has made remarkable strides in developing and sharing parameter-efficient fine-tuning frameworks, libraries, and benchmarks. This collaborative ecosystem has lowered the barrier to entry for developers and enterprises seeking to experiment with and deploy advanced fine-tuning techniques. Open-source tools such as Hugging Face Transformers, PEFT (Parameter-Efficient Fine-Tuning) libraries, and community-driven repositories have become integral to the AI development lifecycle. These resources not only accelerate innovation but also foster interoperability and standardization across the industry, further propelling the adoption of parameter-efficient fine-tuning tools globally.
From a regional perspective, North America continues to dominate the parameter-efficient fine-tuning tools market, accounting for the largest share in 2024. This leadership is attributed to the region's advanced AI research ecosystem, significant investments in AI infrastructure, and early adoption by technology giants and innovative startups. Europe and Asia Pacific are also emerging as key growth regions, driven by increasing government initiatives, academic research, and enterprise digital transformation efforts. The Asia Pacific region, in particular, is witnessing rapid adoption of AI technologies across sectors such as manufacturing, e-commerce, and telecommunications, which is expected to drive substantial market growth throughout the forecast period.
trainfarren/fine-tuning-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community
fetost/fine-tuning dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset for fine-tuning an embedding model for AI job search. Data sourced from datastax/linkedin_job_listings and used to fine-tune shawhin/distilroberta-ai-job-embeddings for AI job search. Links:
GitHub Repo | Video link | Blog link
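A rough sketch of how such an embedding model can be fine-tuned on (query, job posting) pairs with sentence-transformers is shown below; the base checkpoint, example pairs, and loss function are assumptions for illustration, not the recipe used for the linked model.

```python
# Illustrative embedding fine-tuning with sentence-transformers (classic fit() API).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("distilroberta-base")  # assumed base checkpoint

# Hypothetical (search query, job description) positive pairs.
train_examples = [
    InputExample(texts=["machine learning engineer, PyTorch", "We are hiring an ML engineer to ..."]),
    InputExample(texts=["LLM fine-tuning researcher", "Research role focused on adapting large language models ..."]),
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives ranking loss

model.fit(train_objectives=[(train_loader, loss)], epochs=1, warmup_steps=10)
```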
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for SaferDecoding Fine Tuning Dataset
This dataset is intended for fine-tuning models to defend against jailbreak attacks. It is an extension of SafeDecoding.
Dataset Details
Dataset Description
The dataset generation process was adapted from SafeDecoding. This dataset includes 252 original human-generated adversarial seed prompts, covering 18 harmful categories. This dataset includes responses generated by Llama2, Vicuna, Dolphin, Falcon… See the full description on the dataset page: https://huggingface.co/datasets/aspear/saferdecoding-fine-tuning.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
heba1998/real-estate-data-for-llm-fine-tuning dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MrFacewhythisnameexists/llama3-fine-tuning dataset hosted on Hugging Face and contributed by the HF Datasets community