bshada/infinity-hw-mistral-finetune-8192 dataset hosted on Hugging Face and contributed by the HF Datasets community
deveshparagiri/dataset-Mistral-7B-v0.1-finetune-20250716-151443 dataset hosted on Hugging Face and contributed by the HF Datasets community
jag2023/mistral-fine-tune-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A classics dataset for use with mistral-7b-v0.1
This dataset was used for a fine-tune of the Mistral 7B base model. It contains 1,640 Q/A pairs on Greek and Roman history. The dataset was generated by running Mixtral-8x7B Instruct v0.1 over 512-token chunks of volumes 2 and 3 of Will Durant's 13-volume Story of Civilization (The Life of Greece and Caesar and Christ). Training data was formatted with [INST] and [/INST] delimiting instructions: {"text": "Q: "Why did many Greeks come to resent Rome's… See the full description on the dataset page: https://huggingface.co/datasets/wmmarcellino/mistral-7b-v0.1-GreeceRome-v0.1.
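The card's example is cut off, but the [INST]/[/INST] convention it names is the standard Mistral instruct template. A minimal sketch of wrapping a Q/A pair in that format (the helper name, placeholder strings, and exact BOS/EOS handling are assumptions, not the author's script):

```python
# Hypothetical formatter for Mistral-style instruction data; the exact
# template the dataset author used is not shown in full on the card.
def format_example(question: str, answer: str) -> dict:
    # The prompt sits between [INST] and [/INST]; the target answer follows.
    return {"text": f"<s>[INST] {question} [/INST] {answer}</s>"}

record = format_example("<question about Greek or Roman history>", "<answer>")
print(record["text"])
```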
Description
This dataset is a modified version of Locutusque/function-calling-chatml, with the messages obtained by applying the scripts in the mistral-finetune function-calling guide. Only the messages that contain tool entries have been kept, so this dataset can be used for SFT following the Mistral Finetune guide (a filtering sketch follows the example below).
Example
{ "system_message": "You are a helpful assistant with access to the following functions. Use them if required -"… See the full description on the dataset page: https://huggingface.co/datasets/plaguss/function_calling_finetune_mistral.
Explanation
A small experiment inspired by the Mistral-plays-DOOM experiment from the Mistral Hackathon. How it works: Audio -> Waveform Visualization -> Waveform ASCII Art -> Finetune Mistral on ASCII art to predict text from ASCII art (a rough sketch of the waveform-to-ASCII step appears after the examples below).
Quick video explanation
Example Waveform
Example ASCII Art… See the full description on the dataset page: https://huggingface.co/datasets/0-hero/lj_speech_with_spectogram_conversations.
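The repo does not publish its rendering code, so the following is only a rough sketch of the waveform-to-ASCII step under assumed choices (character ramp, grid size, bottom-up column fill):

```python
import numpy as np

def waveform_to_ascii(samples: np.ndarray, width: int = 80, height: int = 16) -> str:
    ramp = " .:-=+*#%@"  # denser glyphs for larger amplitudes (assumed ramp)
    # One amplitude level per output column.
    levels = np.array([c.max() for c in np.array_split(np.abs(samples), width)])
    levels = levels / max(levels.max(), 1e-9)
    rows = []
    for r in range(height):
        threshold = 1.0 - r / height  # fill each column from the bottom up
        rows.append("".join(
            ramp[int(l * (len(ramp) - 1))] if l >= threshold else " "
            for l in levels
        ))
    return "\n".join(rows)

print(waveform_to_ascii(np.sin(np.linspace(0, 20 * np.pi, 4000))))
```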
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!
We have a free Google Colab Tesla T4 notebook for Mistral Nemo 12b here: https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing
✨ Finetune for Free
All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model that can be exported to GGUF, served with vLLM, or uploaded to Hugging Face (a sketch of the underlying loading step follows).
Unsloth supports Free Notebooks… See the full description on the dataset page: https://huggingface.co/datasets/NewEden/Nemo-Base-V7-Tekken.
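For reference, the core Unsloth loading step these notebooks wrap might look like the sketch below; the checkpoint name and LoRA hyperparameters here are placeholder choices, not the notebook's exact settings:

```python
from unsloth import FastLanguageModel

# 4-bit loading is where much of the advertised memory saving comes from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Base-2407",  # assumed checkpoint for the 12b model above
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (placeholder value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```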
putawararevalda/mistral-finetune-instruct-digistar-telkom dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset made for instruction-based supervised finetuning of Mistral LLMs, built by combining the following medical datasets and taking 2k entries from them (a sketch of the combine-and-sample step appears below):
Medical Meadow Wikidoc (https://huggingface.co/datasets/medalpaca/medical_meadow_wikidoc/blob/main/README.md) and MedQuAD (https://www.kaggle.com/datasets/jpmiller/layoutlm)
Medical meadow wikidoc
The Medical Meadow Wikidoc dataset comprises question-answer pairs sourced from WikiDoc, an online platform where medical professionals collaboratively… See the full description on the dataset page: https://huggingface.co/datasets/Shekswess/medical_mistral_instruct_dataset_short.
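A hedged sketch of the combine-and-sample step described above; the local MedQuAD file path, the shared column names, and the sampling seed are all assumptions:

```python
from datasets import load_dataset, concatenate_datasets

wikidoc = load_dataset("medalpaca/medical_meadow_wikidoc", split="train")
medquad = load_dataset("csv", data_files="medquad.csv", split="train")  # assumed local Kaggle download

# concatenate_datasets requires identical schemas, so project both sources
# onto a common instruction/input/output layout first (column names assumed).
def to_common(ex):
    return {
        "instruction": ex.get("instruction", ""),
        "input": ex.get("input", ""),
        "output": ex.get("output", ""),
    }

wikidoc = wikidoc.map(to_common, remove_columns=wikidoc.column_names)
medquad = medquad.map(to_common, remove_columns=medquad.column_names)

combined = concatenate_datasets([wikidoc, medquad])
subset = combined.shuffle(seed=42).select(range(2000))  # the 2k entries mentioned above
```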
Description
A small subset of dataset b-mc2/sql-create-context, used to fine-tune yulingai12/Mistral-7B-Instruct-v0.2-SQL-finetuned-test
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
aLLoyM Training Dataset
This dataset was used to fine-tune the aLLoyM model (Mistral-based).
Dataset Statistics
Total Examples: 2,035,791
Average Instruction Length: 113.1 characters
Average Question Length: 76.4 characters
Average Answer Length: 33.9 characters
Total Dataset Size: 668.55 MB
Format
The dataset is in JSONL format where each line contains: { "messages": [ {"role": "system", "content": "System instruction"}, {"role": "user"… See the full description on the dataset page: https://huggingface.co/datasets/Playingyoyo/aLLoyM-dataset.
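A minimal sketch of reading this JSONL layout; the file name is a placeholder, and only the keys shown on the card are assumed:

```python
import json

with open("alloym_train.jsonl") as f:  # placeholder file name
    for line in f:
        record = json.loads(line)
        roles = [m["role"] for m in record["messages"]]  # e.g. ["system", "user", ...]
        system_prompt = record["messages"][0]["content"]
```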
https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Travel Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Travel] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset.
Raw responses generated by Phi4; questions from alamios/Mistral-Small-24B-Instruct-2501-Conversations. Made to use with the QwenPhi 0.5B draft model, but the finetune did not yield much improvement. Still, I had already generated the dataset, so here is the raw data; hopefully it is useful to someone.
https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Restaurants Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [restaurants] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-restaurants-llm-chatbot-training-dataset.