14 datasets found
  1. infinity-hw-mistral-finetune-8192

    • huggingface.co
    Cite
    Bhupendra Hada (2024). infinity-hw-mistral-finetune-8192 [Dataset]. https://huggingface.co/datasets/bshada/infinity-hw-mistral-finetune-8192
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 20, 2024
    Authors
    Bhupendra Hada
    Description

    bshada/infinity-hw-mistral-finetune-8192 dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. dataset-Mistral-7B-v0.1-finetune-20250716-151443

    • huggingface.co
    Cite
    Devesh Paragiri (2025). dataset-Mistral-7B-v0.1-finetune-20250716-151443 [Dataset]. https://huggingface.co/datasets/deveshparagiri/dataset-Mistral-7B-v0.1-finetune-20250716-151443
    Dataset updated
    Jul 16, 2025
    Authors
    Devesh Paragiri
    Description

    deveshparagiri/dataset-Mistral-7B-v0.1-finetune-20250716-151443 dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. mistral-fine-tune-dataset

    • huggingface.co
    Cite
    jagrit acharya, mistral-fine-tune-dataset [Dataset]. https://huggingface.co/datasets/jag2023/mistral-fine-tune-dataset
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors
    jagrit acharya
    Description

    jag2023/mistral-fine-tune-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. mistral-7b-v0.1-GreeceRome-v0.1

    • huggingface.co
    Cite
    William Marcellino (2024). mistral-7b-v0.1-GreeceRome-v0.1 [Dataset]. https://huggingface.co/datasets/wmmarcellino/mistral-7b-v0.1-GreeceRome-v0.1
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 3, 2024
    Authors
    William Marcellino
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    A classics dataset for use with mistral-7b-v0.1

    This dataset was used to fine-tune the Mistral 7B base model. It contains 1,640 Q/A pairs on Greek and Roman history. The pairs were generated by running Mixtral-8x7b Instruct v01 over 512-token chunks of volumes 2 and 3 of Will Durant's The Story of Civilization (The Life of Greece and Caesar and Christ). Training data was formatted with [INST] and [/INST] delimiting the instructions: {"text": "Q: "Why did many Greeks come to resent Rome's… See the full description on the dataset page: https://huggingface.co/datasets/wmmarcellino/mistral-7b-v0.1-GreeceRome-v0.1.
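    The card says each Q/A pair was wrapped in [INST] and [/INST] delimiters, but the exact record is truncated. The following is a minimal sketch of one plausible layout, assuming a single `text` field with `<s>` and `</s>` as sequence markers; the `to_mistral_text` helper and the sample pair are hypothetical, not the author's script.

```python
import json


def to_mistral_text(question: str, answer: str) -> dict[str, str]:
    """Wrap one Q/A pair in Mistral-style [INST] ... [/INST] delimiters.

    Assumption: one "text" field per record, with <s> and </s> marking the
    start and end of the sequence; the dataset's exact template is not shown.
    """
    return {"text": f"<s>[INST] {question.strip()} [/INST] {answer.strip()}</s>"}


# Hypothetical pair, written out as one JSONL line per record.
pairs = [("Who founded the Academy in Athens?", "Plato.")]
for question, answer in pairs:
    print(json.dumps(to_mistral_text(question, answer)))
```

    If the tokenizer already prepends a BOS token on its own, dropping the literal `<s>` avoids doubling it.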

  5. function_calling_finetune_mistral

    • huggingface.co
    Cite
    Agustín Piqueres Lajarín (2024). function_calling_finetune_mistral [Dataset]. https://huggingface.co/datasets/plaguss/function_calling_finetune_mistral
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 19, 2024
    Authors
    Agustín Piqueres Lajarín
    Description

    This dataset is a modified version of Locutusque/function-calling-chatml in which the messages were obtained by applying the scripts from the mistral-finetune function-calling guide. Only the messages that contain tool entries have been kept, so this dataset can be used for SFT following the Mistral Finetune guide.

      Example
    

    { "system_message": "You are a helpful assistant with access to the following functions. Use them if required -"… See the full description on the dataset page: https://huggingface.co/datasets/plaguss/function_calling_finetune_mistral.
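    The card says only conversations containing tool entries were kept. A minimal sketch of such a filter follows, assuming each example is a dict with a `messages` list and that a tool entry shows up either as a message with role `"tool"` or as a `tool_calls` key; both the schema and the helper names are assumptions, not the scripts from the mistral-finetune guide.

```python
from typing import Any

Message = dict[str, Any]


def has_tool_entry(messages: list[Message]) -> bool:
    """True if any message is a tool result or carries assistant tool calls.

    Assumption: tool usage appears as role == "tool" or a non-empty
    "tool_calls" key, as in common chat formats; the real schema may differ.
    """
    return any(m.get("role") == "tool" or m.get("tool_calls") for m in messages)


def keep_tool_conversations(examples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Drop every conversation that never calls a tool or returns a tool result.
    return [ex for ex in examples if has_tool_entry(ex.get("messages", []))]


# Hypothetical examples: plain chat vs. a conversation with a tool round-trip.
examples = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]},
    {"messages": [
        {"role": "user", "content": "Weather in Paris?"},
        {"role": "assistant", "content": "", "tool_calls": [{"name": "get_weather"}]},
        {"role": "tool", "content": "{\"temp_c\": 18}"},
    ]},
]
print(len(keep_tool_conversations(examples)))  # 1
```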

  6. lj_speech_with_spectogram_conversations

    • huggingface.co
    Cite
    Ram (2024). lj_speech_with_spectogram_conversations [Dataset]. https://huggingface.co/datasets/0-hero/lj_speech_with_spectogram_conversations
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 5, 2024
    Authors
    Ram
    Description

    Explanation

    A small experiment inspired by the Mistral-plays-DOOM experiment from the Mistral Hackathon. How it works: Audio -> Waveform Visualization -> Waveform ASCII Art -> finetune Mistral on the ASCII art to predict the text from it.

    Quick video explanation

      Example Waveform

    [waveform image not reproduced]

      Example ASCII Art… See the full description on the dataset page: https://huggingface.co/datasets/0-hero/lj_speech_with_spectogram_conversations.
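    The "Waveform -> ASCII Art" step can be illustrated with a minimal sketch. It assumes audio samples are floats in [-1.0, 1.0] and draws each column as the peak amplitude of its slice, mirrored around the centre line; the dataset's actual renderer is not described, so `waveform_to_ascii`, `to_conversation` and the synthetic tone are hypothetical.

```python
import math


def waveform_to_ascii(samples: list[float], width: int = 40, half_height: int = 4) -> str:
    """Render the amplitude envelope of `samples` as mirrored ASCII bars.

    Assumptions: samples are floats in [-1.0, 1.0]; the signal is split into
    at most `width` columns and each column shows its peak absolute amplitude.
    Output has 2 * half_height rows. The dataset's real renderer is unknown.
    """
    if not samples or width <= 0 or half_height <= 0:
        return ""
    chunk = math.ceil(len(samples) / width)
    peaks = [max(abs(s) for s in samples[i:i + chunk])
             for i in range(0, len(samples), chunk)]
    # Upper half, top row first: a cell is filled when the column's peak,
    # scaled to half_height rows, comes within half a row of that level.
    upper = ["".join("#" if p * half_height >= level - 0.5 else " " for p in peaks)
             for level in range(half_height, 0, -1)]
    # Mirror the upper half to draw the negative side of the waveform.
    return "\n".join(upper + upper[::-1])


def to_conversation(ascii_art: str, transcript: str) -> dict:
    # Hypothetical pairing: ASCII art as the user turn, transcript as the target.
    return {"messages": [
        {"role": "user", "content": ascii_art},
        {"role": "assistant", "content": transcript},
    ]}


# Synthetic 440 Hz tone sampled at 8 kHz, fading in over a quarter second.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) * (t / (rate // 4)) for t in range(rate // 4)]
print(waveform_to_ascii(tone, width=32, half_height=3))
```

    Peak-per-column keeps the art short enough to fit in a prompt; a spectrogram-style rendering would carry more phonetic detail at the cost of many more characters.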
    
  7. Nemo-Base-V7-Tekken

    • huggingface.co
    Cite
    New Eden (2025). Nemo-Base-V7-Tekken [Dataset]. https://huggingface.co/datasets/NewEden/Nemo-Base-V7-Tekken
    Dataset updated
    Aug 9, 2025
    Dataset authored and provided by
    New Eden
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Finetune Mistral, Gemma, and Llama 2 to 5x faster with 70% less memory via Unsloth!

    We have a free Google Colab Tesla T4 notebook for Mistral Nemo 12b here: https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing

      ✨ Finetune for Free
    

    All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

    Unsloth supports Free Notebooks… See the full description on the dataset page: https://huggingface.co/datasets/NewEden/Nemo-Base-V7-Tekken.

  8. mistral-finetune-instruct-digistar-telkom

    • huggingface.co
    Cite
    Revalda Putawara (2024). mistral-finetune-instruct-digistar-telkom [Dataset]. https://huggingface.co/datasets/putawararevalda/mistral-finetune-instruct-digistar-telkom
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 26, 2024
    Authors
    Revalda Putawara
    Description

    putawararevalda/mistral-finetune-instruct-digistar-telkom dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. medical_mistral_instruct_dataset_short

    • huggingface.co
    Cite
    Bojan Jakimovski (2024). medical_mistral_instruct_dataset_short [Dataset]. https://huggingface.co/datasets/Shekswess/medical_mistral_instruct_dataset_short
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 14, 2024
    Authors
    Bojan Jakimovski
    Description

    Dataset made for supervised instruction finetuning of Mistral LLMs by combining medical datasets and taking 2k entries from them:

    • Medical Meadow Wikidoc (https://huggingface.co/datasets/medalpaca/medical_meadow_wikidoc/blob/main/README.md)
    • MedQuAD (https://www.kaggle.com/datasets/jpmiller/layoutlm)

      Medical meadow wikidoc
    

    The Medical Meadow Wikidoc dataset comprises question-answer pairs sourced from WikiDoc, an online platform where medical professionals collaboratively… See the full description on the dataset page: https://huggingface.co/datasets/Shekswess/medical_mistral_instruct_dataset_short.
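    The card does not say how the 2k entries were chosen. A minimal sketch of one option, a uniform random sample without replacement over the pooled sources with a fixed seed, is shown below; it assumes both sources were already mapped to one shared instruction/input/output schema, and `combine_and_sample` plus the stand-in records are hypothetical.

```python
import random
from typing import Any

Record = dict[str, Any]


def combine_and_sample(sources: list[list[Record]], n: int, seed: int = 42) -> list[Record]:
    """Concatenate several datasets and draw `n` records without replacement.

    Assumptions: every source already uses the same schema, and a uniform
    random sample is acceptable; the dataset card does not state the method.
    """
    pool = [rec for source in sources for rec in source]
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(pool, k=min(n, len(pool)))


# Hypothetical stand-ins for the two medical sources, already in one schema.
wikidoc = [{"instruction": "Answer the medical question.", "input": f"wikidoc q{i}", "output": "..."}
           for i in range(1500)]
medquad = [{"instruction": "Answer the medical question.", "input": f"medquad q{i}", "output": "..."}
           for i in range(1500)]
subset = combine_and_sample([wikidoc, medquad], n=2000)
print(len(subset))  # 2000
```

    A uniform sample lets the larger source dominate; a fixed quota per source (for example 1k each) would keep the mix balanced instead.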

  10. sql-subset-dataset

    • huggingface.co
    Cite
    Yulin G, sql-subset-dataset [Dataset]. https://huggingface.co/datasets/yulingai12/sql-subset-dataset
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors
    Yulin G
    Description

    A small subset of the b-mc2/sql-create-context dataset, used to fine-tune yulingai12/Mistral-7B-Instruct-v0.2-SQL-finetuned-test
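    Text-to-SQL rows like these are usually turned into a single instruction string before fine-tuning. A minimal sketch follows, assuming the source rows carry `question`, `context` (CREATE TABLE statements) and `answer` (the target SQL) fields; the prompt wording, the `sql_example_to_text` helper and the sample row are hypothetical, and the card states neither the subset size nor how it was selected.

```python
def sql_example_to_text(row: dict[str, str]) -> str:
    """Format one text-to-SQL row as a Mistral [INST] training string.

    Assumptions: rows have "question", "context" and "answer" fields; the
    instruction wording below is illustrative, not the author's template.
    """
    instruction = (
        "Write a SQL query that answers the question using the given schema.\n"
        f"Schema: {row['context']}\n"
        f"Question: {row['question']}"
    )
    return f"<s>[INST] {instruction} [/INST] {row['answer']}</s>"


# Hypothetical row in the assumed schema.
row = {
    "question": "How many heads of the departments are older than 56?",
    "context": "CREATE TABLE head (age INTEGER)",
    "answer": "SELECT COUNT(*) FROM head WHERE age > 56",
}
print(sql_example_to_text(row))
```

    Putting the schema before the question lets the model condition on table and column names before it reads what is being asked.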

  11. aLLoyM-dataset

    • huggingface.co
    Cite
    Yuna Oikawa, aLLoyM-dataset [Dataset]. https://huggingface.co/datasets/Playingyoyo/aLLoyM-dataset
    Authors
    Yuna Oikawa
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    aLLoyM Training Dataset

    This dataset was used to fine-tune the aLLoyM model (Mistral-based).

      Dataset Statistics
    

    Total Examples: 2,035,791
    Average Instruction Length: 113.1 characters
    Average Question Length: 76.4 characters
    Average Answer Length: 33.9 characters
    Total Dataset Size: 668.55 MB

      Format
    

    The dataset is in JSONL format where each line contains: { "messages": [ {"role": "system", "content": "System instruction"}, {"role": "user"… See the full description on the dataset page: https://huggingface.co/datasets/Playingyoyo/aLLoyM-dataset.
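    Per-field length statistics like the ones above can be recomputed directly from the JSONL file. A minimal sketch follows, assuming each line holds a `messages` list of `role`/`content` dicts; reading "instruction", "question" and "answer" as the system, user and assistant roles is our guess, and `role_length_stats` plus the sample lines are hypothetical.

```python
import json
from collections import defaultdict


def role_length_stats(jsonl_text: str) -> dict[str, float]:
    """Average content length in characters per role across JSONL lines.

    Assumption: each non-blank line is {"messages": [{"role": ..., "content": ...}]};
    mapping system/user/assistant to instruction/question/answer is a guess.
    """
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        for msg in json.loads(line)["messages"]:
            totals[msg["role"]] += len(msg["content"])
            counts[msg["role"]] += 1
    return {role: totals[role] / counts[role] for role in totals}


# Two hypothetical records with short contents to make the arithmetic visible.
sample = "\n".join([
    json.dumps({"messages": [{"role": "system", "content": "abcd"},
                             {"role": "user", "content": "ab"},
                             {"role": "assistant", "content": "a"}]}),
    json.dumps({"messages": [{"role": "system", "content": "ab"},
                             {"role": "user", "content": "abcd"},
                             {"role": "assistant", "content": "abc"}]}),
])
print(role_length_stats(sample))  # {'system': 3.0, 'user': 3.0, 'assistant': 2.0}
```

    For a file of roughly 2 million lines, iterating over an open file handle instead of one big string keeps memory flat.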

  12. Bitext-travel-llm-chatbot-training-dataset

    • huggingface.co
    Cite
    Bitext (2025). Bitext-travel-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0 (https://choosealicense.com/licenses/cdla-sharing-1.0/)

    Description

    Bitext - Travel Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Travel] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset.

  13. phi4-conversations

    • huggingface.co
    Cite
    RDSM (2025). phi4-conversations [Dataset]. https://huggingface.co/datasets/rdsm/phi4-conversations
    Dataset updated
    Mar 30, 2025
    Authors
    RDSM
    Description

    Raw responses generated by Phi4 to the questions from alamios/Mistral-Small-24B-Instruct-2501-Conversations. I made it for use with the QwenPhi 0.5B draft model, but the finetune did not yield much improvement. Since the dataset was already generated, here is the raw data in the hope that it is useful to someone.

  14. Bitext-restaurants-llm-chatbot-training-dataset

    • huggingface.co
    Cite
    Bitext (2024). Bitext-restaurants-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-restaurants-llm-chatbot-training-dataset
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 16, 2024
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0 (https://choosealicense.com/licenses/cdla-sharing-1.0/)

    Description

    Bitext - Restaurants Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [restaurants] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-restaurants-llm-chatbot-training-dataset.
