100+ datasets found
  1. llama2-sst2-fine-tuning

    • huggingface.co
    Updated Aug 2, 2023
    Cite
    Yifei (2023). llama2-sst2-fine-tuning [Dataset]. https://huggingface.co/datasets/OneFly7/llama2-sst2-fine-tuning
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 2, 2023
    Authors
    Yifei
    Description

    Dataset Card for "llama2-sst2-finetuning"

      Dataset Description
    

    The Llama2-sst2-fine-tuning dataset is designed for supervised fine-tuning of LLaMA V2 on the GLUE SST2 sentiment analysis classification task. We provide two subsets: training and validation. To ensure the effectiveness of fine-tuning, we convert the data into the prompt template for LLaMA V2 supervised fine-tuning, where the data follows this format:
    [INST] <
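The template is cut off in this listing after `[INST] <`. As a rough sketch only, the snippet below formats an SST2-style example with the commonly used Llama 2 chat layout (`[INST] <<SYS>> ... <</SYS>> ... [/INST]`). The function name `build_prompt`, the system prompt text, and the label wording are assumptions made for illustration; the dataset's actual template may differ.

```python
from typing import Optional

# Assumed system prompt; the dataset card's real wording is not visible in this listing.
DEFAULT_SYSTEM = "Classify the sentiment of the sentence as positive or negative."


def build_prompt(sentence: str, label: Optional[str] = None,
                 system: str = DEFAULT_SYSTEM) -> str:
    """Format one example in a Llama 2 chat-style layout (assumed layout)."""
    prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{sentence} [/INST]"
    # For supervised fine-tuning, the target label follows the closing tag.
    if label is not None:
        prompt += f" {label}"
    return prompt
```

Calling `build_prompt("a charming film", "positive")` yields a training string, while omitting the label yields an inference prompt that ends at `[/INST]`.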

  2. Prepared-Dataset-Fine-Tune-Llama-2

    • huggingface.co
    Updated Apr 20, 2024
    + more versions
    Cite
    moumad (2024). Prepared-Dataset-Fine-Tune-Llama-2 [Dataset]. https://huggingface.co/datasets/Hmoumad/Prepared-Dataset-Fine-Tune-Llama-2
    Explore at:
    Croissant
    Dataset updated
    Apr 20, 2024
    Authors
    moumad
    Description

    Hmoumad/Prepared-Dataset-Fine-Tune-Llama-2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. HuggingFace models

    • redivis.com
    Updated Feb 24, 2025
    + more versions
    Cite
    (2025). HuggingFace models [Dataset]. https://redivis.com/workflows/gxw9-5ey3j79zs
    Dataset updated
    Feb 24, 2025
    Description

    Container dataset for demonstration of Hugging Face models on Redivis. Currently just contains a single BERT model, but may expand in the future.

  4. BAAI_bge-small-en-v1_5-02082024-vrdv-webapp

    • huggingface.co
    Updated Aug 2, 2024
    + more versions
    Cite
    Fine-tuned Embeddings (2024). BAAI_bge-small-en-v1_5-02082024-vrdv-webapp [Dataset]. https://huggingface.co/datasets/fine-tuned/BAAI_bge-small-en-v1_5-02082024-vrdv-webapp
    Explore at:
    Croissant
    Dataset updated
    Aug 2, 2024
    Dataset authored and provided by
    Fine-tuned Embeddings
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BAAI_bge-small-en-v1_5-02082024-vrdv-webapp Dataset

      Dataset Description
    

    The dataset "general domain" is a generated dataset designed to support the development of domain-specific embedding models for retrieval tasks.

      Associated Model
    

    This dataset was used to train the BAAI_bge-small-en-v1_5-02082024-vrdv-webapp model.

      How to Use
    

    To use this dataset for model training or evaluation, you can load it using the Hugging Face datasets library as follows:… See the full description on the dataset page: https://huggingface.co/datasets/fine-tuned/BAAI_bge-small-en-v1_5-02082024-vrdv-webapp.
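The loading snippet is truncated in this listing. A minimal sketch, assuming the standard `load_dataset` function from the Hugging Face `datasets` library and default configuration and splits (neither is confirmed here), could look like the following. The helper names `repo_id_from_url` and `load_listed_dataset` are hypothetical.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repo id from a Hugging Face dataset page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Dataset pages in this listing have the path /datasets/<owner>/<name>.
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_listed_dataset(url: str):
    """Download a listed dataset (needs network access and the `datasets` package)."""
    # Imported lazily so that repo_id_from_url works without the package installed.
    from datasets import load_dataset
    return load_dataset(repo_id_from_url(url))
```

For example, `load_listed_dataset("https://huggingface.co/datasets/fine-tuned/BAAI_bge-small-en-v1_5-02082024-vrdv-webapp")` would request the dataset named in this entry; whether it loads without extra arguments depends on how the repository is laid out.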

  5. Yelp reviews (Hugging Face)

    • redivis.com
    Updated Feb 24, 2025
    + more versions
    Cite
    (2025). Yelp reviews (Hugging Face) [Dataset]. https://redivis.com/workflows/gxw9-5ey3j79zs
    Dataset updated
    Feb 24, 2025
    Description

    Dataset downloaded from Hugging Face, useful for training text classifier ML models.

  6. Dermatology-Question-Answer-Dataset-For-Fine-Tuning

    • huggingface.co
    Updated Nov 17, 2023
    Cite
    Muhammad Areeb Khan (2023). Dermatology-Question-Answer-Dataset-For-Fine-Tuning [Dataset]. https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning
    Explore at:
    Croissant
    Dataset updated
    Nov 17, 2023
    Authors
    Muhammad Areeb Khan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Details

    The dataset has about 1 million tokens for training and about 1,500 question-answer pairs.

      Dataset Description
    

    This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the… See the full description on the dataset page: https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning.

  7. chartllama-dataset

    • kaggle.com
    • huggingface.co
    Updated Oct 12, 2024
    Cite
    Alexey Shevchenko (2024). chartllama-dataset [Dataset]. https://www.kaggle.com/datasets/alexeyshevchenko/chartllama-dataset
    Explore at:
    Croissant
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Alexey Shevchenko
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Teaser image (teaser_visualization_final_v3): https://raw.githubusercontent.com/tingxueronghua/ChartLlama-code/refs/heads/main/static/teaser_visualization_final_v3.png

    A link to the original dataset located on HuggingFace: https://huggingface.co/datasets/listen2you002/ChartLlama-Dataset

    This dataset can be used to fine-tune Visual Language Models (VLMs) for the visual question answering (VQA) task, that is, answering questions about graphs and diagrams.

    Table with examples of content

    Columns: model, conversations, id, image

    conversations: [ { "from": "human", "value": "What is the title of the chart?" }, { "from": "gpt", "value": "Analysis of smartphone usage patterns" } ]
    id: ours_simplified_qa_37_0
    image: ours/box_chart/png/box_chart_100examples_37.png

    conversations: [ { "from": "human", "value": "What are the outliers in the Microwave usage data?" }, { "from": "gpt", "value": "None" } ]
    id: ours_simplified_qa_56_2
    image: ours/box_chart/png/box_chart_100examples_56.png

    conversations: [ { "from": "human", "value": "What's the food consumption of USA in Year 2?" }, { "from": "gpt", "value": "200" } ]
    id: ours_simplified_qa_69_0
    image: ours/box_chart/png/box_chart_100examples_69.png

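The `conversations` cells in the examples above are JSON lists of turns with `from` and `value` keys. As an illustrative sketch (the function name `qa_pairs` is an assumption, and the sample string is the first example with whitespace normalized), question-answer pairs can be pulled out like this:

```python
import json
from typing import List, Tuple

# First example from the table, as a JSON string.
row = ('[ { "from": "human", "value": "What is the title of the chart?" }, '
       '{ "from": "gpt", "value": "Analysis of smartphone usage patterns" } ]')


def qa_pairs(conversations_json: str) -> List[Tuple[str, str]]:
    """Pair each human turn with the gpt turn that follows it."""
    pairs = []
    question = None
    for turn in json.loads(conversations_json):
        text = turn["value"].strip()  # drop stray leading/trailing newlines
        if turn["from"] == "human":
            question = text
        elif turn["from"] == "gpt" and question is not None:
            pairs.append((question, text))
            question = None
    return pairs


pairs = qa_pairs(row)
```

Each resulting tuple holds one question and its answer, which is a convenient shape for building VQA training examples alongside the `image` path.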
  8. jojos-dataset

    • huggingface.co
    Updated Mar 7, 2023
    Cite
    Polytech Sorbonne X Hugging Face (2023). jojos-dataset [Dataset]. https://huggingface.co/datasets/polytechXhf/jojos-dataset
    Explore at:
    Croissant
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Hugging Face: https://huggingface.co/
    Authors
    Polytech Sorbonne X Hugging Face
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    More details will be added

  9. Eesti RoBERTa mudel

    • data.europa.eu
    Updated May 1, 2024
    Cite
    Eesti Keele Instituut (2024). Eesti RoBERTa mudel [Dataset]. https://data.europa.eu/data/datasets/oai-avaandmed-eesti-ee-daa73f99-3d3d-4056-9c79-0c2b721302ed?locale=bg
    Dataset updated
    May 1, 2024
    Dataset authored and provided by
    Eesti Keele Instituut
    Area covered
    Estonia
    Description

    Est-RoBERTa is a monolingual Estonian RoBERTa-like language representation model. It was trained on Estonian corpora, containing mostly news articles, with 2.51 billion tokens in total.

    The model can be used for various NLP classification tasks, either by fine-tuning it end-to-end or by extracting the embedding vector for each word occurrence and using the vectors as input. The model vocabulary consists of 40,000 (subword) tokens. Any word not present in the vocabulary is split into subword tokens, e.g., "identification" might be split as "▁identif ic ation". Tokens that form the beginning of a word (or the whole word) have a special character (▁) prepended, which is not the underscore character. Tokens that form a non-initial part of a word have no characters prepended or appended.
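To make the word-start marker concrete, here is a small sketch that rebuilds text from subword tokens of this kind. It assumes the marker is U+2581 (the character used by SentencePiece-style tokenizers); the helper names `join_subwords` and `count_words` are illustrative and not part of the model's tooling.

```python
from typing import List

WORD_START = "\u2581"  # "▁" (LOWER ONE EIGHTH BLOCK), not the ASCII underscore "_"


def join_subwords(tokens: List[str]) -> str:
    """Rebuild text from subword tokens whose word-initial pieces start with the marker."""
    # Concatenate all pieces, then turn every marker into a word boundary.
    return "".join(tokens).replace(WORD_START, " ").strip()


def count_words(tokens: List[str]) -> int:
    """Count words by counting the tokens that carry the word-start marker."""
    return sum(1 for token in tokens if token.startswith(WORD_START))
```

With the example from the description, `join_subwords(["▁identif", "ic", "ation"])` gives back the single word "identification", because only the first piece carries the marker.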

    The model configuration is in PyTorch format, for use with the Hugging Face transformers library (https://huggingface.co/transformers/). The model is also hosted on Hugging Face (https://huggingface.co/EMBEDDIA/est-roberta).

  10. fine-tuning-dataset-mental-models-Llama3.1-8B

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Daniel P (2024). fine-tuning-dataset-mental-models-Llama3.1-8B [Dataset]. https://huggingface.co/datasets/Incorrect12321/fine-tuning-dataset-mental-models-Llama3.1-8B
    Explore at:
    Croissant
    Dataset updated
    Sep 12, 2024
    Authors
    Daniel P
    Description

    Incorrect12321/fine-tuning-dataset-mental-models-Llama3.1-8B dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. SurgeGlobal/LaMini Dataset

    • paperswithcode.com
    Updated Apr 17, 2024
    + more versions
    Cite
    Chandeepa Dissanayake; Lahiru Lowe; Sachith Gunasekara; Yasiru Ratnayake (2024). SurgeGlobal/LaMini Dataset [Dataset]. https://paperswithcode.com/dataset/surgeglobal-lamini
    Dataset updated
    Apr 17, 2024
    Authors
    Chandeepa Dissanayake; Lahiru Lowe; Sachith Gunasekara; Yasiru Ratnayake
    Description

    Overview

    The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.

    Dataset Generation

    Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
    Seed Instructions: Sourced from the databricks/databricks-dolly-15k dataset
    Generation Approach: Example-guided and topic-guided strategies
    Total Instructions: 1,504 unique instruction examples

    Dataset Sources

    Repository: Bitbucket Project
    Paper: Pre-Print

    Structure

    Each entry in the dataset contains:
    - Instruction
    - Response

    Usage

    The LaMini Dataset can be used to fine-tune language models to improve their ability to follow instructions and generate relevant responses.

    Access

    The dataset is available on HuggingFace at the following link: https://huggingface.co/datasets/SurgeGlobal/LaMini

    Citation

    If you find our work useful, please cite our paper as follows:

    @misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data},
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }

    Dataset Authors

    Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake
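Since each LaMini entry pairs an instruction with a response, a fine-tuning pipeline typically joins the two into one training string. The sketch below is only an illustration: the lowercase keys `instruction` and `response`, the function name `to_training_text`, and the section markers are all assumptions, not the dataset's documented schema or format.

```python
from typing import Dict


def to_training_text(entry: Dict[str, str]) -> str:
    """Join one instruction/response entry into a single training string."""
    instruction = entry["instruction"].strip()  # assumed key name
    response = entry["response"].strip()        # assumed key name
    # The section markers are an illustrative choice, not the dataset's own format.
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
```

Any consistent template works for instruction tuning as long as the same one is used at training and inference time.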

  12. FEVER-256-24-gpt-4o-2024-05-13-989429

    • huggingface.co
    Updated May 13, 2024
    + more versions
    Cite
    Fine-tuned Embeddings (2024). FEVER-256-24-gpt-4o-2024-05-13-989429 [Dataset]. https://huggingface.co/datasets/fine-tuned/FEVER-256-24-gpt-4o-2024-05-13-989429
    Explore at:
    Croissant
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    Fine-tuned Embeddings
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    FEVER-256-24-gpt-4o-2024-05-13-989429 Dataset

      Dataset Description
    

    The dataset "dataset search for fact verification" is a generated dataset designed to support the development of domain-specific embedding models for retrieval tasks.

      Associated Model
    

    This dataset was used to train the FEVER-256-24-gpt-4o-2024-05-13-989429 model.

      How to Use
    

    To use this dataset for model training or evaluation, you can load it using the Hugging Face datasets library as… See the full description on the dataset page: https://huggingface.co/datasets/fine-tuned/FEVER-256-24-gpt-4o-2024-05-13-989429.

  13. Test-fine-tune

    • huggingface.co
    Updated Oct 4, 2024
    Cite
    Ning (2024). Test-fine-tune [Dataset]. https://huggingface.co/datasets/Allen1222/Test-fine-tune
    Explore at:
    Croissant
    Dataset updated
    Oct 4, 2024
    Authors
    Ning
    Description

    Allen1222/Test-fine-tune dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. saferdecoding-fine-tuning

    • huggingface.co
    Updated Dec 19, 2024
    Cite
    Anders Spear (2024). saferdecoding-fine-tuning [Dataset]. https://huggingface.co/datasets/aspear/saferdecoding-fine-tuning
    Explore at:
    Croissant
    Dataset updated
    Dec 19, 2024
    Authors
    Anders Spear
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for SaferDecoding Fine Tuning Dataset

    This dataset is intended for fine-tuning models to defend against jailbreak attacks. It is an extension of SafeDecoding.

      Dataset Details
    
      Dataset Description
    

    The dataset generation process was adapted from SafeDecoding. This dataset includes 252 original human-generated adversarial seed prompts, covering 18 harmful categories. This dataset includes responses generated by Llama2, Vicuna, Dolphin, Falcon… See the full description on the dataset page: https://huggingface.co/datasets/aspear/saferdecoding-fine-tuning.

  15. toolverifier

    • huggingface.co
    Updated Mar 14, 2024
    Cite
    AI at Meta (2024). toolverifier [Dataset]. https://huggingface.co/datasets/facebook/toolverifier
    Explore at:
    Croissant
    Dataset updated
    Mar 14, 2024
    Dataset authored and provided by
    AI at Meta
    Description

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.

      Data
    

    ToolSelect data is synthetic training data generated for tool selection task using Llama-2 70B and Llama-2-Chat-70B. It consists of 555 samples corresponding to 173 tools. Each training sample is composed of a user instruction, a candidate set of tools that includes the ground truth tool, and a… See the full description on the dataset page: https://huggingface.co/datasets/facebook/toolverifier.
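The sample layout described above can be pictured as a small record type. This is a hedged sketch covering only the two parts named before the description is cut off (the user instruction and the candidate tool set containing the ground truth tool); the class name, field names, and tool names are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolSelectSample:
    instruction: str            # the user instruction
    candidate_tools: List[str]  # candidate set offered for selection
    ground_truth_tool: str      # correct tool; must be one of the candidates

    def __post_init__(self) -> None:
        # The description states that the candidate set includes the ground truth tool.
        if self.ground_truth_tool not in self.candidate_tools:
            raise ValueError("ground truth tool must appear in the candidate set")


sample = ToolSelectSample(
    instruction="What is the weather in Paris?",  # made-up example
    candidate_tools=["weather", "calculator"],    # made-up tool names
    ground_truth_tool="weather",
)
```

Validating the invariant at construction time catches malformed samples before they reach a training loop.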

  16. jina-embeddings-v2-base-en-03052024-21on-webapp

    • huggingface.co
    Updated May 3, 2024
    + more versions
    Cite
    Fine-tuned Embeddings (2024). jina-embeddings-v2-base-en-03052024-21on-webapp [Dataset]. https://huggingface.co/datasets/fine-tuned/jina-embeddings-v2-base-en-03052024-21on-webapp
    Explore at:
    Croissant
    Dataset updated
    May 3, 2024
    Dataset authored and provided by
    Fine-tuned Embeddings
    Description

    license: apache-2.0
    task_categories:
    - feature-extraction
    - sentence-similarity
    language:
    - en
    tags:
    - sentence-transformers
    - feature-extraction
    - sentence-similarity
    - mteb
    - Events
    - Meetups
    - Networking
    - Community
    - Social
    pretty_name: event search for local meetups
    size_categories:
    - n<1K

      jina-embeddings-v2-base-en-03052024-21on-webapp Dataset
    
      Dataset Description
    

    The dataset is a generated dataset designed to support the development of domain… See the full description on the dataset page: https://huggingface.co/datasets/fine-tuned/jina-embeddings-v2-base-en-03052024-21on-webapp.

  17. llama2-fine-tune-v2-dataset

    • huggingface.co
    Updated Feb 12, 2018
    Cite
    Cadman (2018). llama2-fine-tune-v2-dataset [Dataset]. https://huggingface.co/datasets/Anthony3456347095/llama2-fine-tune-v2-dataset
    Dataset updated
    Feb 12, 2018
    Authors
    Cadman
    Description

    Anthony3456347095/llama2-fine-tune-v2-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. E5-finetune-dataset

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    ProfessorBob (2024). E5-finetune-dataset [Dataset]. https://huggingface.co/datasets/ProfessorBob/E5-finetune-dataset
    Dataset updated
    Feb 7, 2024
    Dataset authored and provided by
    ProfessorBob
    Description

    E5-finetune Dataset

    E5-finetune Dataset is a curated collection of query-passage pairs, encompassing a total of 870k examples. This dataset is specifically designed for fine-tuning models to extend their input length capabilities from 512 tokens to 1024 tokens. The primary focus is on accumulating long-context passages.

      Dataset in English
    

    The dataset samples long-context passage examples from various sources, ensuring a rich and diverse collection. The sources include:… See the full description on the dataset page: https://huggingface.co/datasets/ProfessorBob/E5-finetune-dataset.

  19. fine-tune-nvidia-blackwell

    • huggingface.co
    Updated May 29, 2025
    Cite
    Gary Stafford (2025). fine-tune-nvidia-blackwell [Dataset]. https://huggingface.co/datasets/garystafford/fine-tune-nvidia-blackwell
    Dataset updated
    May 29, 2025
    Authors
    Gary Stafford
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    garystafford/fine-tune-nvidia-blackwell dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. fine-tune-test

    • huggingface.co
    Updated Jun 1, 2025
    + more versions
    Cite
    Artem (2025). fine-tune-test [Dataset]. https://huggingface.co/datasets/withmuse/fine-tune-test
    Dataset updated
    Jun 1, 2025
    Authors
    Artem
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    withmuse/fine-tune-test dataset hosted on Hugging Face and contributed by the HF Datasets community
