Dataset Card for "llama2-sst2-finetuning"
Dataset Description
The Llama2-sst2-fine-tuning dataset is designed for supervised fine-tuning of LLaMA V2 on the GLUE SST-2 sentiment-classification task. We provide two subsets: training and validation. To ensure effective fine-tuning, we convert the data into the prompt template for LLaMA V2 supervised fine-tuning, so each example follows this format:
```
[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```
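For illustration, here is a hypothetical sketch of wrapping a GLUE SST-2 example into this template with the Hugging Face datasets library (the system prompt and label wording are illustrative assumptions, not taken from the dataset itself):

```python
from datasets import load_dataset

# Illustrative system prompt; the dataset's actual wording may differ
SYSTEM_PROMPT = "Classify the sentiment of the sentence as positive or negative."

def to_llama2_prompt(example):
    # GLUE SST-2 rows carry a "sentence" field and a 0/1 "label"
    label = "positive" if example["label"] == 1 else "negative"
    text = (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n"
        f"{example['sentence']} [/INST] {label}"
    )
    return {"text": text}

sst2 = load_dataset("glue", "sst2", split="train")
sst2 = sst2.map(to_llama2_prompt)
print(sst2[0]["text"])
```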
Hmoumad/Prepared-Dataset-Fine-Tune-Llama-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Container dataset for demonstration of Hugging Face models on Redivis. Currently just contains a single BERT model, but may expand in the future.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BAAI_bge-small-en-v1_5-02082024-vrdv-webapp Dataset
Dataset Description
The dataset "general domain" is a generated dataset designed to support the development of domain specific embedding models for retrieval tasks.
Associated Model
This dataset was used to train the BAAI_bge-small-en-v1_5-02082024-vrdv-webapp model.
How to Use
To use this dataset for model training or evaluation, you can load it using the Hugging Face datasets library as follows:… See the full description on the dataset page: https://huggingface.co/datasets/fine-tuned/BAAI_bge-small-en-v1_5-02082024-vrdv-webapp.
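A minimal sketch of the elided loading snippet, assuming the standard datasets API:

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
dataset = load_dataset("fine-tuned/BAAI_bge-small-en-v1_5-02082024-vrdv-webapp")
print(dataset)
```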
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Details
The dataset has about 1 million tokens for training and about 1,500 question-answer pairs.
Dataset Description
This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the… See the full description on the dataset page: https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Teaser visualization: https://raw.githubusercontent.com/tingxueronghua/ChartLlama-code/refs/heads/main/static/teaser_visualization_final_v3.png
A link to the original dataset on Hugging Face: https://huggingface.co/datasets/listen2you002/ChartLlama-Dataset
This dataset can be used to fine-tune Visual Language Models (VLMs) for the visual question answering (VQA) task (answering questions about charts and diagrams); a sketch of reading one record follows the table below.
Table with examples of content
| conversations | id | image |
|---|---|---|
| [ { "from": "human", "value": "What is the title of the chart?" }, { "from": "gpt", "value": "Analysis of smartphone usage patterns" } ] | ours_simplified_qa_37_0 | ours/box_chart/png/box_chart_100examples_37.png |
| [ { "from": "human", "value": "What are the outliers in the Microwave usage data?" }, … ] | ours_simplified_qa_56_2 | ours/box_chart/png/box_chart_100examples_56.png |
| [ { "from": "human", "value": "What's the food consumption of USA in Year 2?" }, … ] | ours_simplified_qa_69_0 | ours/box_chart/png/box_chart_100examples_69.png |
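A hypothetical sketch of reading one record in the conversations format shown above, splitting the dialogue into (question, answer) pairs for VQA fine-tuning (the record is copied from the first table row; the alternating human/gpt turn structure is an assumption based on that example):

```python
# One record in the format shown in the table above
record = {
    "conversations": [
        {"from": "human", "value": "What is the title of the chart?"},
        {"from": "gpt", "value": "Analysis of smartphone usage patterns"},
    ],
    "id": "ours_simplified_qa_37_0",
    "image": "ours/box_chart/png/box_chart_100examples_37.png",
}

# Pair each human question with the gpt answer that follows it
turns = record["conversations"]
pairs = [
    (turns[i]["value"], turns[i + 1]["value"])
    for i in range(0, len(turns) - 1, 2)
    if turns[i]["from"] == "human" and turns[i + 1]["from"] == "gpt"
]
print(pairs)
```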
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
More details will be added
Est-RoBERTa is a monolingual Estonian RoBERTa-like language representation model. It was trained on Estonian corpora, containing mostly news articles, with 2.51 billion tokens in total.
The model can be used for various NLP classification tasks by fine-tuning it end-to-end, or alternatively by extracting the word-embedding vectors for each word occurrence and using those vectors as input. The model vocabulary consists of 40,000 (subword) tokens. Any word not present in the vocabulary is split into subword tokens, e.g. "identification" might be split as "▁identif ic ation". Tokens that form the beginning of a word (or the whole word) have a special character (▁) prepended (this is not the underscore character). Tokens that form a non-initial part of a word have no characters prepended or appended.
The model is provided in PyTorch format for use with the Hugging Face Transformers toolkit (https://huggingface.co/transformers/), where it is also hosted (https://huggingface.co/EMBEDDIA/est-roberta).
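As a minimal sketch of the embedding-extraction workflow (assuming the standard Transformers auto classes; the Estonian sentence is an illustrative placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/est-roberta")
model = AutoModel.from_pretrained("EMBEDDIA/est-roberta")

# Tokenize an Estonian sentence; out-of-vocabulary words are split into
# subword tokens, with "▁" marking word-initial pieces
inputs = tokenizer("See on näidislause.", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

# Extract a contextual embedding vector for each token
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # shape: (1, seq_len, hidden_size)
```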
Incorrect12321/fine-tuning-dataset-mental-models-Llama3.1-8B dataset hosted on Hugging Face and contributed by the HF Datasets community
Overview
The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.
Dataset Generation
Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
Seed Instructions: Sourced from the databricks/databricks-dolly-15k dataset
Generation Approach: Example-guided and topic-guided strategies
Total Instructions: 1,504 unique instruction examples
Dataset Sources
Repository: Bitbucket Project
Paper: Pre-Print
Structure
Each entry in the dataset contains:
- Instruction
- Response
Usage
The LaMini Dataset can be used to fine-tune language models to improve their ability to follow instructions and generate relevant responses.
Access
The dataset is available on Hugging Face at the following link: https://huggingface.co/datasets/SurgeGlobal/LaMini
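A minimal loading sketch, assuming the standard datasets API:

```python
from datasets import load_dataset

# Load the LaMini instruction dataset from the Hugging Face Hub
lamini = load_dataset("SurgeGlobal/LaMini")
print(lamini)  # each entry contains an Instruction and a Response
```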
Citation
If you find our work useful, please cite our paper as follows:

@misc{surge2024openbezoar,
  title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data},
  author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
  year={2024},
  eprint={2404.12195},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Dataset Authors
Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
FEVER-256-24-gpt-4o-2024-05-13-989429 Dataset
Dataset Description
The dataset "dataset search for fact verification" is a generated dataset designed to support the development of domain specific embedding models for retrieval tasks.
Associated Model
This dataset was used to train the FEVER-256-24-gpt-4o-2024-05-13-989429 model.
How to Use
To use this dataset for model training or evaluation, you can load it using the Hugging Face datasets library as… See the full description on the dataset page: https://huggingface.co/datasets/fine-tuned/FEVER-256-24-gpt-4o-2024-05-13-989429.
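As with the other generated datasets above, a minimal sketch of the elided loading snippet, assuming the standard datasets API:

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
dataset = load_dataset("fine-tuned/FEVER-256-24-gpt-4o-2024-05-13-989429")
print(dataset)
```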
Allen1222/Test-fine-tune dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for SaferDecoding Fine Tuning Dataset
This dataset is intended for fine-tuning models to defend against jailbreak attacks. It is an extension of SafeDecoding.
Dataset Details
Dataset Description
The dataset generation process was adapted from SafeDecoding. The dataset includes 252 original human-generated adversarial seed prompts covering 18 harmful categories, along with responses generated by Llama2, Vicuna, Dolphin, Falcon… See the full description on the dataset page: https://huggingface.co/datasets/aspear/saferdecoding-fine-tuning.
TOOLVERIFIER: Generalization to New Tools via Self-Verification
This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.
Data
ToolSelect is synthetic training data generated for the tool selection task using Llama-2 70B and Llama-2-Chat-70B. It consists of 555 samples corresponding to 173 tools. Each training sample is composed of a user instruction, a candidate set of tools that includes the ground truth tool, and a… See the full description on the dataset page: https://huggingface.co/datasets/facebook/toolverifier.
---
license: apache-2.0
task_categories:
  - feature-extraction
  - sentence-similarity
language:
  - en
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - mteb
  - Events
  - Meetups
  - Networking
  - Community
  - Social
pretty_name: event search for local meetups
size_categories:
  - n<1K
---
jina-embeddings-v2-base-en-03052024-21on-webapp Dataset
Dataset Description
The dataset is a generated dataset designed to support the development of domain… See the full description on the dataset page: https://huggingface.co/datasets/fine-tuned/jina-embeddings-v2-base-en-03052024-21on-webapp.
Anthony3456347095/llama2-fine-tune-v2-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
E5-finetune Dataset
E5-finetune Dataset is a curated collection of query-passage pairs, encompassing a total of 870k examples. This dataset is specifically designed for fine-tuning models to extend their input length capabilities from 512 tokens to 1024 tokens. The primary focus is on accumulating long-context passages.
Dataset in English
The dataset samples long-context passage examples from various sources, ensuring a rich and diverse collection. The sources include:… See the full description on the dataset page: https://huggingface.co/datasets/ProfessorBob/E5-finetune-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
garystafford/fine-tune-nvidia-blackwell dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
withmuse/fine-tune-test dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "llama2-sst2-finetuning"
Dataset Description
The Llama2-sst2-fine-tuning dataset is designed for supervised fine-tuning of the LLaMA V2 based on the GLUE SST2 for sentiment analysis classification task.We provide two subsets: training and validation.To ensure the effectiveness of fine-tuning, we convert the data into the prompt template for LLaMA V2 supervised fine-tuning, where the data will follow this format:
[INST] <