Final Data - LLaMA Fine-Tuning Dataset
This dataset is prepared for fine-tuning the meta-llama/Llama-2-7b-hf model using the TRL SFTTrainer.
Structure
train.json: Training examples in JSON format
validation.json: Validation examples
test.json: Optional test examples
Format
Each file contains a list of items with this format:

{ "text": "Your training sample here..." }

The dataset can be loaded with:

from datasets import load_dataset
dataset = load_dataset("csenaeem/final_data")
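A minimal fine-tuning sketch follows, assuming TRL 0.9 or later (where SFTConfig exists), access to the gated Llama-2 weights, and that the JSON files load as train/validation splits; keyword names such as max_seq_length have shifted between TRL releases, so treat this as an outline rather than a fixed recipe:

# Sketch: fine-tune Llama-2-7b on this dataset with the TRL SFTTrainer.
# Assumptions: TRL >= 0.9, gated Llama-2 access granted, and a "train"
# split produced by loading train.json.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("csenaeem/final_data")

config = SFTConfig(
    output_dir="./llama2-sft",        # hypothetical output path
    dataset_text_field="text",        # matches the {"text": ...} format above
    max_seq_length=1024,              # renamed in some TRL releases; check your version
    num_train_epochs=3,
    per_device_train_batch_size=2,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # SFTTrainer accepts a model id string
    args=config,
    train_dataset=dataset["train"],
)
trainer.train()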
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenAssistant TOP-1 English Conversations
This is a twice-filtered dataset derived from oasst2, a set of conversation trees collected by the OpenAssistant project. It was first filtered to the top-ranked branch in each conversation tree, forming blancsw/oasst2_top1_chat_format. It was then filtered down to English only and to a single 'messages' data column. This allows the dataset to be input directly to the HuggingFace SFTTrainer (provided your tokenizer has a chat template)… See the full description on the dataset page: https://huggingface.co/datasets/Trelis/oasst2_top1_chat_format_en.
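As a sketch of what that means in practice, the 'messages' column can be rendered into a training string with apply_chat_template before (or inside) SFTTrainer. The tokenizer below is an assumption for illustration; any tokenizer that ships a chat template will do:

# Sketch: render one conversation from the "messages" column.
# Assumptions: a "train" split exists, and the chosen tokenizer ships a
# chat template (the Llama-2 -chat variant does; it is a gated model).
from datasets import load_dataset
from transformers import AutoTokenizer

ds = load_dataset("Trelis/oasst2_top1_chat_format_en")
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Turn one conversation (a list of role/content dicts) into a single string.
text = tok.apply_chat_template(ds["train"][0]["messages"], tokenize=False)
print(text)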
Dataset Description
Abstract
The Abstract Paper Reviews Dataset is designed for training machine learning models to generate reviews of academic papers from a paper's title and abstract. It is formatted in a conversational style, allowing direct use with trainers like the SFTTrainer without additional parsing or conversion into a chat template. This dataset enables the development of models that can assist in peer review processes by providing… See the full description on the dataset page: https://huggingface.co/datasets/travis0103/abstract_paper_review.
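A minimal loading sketch; the split name and record layout below are assumptions drawn from the description above, not a confirmed schema:

# Sketch: load and inspect the dataset. Assumptions: a "train" split exists
# and records are already in a conversational format, per the description.
from datasets import load_dataset

ds = load_dataset("travis0103/abstract_paper_review")
print(ds)              # show available splits and columns
print(ds["train"][0])  # inspect one conversational example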
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Reasoning-1 1K
About
This dataset supports SFT training of LLMs in the Alpaca format. The goal of the dataset is to teach an LLM to reason and to analyze its own mistakes through SFT training. At 1.15K examples it is quite small, so for effective training with SFTTrainer set 4-6 epochs instead of 1-3. Made by Fluently Team (@ehristoforu) using distilabel with love 🥰
Dataset structure
This subset can be loaded as:

from datasets import load_dataset
ds =…

See the full description on the dataset page: https://huggingface.co/datasets/fluently-sets/reasoning-1-1k.
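Since the repository id appears in the URL above, a hedged completion of that snippet, with the card's 4-6 epoch recommendation wired into an SFTConfig, might look like this (TRL keyword names vary across versions):

# Sketch: load the dataset and apply the recommended epoch count.
# The repo id comes from the dataset URL above; the output path is
# hypothetical, and TRL config fields are version-dependent.
from datasets import load_dataset
from trl import SFTConfig

ds = load_dataset("fluently-sets/reasoning-1-1k")

config = SFTConfig(
    output_dir="./reasoning-sft",  # hypothetical output path
    num_train_epochs=5,            # middle of the recommended 4-6 range
)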