4 datasets found
  1. h

    final_data

    • huggingface.co
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Najmul Islam Naeem (2025). final_data [Dataset]. https://huggingface.co/datasets/csenaeem/final_data
    Explore at:
    Dataset updated
    Jul 3, 2025
    Authors
    Najmul Islam Naeem
    Description

    Final Data - LLaMA Fine-Tuning Dataset

    This dataset is prepared for fine-tuning the meta-llama/Llama-2-7b-hf model using the TRL SFTTrainer.

      Structure
    

    train.json: Training examples in JSON format validation.json: Validation examples test.json: Optional test examples

      Format
    

    Each file contains a list of items with this format: { "text": "Your training sample here..." } from datasets import load_dataset

    dataset = load_dataset("csenaeem/final_data")

    … See the full description on the dataset page: https://huggingface.co/datasets/csenaeem/final_data.

  2. h

    oasst2_top1_chat_format_en

    • huggingface.co
    Updated Apr 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Trelis (2024). oasst2_top1_chat_format_en [Dataset]. https://huggingface.co/datasets/Trelis/oasst2_top1_chat_format_en
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 19, 2024
    Dataset authored and provided by
    Trelis
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    OpenAssistant TOP-1 English Conversations

    This is a twice filtered dataset from oasst2, which is a set of conversation trees collected by the OpenAssistant project. It was first filtered for the top ranked branches in each conversation tree, to form blancsw/oasst2_top1_chat_format It was then filtered down to English-only, and to a single 'messages' data column. This allows the dataset to directly be input to the HuggingFace SFTTrainer (provided your tokenizer has a chat template)… See the full description on the dataset page: https://huggingface.co/datasets/Trelis/oasst2_top1_chat_format_en.

  3. h

    abstract_paper_review

    • huggingface.co
    Updated Apr 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yinuo Xie (2024). abstract_paper_review [Dataset]. https://huggingface.co/datasets/travis0103/abstract_paper_review
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 26, 2024
    Authors
    Yinuo Xie
    Description

    Dataset Description

      Abstract
    

    The Abstract Paper Reviews Dataset is designed for training machine learning models to generate reviews of academic papers based on the paper's title and abstract. It is formatted in a conversational style, facilitating direct use with models like the SFTTrainer without the need for additional parsing or conversion into a chat template. This dataset enables the development of models that can assist in peer review processes by providing… See the full description on the dataset page: https://huggingface.co/datasets/travis0103/abstract_paper_review.

  4. h

    reasoning-1-1k

    • huggingface.co
    Updated Dec 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fluently Datasets (2024). reasoning-1-1k [Dataset]. https://huggingface.co/datasets/fluently-sets/reasoning-1-1k
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 22, 2024
    Dataset authored and provided by
    Fluently Datasets
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Reasoning-1 1K

      Short about
    

    This dataset will help in SFT training of LLM on the Alpaca format. The goal of the dataset: to teach LLM to reason and analyze its mistakes using SFT training. The size of 1.15K is quite small, so for effective training on SFTTrainer set 4-6 epochs instead of 1-3. Made by Fluently Team (@ehristoforu) using distilabel with love🥰

      Dataset structure
    

    This subset can be loaded as: from datasets import load_dataset

    ds =… See the full description on the dataset page: https://huggingface.co/datasets/fluently-sets/reasoning-1-1k.

  5. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Najmul Islam Naeem (2025). final_data [Dataset]. https://huggingface.co/datasets/csenaeem/final_data

final_data

csenaeem/final_data

Explore at:
Dataset updated
Jul 3, 2025
Authors
Najmul Islam Naeem
Description

Final Data - LLaMA Fine-Tuning Dataset

This dataset is prepared for fine-tuning the meta-llama/Llama-2-7b-hf model using the TRL SFTTrainer.

  Structure

train.json: Training examples in JSON format validation.json: Validation examples test.json: Optional test examples

  Format

Each file contains a list of items with this format: { "text": "Your training sample here..." } from datasets import load_dataset

dataset = load_dataset("csenaeem/final_data")

… See the full description on the dataset page: https://huggingface.co/datasets/csenaeem/final_data.

Search
Clear search
Close search
Google apps
Main menu