11 datasets found
  1. h

    lmsys-chat-1m

    • huggingface.co
    • opendatalab.com
    Updated Sep 17, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Large Model Systems Organization (2023). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/lmsys/lmsys-chat-1m
    Explore at:
    Dataset updated
    Sep 17, 2023
    Dataset authored and provided by
    Large Model Systems Organization
    Description

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/lmsys-chat-1m.

  2. h

    lmsys-chat-1m-qwen2-instruct

    • huggingface.co
    Updated Nov 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Brian Williams (2024). lmsys-chat-1m-qwen2-instruct [Dataset]. https://huggingface.co/datasets/bew/lmsys-chat-1m-qwen2-instruct
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 22, 2024
    Authors
    Brian Williams
    Description

    bew/lmsys-chat-1m-qwen2-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. h

    lmsys-chat-1m-jsonify-v2

    • huggingface.co
    Updated May 27, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    lmsys-chat-1m-jsonify-v2 [Dataset]. https://huggingface.co/datasets/jsonifize/lmsys-chat-1m-jsonify-v2
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 27, 2024
    Dataset authored and provided by
    jsonifize
    Description

    jsonifize/lmsys-chat-1m-jsonify-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. h

    lmsys-chat-Qwen2.5-1.5B-Instruct-1epoch-100k

    • huggingface.co
    Updated Nov 26, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stanley Tang (2024). lmsys-chat-Qwen2.5-1.5B-Instruct-1epoch-100k [Dataset]. https://huggingface.co/datasets/Stanleytowne/lmsys-chat-Qwen2.5-1.5B-Instruct-1epoch-100k
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 26, 2024
    Authors
    Stanley Tang
    Description

    Stanleytowne/lmsys-chat-Qwen2.5-1.5B-Instruct-1epoch-100k dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. mt_bench_prompts

    • huggingface.co
    • hf-proxy-cf.effarig.site
    Updated Jul 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mt_bench_prompts [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Hugging Facehttps://huggingface.co/
    Authors
    Hugging Face H4
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    MT Bench by LMSYS

    This set of evaluation prompts is created by the LMSYS org for better evaluation of chat models. For more information, see the paper.

      Dataset loading
    

    To load this dataset, use πŸ€— datasets: from datasets import load_dataset data = load_dataset(HuggingFaceH4/mt_bench_prompts, split="train")

      Dataset creation
    

    To create the dataset, we do the following for our internal tooling.

    rename turns to prompts, add empty reference to… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts.

  6. h

    ScaleBiO-Train-lmsys-chat-1m

    • huggingface.co
    Updated Nov 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ScaleBiO (2024). ScaleBiO-Train-lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/ScaleBiO/ScaleBiO-Train-lmsys-chat-1m
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 24, 2024
    Dataset authored and provided by
    ScaleBiO
    Description

    Dataset Card for "ScaleBiO-Train-lmsys-chat-1m"

    More Information needed

  7. h

    lmsys-finance

    • huggingface.co
    Updated Apr 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GUIJIN SON (2024). lmsys-finance [Dataset]. https://huggingface.co/datasets/amphora/lmsys-finance
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 3, 2024
    Authors
    GUIJIN SON
    Description

    Dataset Card for "lmsys-finance"

    This dataset is a curated version of the lmsys-chat-1m dataset, focusing solely on finance-related conversations. The refinement process encompassed:

    Removing non-English conversations. Selecting conversations from models: "vicuna-33b", "wizardlm-13b", "gpt-4", "gpt-3.5-turbo", "claude-2", "palm-2", and "claude-instant-1". Excluding conversations with responses under 30 characters. Using 100 financial keywords, choosing conversations with at… See the full description on the dataset page: https://huggingface.co/datasets/amphora/lmsys-finance.

  8. h

    Barcenas-lmsys-Dataset

    • huggingface.co
    Updated Oct 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel (2023). Barcenas-lmsys-Dataset [Dataset]. https://huggingface.co/datasets/Danielbrdz/Barcenas-lmsys-Dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 20, 2023
    Authors
    Daniel
    Description

    Dataset made on the basis of lmsys/lmsys-chat-1m With data only for the Spanish language.

  9. h

    lmsys-chat-tiny-20k

    • huggingface.co
    Updated Oct 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    πŸŽ€θΆ…η΅Άζœ€γ‹γ‚πŸŽ€γ¦γ‚“γ—γ‘γ‚ƒγ‚“ (2024). lmsys-chat-tiny-20k [Dataset]. https://huggingface.co/datasets/x-angelkawaii-x/lmsys-chat-tiny-20k
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    πŸŽ€θΆ…η΅Άζœ€γ‹γ‚πŸŽ€γ¦γ‚“γ—γ‘γ‚ƒγ‚“
    Description

    x-angelkawaii-x/lmsys-chat-tiny-20k dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. h

    wild-if-eval

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gili Lior, wild-if-eval [Dataset]. https://huggingface.co/datasets/gililior/wild-if-eval
    Explore at:
    Authors
    Gili Lior
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    WildIFEval Dataset

    This dataset was originally introduced in the paper WildIFEval: Instruction Following in the Wild, available on arXiv. Code: https://github.com/gililior/wild-if-eval

      Dataset Overview
    

    The WildIFEval dataset is designed for evaluating instruction-following capabilities in language models. It provides decompositions of conversations extracted from the LMSYS-Chat-1M dataset. Each example includes:

    conversation_id: A unique identifier for each conversation.… See the full description on the dataset page: https://huggingface.co/datasets/gililior/wild-if-eval.

  11. h

    diffing-stats-gemma-2-2b-crosscoder-l13-mu4.1e-02-lr1e-04

    • huggingface.co
    Updated Nov 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Science of Finetuning (Neel Nanda's MATS 7.0) (2024). diffing-stats-gemma-2-2b-crosscoder-l13-mu4.1e-02-lr1e-04 [Dataset]. https://huggingface.co/datasets/science-of-finetuning/diffing-stats-gemma-2-2b-crosscoder-l13-mu4.1e-02-lr1e-04
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Science of Finetuning (Neel Nanda's MATS 7.0)
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Contains maximum activating examples for all the features of our crosscoder trained on gemma 2 2B layer 13 available here: https://huggingface.co/Butanium/gemma-2-2b-crosscoder-l13-mu4.1e-02-lr1e-04/blob/main/README.md

    base_examples.pt contains all the maximum examples of the feature on a subset of validation test of fineweb chat_examples.pt is the same but for lmsys chat data chat_base_examples.pt is a merge of the two above files. All files are of the type dict[int, list[tuple[float… See the full description on the dataset page: https://huggingface.co/datasets/science-of-finetuning/diffing-stats-gemma-2-2b-crosscoder-l13-mu4.1e-02-lr1e-04.

  12. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Large Model Systems Organization (2023). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/lmsys/lmsys-chat-1m

lmsys-chat-1m

lmsys/lmsys-chat-1m

Explore at:
190 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 17, 2023
Dataset authored and provided by
Large Model Systems Organization
Description

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/lmsys-chat-1m.

Search
Clear search
Close search
Google apps
Main menu