3 datasets found
  1. h

    chatbot_arena_conversations

    • huggingface.co
    Updated Jul 18, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Large Model Systems Organization (2023). chatbot_arena_conversations [Dataset]. https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
    Explore at:
    Dataset updated
    Jul 18, 2023
    Dataset authored and provided by
    Large Model Systems Organization
    License

    https://choosealicense.com/licenses/cc/https://choosealicense.com/licenses/cc/

    Description

    Chatbot Arena Conversations Dataset

    This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp. To ensure the safe release… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations.

  2. h

    lmsys-chat-1m

    • huggingface.co
    Updated Jul 2, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jitendra Chauhan (2025). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/jc-detoxio/lmsys-chat-1m
    Explore at:
    Dataset updated
    Jul 2, 2025
    Authors
    Jitendra Chauhan
    Description

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/jc-detoxio/lmsys-chat-1m.

  3. Arena-Hard-v0.1

    • kaggle.com
    Updated May 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    LMSYS ORG (2024). Arena-Hard-v0.1 [Dataset]. http://doi.org/10.34740/kaggle/dsv/8283907
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 1, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    LMSYS ORG
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Checkout our blog post

    Building an affordable and reliable benchmark for LLM chatbots has become a critical challenge. A high-quality benchmark should 1. robustly separate model capability 2. reflect human preference in real-world use cases 3. frequently update to avoid over-fitting or test set leakage

    Traditional benchmarks are often static or close-ended (e.g., MMLU multi-choice QA), which do not satisfy the above requirements. On the other hand, models are evolving faster than ever, underscoring the need to build benchmarks with high separability.

    We introduce Arena-Hard – a data pipeline to build high-quality benchmarks from live data in Chatbot Arena, which is a crowd-sourced platform for LLM evals.

    We compare our new benchmark, Arena Hard v0.1, to a current leading chat LLM benchmark, MT Bench. We show Arena Hard v0.1 offers significantly stronger separability against MT Bench with tighter confidence intervals. It also has a higher agreement (89.1%, see blog post) with the human preference ranking by Chatbot Arena (english-only). We expect to see this benchmark useful for model developers to differentiate their model checkpoints.

  4. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Large Model Systems Organization (2023). chatbot_arena_conversations [Dataset]. https://huggingface.co/datasets/lmsys/chatbot_arena_conversations

chatbot_arena_conversations

lmsys/chatbot_arena_conversations

Explore at:
25 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jul 18, 2023
Dataset authored and provided by
Large Model Systems Organization
License

https://choosealicense.com/licenses/cc/https://choosealicense.com/licenses/cc/

Description

Chatbot Arena Conversations Dataset

This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp. To ensure the safe release… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations.

Search
Clear search
Close search
Google apps
Main menu