21 datasets found
  1. chatbot_arena_conversations

    • huggingface.co
    Updated Jul 18, 2023
    Cite
    Large Model Systems Organization (2023). chatbot_arena_conversations [Dataset]. https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
    Dataset updated
    Jul 18, 2023
    Dataset authored and provided by
    Large Model Systems Organization
    License

    https://choosealicense.com/licenses/cc/

    Description

    Chatbot Arena Conversations Dataset

    This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp. To ensure the safe release… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations.

  2. llm-jp-chatbot-arena-conversations

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    LLM-jp (2025). llm-jp-chatbot-arena-conversations [Dataset]. https://huggingface.co/datasets/llm-jp/llm-jp-chatbot-arena-conversations
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    LLM-jp
    License

    https://choosealicense.com/licenses/cc/

    Description

    LLM-jp Chatbot Arena Conversations Dataset

    This dataset contains approximately 1,000 conversations with pairwise human preferences, most of which are in Japanese. The data was collected during the trial phase of the LLM-jp Chatbot Arena (January–February 2025), where users compared responses from two different models in a head-to-head format. Each sample includes a question ID, the names of the two models, their conversation transcripts, the user's vote, an anonymized user ID, a… See the full description on the dataset page: https://huggingface.co/datasets/llm-jp/llm-jp-chatbot-arena-conversations.

  3. chatbot-arena-ja-calm2-7b-chat-experimental

    • huggingface.co
    Updated Jan 24, 2024
    Cite
    CyberAgent (2024). chatbot-arena-ja-calm2-7b-chat-experimental [Dataset]. https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental
    Dataset updated
    Jan 24, 2024
    Dataset authored and provided by
    CyberAgent (http://cyberagent.co.jp/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "chatbot-arena-ja-calm2-7b-chat"

      Chatbot Arena Conversations JA (calm2) Dataset
    

    Chatbot Arena Conversations JA (calm2) is a Japanese instruction dataset for RLHF, built in this paper. It was created to test whether "a dataset published in English can be repurposed for Japanese using only open-source tools and models, and then used to help train Japanese LLMs." The instructions (prompts) are Japanese translations of the user inputs in lmsys/chatbot_arena_conversations (CC-BY 4.0). These instructions were written by humans through Chatbot Arena and are released under CC-BY 4.0. For multi-turn dialogues, only the first user input is used (so every dialogue in this dataset is single-turn).… See the full description on the dataset page: https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental.

  4. lmsys-arena-human-preference-winner-43k-unfiltered

    • huggingface.co
    Updated Sep 15, 2021
    Cite
    Lesserfield (2021). lmsys-arena-human-preference-winner-43k-unfiltered [Dataset]. https://huggingface.co/datasets/lesserfield/lmsys-arena-human-preference-winner-43k-unfiltered
    Dataset updated
    Sep 15, 2021
    Authors
    Lesserfield
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    lmsys-arena-human-preference-winner-43k-unfiltered

    This repository contains a dataset derived from the lmsys/lmsys-arena-human-preference-55k dataset, which is licensed under the Apache 2.0 License.

      Dataset Description
    

    The lmsys-arena-human-preference-winner-43k-unfiltered dataset is a collection of 43,000 samples, each containing an instruction (prompt) and an output (winning response) from real-world user and LLM conversations. The dataset is derived from the original… See the full description on the dataset page: https://huggingface.co/datasets/lesserfield/lmsys-arena-human-preference-winner-43k-unfiltered.

  5. lmsys-chat-1m

    • huggingface.co
    Updated Jul 2, 2025
    Cite
    Jitendra Chauhan (2025). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/jc-detoxio/lmsys-chat-1m
    Dataset updated
    Jul 2, 2025
    Authors
    Jitendra Chauhan
    Description

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/jc-detoxio/lmsys-chat-1m.

  6. Arena-Hard-v0.1

    • kaggle.com
    Updated May 1, 2024
    Cite
    LMSYS ORG (2024). Arena-Hard-v0.1 [Dataset]. http://doi.org/10.34740/kaggle/dsv/8283907
    Dataset updated
    May 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    LMSYS ORG
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Check out our blog post.

    Building an affordable and reliable benchmark for LLM chatbots has become a critical challenge. A high-quality benchmark should:

      1. robustly separate model capability;
      2. reflect human preference in real-world use cases;
      3. frequently update to avoid over-fitting or test set leakage.

    Traditional benchmarks are often static or close-ended (e.g., MMLU multi-choice QA), which do not satisfy the above requirements. On the other hand, models are evolving faster than ever, underscoring the need to build benchmarks with high separability.

    We introduce Arena-Hard – a data pipeline to build high-quality benchmarks from live data in Chatbot Arena, which is a crowd-sourced platform for LLM evals.

    We compare our new benchmark, Arena Hard v0.1, to a current leading chat LLM benchmark, MT Bench. We show that Arena Hard v0.1 offers significantly stronger separability than MT Bench, with tighter confidence intervals. It also has higher agreement (89.1%, see blog post) with the human preference ranking from Chatbot Arena (English only). We expect this benchmark to be useful for model developers who need to differentiate their model checkpoints.

  7. Websites using Arena Liveblog And Chat Tool

    • webtechsurvey.com
    csv
    Updated May 8, 2024
    Cite
    WebTechSurvey (2024). Websites using Arena Liveblog And Chat Tool [Dataset]. https://webtechsurvey.com/technology/arena-liveblog-and-chat-tool
    Available download formats: csv
    Dataset updated
    May 8, 2024
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Arena Liveblog And Chat Tool technology, compiled through global website indexing conducted by WebTechSurvey.

  8. VisionArena-Chat

    • huggingface.co
    Updated Jan 17, 2025
    Cite
    LMArena (2025). VisionArena-Chat [Dataset]. https://huggingface.co/datasets/lmarena-ai/VisionArena-Chat
    Dataset updated
    Jan 17, 2025
    Dataset authored and provided by
    LMArena
    Description

    VisionArena-Battle: 30K Real-World Image Conversations with Pairwise Preference Votes

    200K single- and multi-turn chats between users and VLMs, collected on Chatbot Arena. WARNING: Images may contain inappropriate content.

      Dataset Details
    

    • 200K conversations
    • 45 VLMs
    • 138 languages
    • ~43K unique images
    • Question category tags (Captioning, OCR, Entity Recognition, Coding, Homework, Diagram, Humor, Creative Writing, Refusal)

      Dataset Description
    

    200,000… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/VisionArena-Chat.

  9. arena-human-preference-55k

    • huggingface.co
    Updated Jun 29, 2025
    Cite
    LMArena (2025). arena-human-preference-55k [Dataset]. https://huggingface.co/datasets/lmarena-ai/arena-human-preference-55k
    Dataset updated
    Jun 29, 2025
    Dataset authored and provided by
    LMArena
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset for the Kaggle competition on predicting human preferences in Chatbot Arena battles. The training dataset includes over 55,000 real-world conversations between users and LLMs, with user preferences, across over 70 state-of-the-art LLMs such as GPT-4, Claude 2, Llama 2, Gemini, and Mistral models. Each sample represents a battle in which two LLMs answer the same question, with a user label of prefer model A, prefer model B, tie, or tie (both bad).

      Citation
    

    Please cite the… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/arena-human-preference-55k.

  10. search-arena-24k

    • huggingface.co
    Updated May 9, 2025
    Cite
    LMArena (2025). search-arena-24k [Dataset]. https://huggingface.co/datasets/lmarena-ai/search-arena-24k
    Dataset updated
    May 9, 2025
    Dataset authored and provided by
    LMArena
    Description

    Overview

    This dataset contains all in-the-wild conversations crowdsourced from Search Arena between March 18, 2025 and May 8, 2025. It includes 24,069 multi-turn conversations with search-LLMs across diverse intents, languages, and topics, alongside 12,652 human preference votes. The dataset spans approximately 11,000 users across 136 countries, 13 publicly released models, around 90 languages (including 11% multilingual prompts), and over 5,000 multi-turn sessions. While user… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/search-arena-24k.

  11. arena-preferences

    • huggingface.co
    Updated Apr 22, 2024
    Cite
    Maxime Labonne (2024). arena-preferences [Dataset]. https://huggingface.co/datasets/mlabonne/arena-preferences
    Dataset updated
    Apr 22, 2024
    Authors
    Maxime Labonne
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ⚔️ Arena Preferences

    This is a preference dataset based on lmsys/chatbot_arena_conversations. It contains multi-turn conversations (up to 11 turns) and original samples in 39 different languages (no translation).

    • Chosen answers are answers where GPT-4 was the winner (33K to 2,868 samples)
    • Duplicates were removed (13 samples)
    • GPTisms were removed (166 samples)

      📊 Plots
    

    Here's a breakdown of the four most represented languages, plus an "other" bin, in the dataset.

    Here's… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/arena-preferences.

  12. llm-jp-chatbot-arena-conversations-reformatted

    • huggingface.co
    Updated May 29, 2025
    Cite
    kaeru39 (2025). llm-jp-chatbot-arena-conversations-reformatted [Dataset]. https://huggingface.co/datasets/ryota39/llm-jp-chatbot-arena-conversations-reformatted
    Dataset updated
    May 29, 2025
    Authors
    kaeru39
    License

    https://choosealicense.com/licenses/cc/

    Description

    A reformatted version of the llm-jp/llm-jp-chatbot-arena-conversations dataset.

  13. search-arena-v1-7k

    • huggingface.co
    Updated Apr 13, 2025
    Cite
    LMArena (2025). search-arena-v1-7k [Dataset]. https://huggingface.co/datasets/lmarena-ai/search-arena-v1-7k
    Dataset updated
    Apr 13, 2025
    Dataset authored and provided by
    LMArena
    Description

    Overview

    This dataset contains 7k leaderboard conversation votes collected from Search Arena between March 18, 2025 and April 13, 2025. All entries have been redacted for PII and sensitive user information to ensure privacy. Each data point includes:

    • Two model responses (messages_a and messages_b)
    • The human vote result
    • A timestamp
    • Full system metadata, LLM + web search trace, and post-processed metadata for controlled experiments (conv_meta)

    To reproduce the leaderboard results… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/search-arena-v1-7k.

  14. chatbot-arena-ja-calm2-7b-chat-experimental-autotrain-dpo-formatted-10

    • huggingface.co
    Updated Sep 7, 2024
    Cite
    Asuka Namizuka (2024). chatbot-arena-ja-calm2-7b-chat-experimental-autotrain-dpo-formatted-10 [Dataset]. https://huggingface.co/datasets/a-namizuka/chatbot-arena-ja-calm2-7b-chat-experimental-autotrain-dpo-formatted-10
    Dataset updated
    Sep 7, 2024
    Authors
    Asuka Namizuka
    Description

    a-namizuka/chatbot-arena-ja-calm2-7b-chat-experimental-autotrain-dpo-formatted-10 dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. preference-dissection

    • huggingface.co
    Updated Feb 18, 2024
    Cite
    Preference-Dissection (2024). preference-dissection [Dataset]. https://huggingface.co/datasets/Preference-Dissection/preference-dissection
    Dataset updated
    Feb 18, 2024
    Authors
    Preference-Dissection
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Introduction

    We release the annotated data used in Dissecting Human and LLM Preferences.

    • Original Dataset: The dataset is based on lmsys/chatbot_arena_conversations, which contains 33K cleaned conversations with pairwise human preferences collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023.
    • Filtering and Scenario-wise Sampling: We filter out the conversations that are not in English, with "Tie" or "Both Bad" labels, and the multi-turn… See the full description on the dataset page: https://huggingface.co/datasets/Preference-Dissection/preference-dissection.

  16. Llama-3-70b-battles

    • huggingface.co
    Updated Oct 30, 2024
    Cite
    LMArena (2024). Llama-3-70b-battles [Dataset]. https://huggingface.co/datasets/lmarena-ai/Llama-3-70b-battles
    Dataset updated
    Oct 30, 2024
    Dataset authored and provided by
    LMArena
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Chatbot Arena user conversations between Llama-3-70b and GPT-4-1025, or between Llama-3-70b and Claude-3-Opus, with user preference votes. Single turn; ties are excluded. Used in the Llama Data Analysis blog post and in "VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models" (Paper, Code).

      Citation
    

    @article{dunlap_vibecheck, title={VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models}, author={Lisa Dunlap and Krishna Mandal and Trevor… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/Llama-3-70b-battles.

  17. chatbot-arena-ja-karakuri-lm-8x7b-chat-v0.1-awq

    • huggingface.co
    Updated May 19, 2024
    Cite
    GENIAC Team Ozaki (2024). chatbot-arena-ja-karakuri-lm-8x7b-chat-v0.1-awq [Dataset]. https://huggingface.co/datasets/GENIAC-Team-Ozaki/chatbot-arena-ja-karakuri-lm-8x7b-chat-v0.1-awq
    Dataset updated
    May 19, 2024
    Dataset authored and provided by
    GENIAC Team Ozaki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A filtered version of chatbot-arena-ja-calm2-7b-chat, with the chosen responses generated by karakuri-lm-8x7b-chat-v0.1-awq.

  18. wildvision-chat

    • huggingface.co
    Updated Aug 27, 2024
    Cite
    WildVision Team (2024). wildvision-chat [Dataset]. https://huggingface.co/datasets/WildVision/wildvision-chat
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    WildVision Team
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    WildVision-Chat

    WildVision-Chat is the publicly released chat data collected from WildVision-Arena. We hope the released data will promote the development of multimodal language models.

      Models
    

    The WildVision datasets contain user conversations with PaliGemma, GPT-4T, GPT-4o, Phi 3 vision, Gemini 1.5, Neva 22b, Claude 3 Haiku, Idefics2-8b, Qwen-VL plus, Claude 3.5 Sonnet, Qwen-VL max, Yi-VL plus, MiniCPM Llama3, Claude 3 Sonnet, Claude 3 Opus, and GPT-4 Vision preview.… See the full description on the dataset page: https://huggingface.co/datasets/WildVision/wildvision-chat.

  19. The_Attribution_Crisis_in_LLM_Search_Results

    • huggingface.co
    Cite
    The AI Disclosures Project at the SSRC, The_Attribution_Crisis_in_LLM_Search_Results [Dataset]. https://huggingface.co/datasets/Disclosures-SSRC/The_Attribution_Crisis_in_LLM_Search_Results
    Dataset authored and provided by
    The AI Disclosures Project at the SSRC
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    AI_Ecosystem_Exploitation

      Dataset Details

      Dataset Description

    The dataset used in the paper "The Attribution Crisis in LLM Search Results".

      Links

    Repository: Ecosystem_Exploitation_In_Search_Results
    Paper: Link

      Uses
    

    This dataset contains 7,000 conversations sourced from the LMSYS Search Arena and post-processed to distinguish between search results and in-text citations. It is useful for observing the patterns of LLM… See the full description on the dataset page: https://huggingface.co/datasets/Disclosures-SSRC/The_Attribution_Crisis_in_LLM_Search_Results.

  20. wildvision-battle

    • huggingface.co
    Updated Jul 16, 2024
    Cite
    WildVision Team (2024). wildvision-battle [Dataset]. https://huggingface.co/datasets/WildVision/wildvision-battle
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    WildVision Team
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    WildVision-Battle

    WildVision-Battle is the publicly released battle data collected from WildVision-Arena, where user preferences are collected across multiple models and in-the-wild tasks from real-world users.

      Models
    

    The WildVision datasets contain user conversations with PaliGemma, GPT-4T, GPT-4o, Phi 3 vision, Gemini 1.5, Neva 22b, Claude 3 Haiku, Idefics2-8b, Qwen-VL plus, Claude 3.5 Sonnet, Qwen-VL max, Yi-VL plus, MiniCPM Llama3, Claude 3 Sonnet, Claude 3 Opus, and… See the full description on the dataset page: https://huggingface.co/datasets/WildVision/wildvision-battle.
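Several of the datasets above (for example arena-human-preference-55k, chatbot_arena_conversations, and Llama-3-70b-battles) record each battle as two model names plus a vote of prefer model A, prefer model B, tie, or tie (both bad). Derived datasets such as arena-preferences and the DPO-formatted variants turn those votes into chosen/rejected pairs. The Python sketch below shows both steps on in-memory records. The `Battle` fields and the label strings are hypothetical and chosen for illustration. They are not the exact schema of any listed dataset, so check each dataset card before adapting the sketch. It also keeps every decisive vote, whereas arena-preferences, for instance, keeps only battles that GPT-4 won.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical vote labels (assumption): real datasets may spell these differently.
WIN_A = "model_a"
WIN_B = "model_b"
TIE = "tie"
TIE_BOTH_BAD = "tie (bothbad)"


@dataclass
class Battle:
    """One pairwise comparison; all field names are hypothetical."""
    prompt: str
    model_a: str
    model_b: str
    response_a: str
    response_b: str
    winner: str  # one of WIN_A, WIN_B, TIE, TIE_BOTH_BAD


def win_rates(battles: list[Battle]) -> dict[str, float]:
    """Fraction of battles each model won; ties count as played but not won."""
    wins = Counter()
    played = Counter()
    for b in battles:
        played[b.model_a] += 1
        played[b.model_b] += 1
        if b.winner == WIN_A:
            wins[b.model_a] += 1
        elif b.winner == WIN_B:
            wins[b.model_b] += 1
    return {model: wins[model] / played[model] for model in played}


def to_preference_pairs(battles: list[Battle]) -> list[dict[str, str]]:
    """Turn decisive votes into prompt/chosen/rejected records."""
    pairs = []
    for b in battles:
        if b.winner == WIN_A:
            pairs.append({"prompt": b.prompt, "chosen": b.response_a, "rejected": b.response_b})
        elif b.winner == WIN_B:
            pairs.append({"prompt": b.prompt, "chosen": b.response_b, "rejected": b.response_a})
        # Both tie labels carry no preference direction, so they are skipped.
    return pairs


if __name__ == "__main__":
    demo = [
        Battle("p1", "m1", "m2", "a1", "b1", WIN_A),
        Battle("p2", "m2", "m3", "a2", "b2", WIN_B),
        Battle("p3", "m1", "m3", "a3", "b3", TIE),
    ]
    print(win_rates(demo))
    print(to_preference_pairs(demo))
```

Ties are counted as games played but not won, which is one simple convention. Rankings built from the same votes may treat ties differently (for example, as half a win).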

chatbot_arena_conversations (lmsys/chatbot_arena_conversations)

25 scholarly articles cite this dataset (View in Google Scholar)
