8 datasets found
  1. data_analysis

    • huggingface.co
    Updated Jun 24, 2024
    Cite
    LiveBench (2024). data_analysis [Dataset]. https://huggingface.co/datasets/livebench/data_analysis
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/data_analysis"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be… See the full description on the dataset page: https://huggingface.co/datasets/livebench/data_analysis.

  2. model_answer

    • huggingface.co
    Updated Mar 31, 2025
    Cite
    LiveBench (2025). model_answer [Dataset]. https://huggingface.co/datasets/livebench/model_answer
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/model_answer"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be… See the full description on the dataset page: https://huggingface.co/datasets/livebench/model_answer.

  3. instruction_following

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    LiveBench (2024). instruction_following [Dataset]. https://huggingface.co/datasets/livebench/instruction_following
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/instruction_following"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions… See the full description on the dataset page: https://huggingface.co/datasets/livebench/instruction_following.

  4. reasoning

    • huggingface.co
    Updated Mar 31, 2025
    Cite
    LiveBench (2025). reasoning [Dataset]. https://huggingface.co/datasets/livebench/reasoning
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/reasoning"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/reasoning.

  5. coding

    • huggingface.co
    Updated Apr 2, 2025
    Cite
    LiveBench (2025). coding [Dataset]. https://huggingface.co/datasets/livebench/coding
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/coding"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/coding.

  6. model_judgment

    • huggingface.co
    Updated Mar 31, 2025
    Cite
    LiveBench (2025). model_judgment [Dataset]. https://huggingface.co/datasets/livebench/model_judgment
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/model_judgment"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be… See the full description on the dataset page: https://huggingface.co/datasets/livebench/model_judgment.

  7. math

    • huggingface.co
    Cite
    LiveBench, math [Dataset]. https://huggingface.co/datasets/livebench/math
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/math"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/math.

  8. language

    • huggingface.co
    Cite
    LiveBench, language [Dataset]. https://huggingface.co/datasets/livebench/language
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/language"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/language.
