8 datasets found

data_analysis
huggingface.co
Updated Jun 24, 2024
Cite
LiveBench (2024). data_analysis [Dataset]. https://huggingface.co/datasets/livebench/data_analysis
Dataset updated
Jun 24, 2024
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/data_analysis"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be… See the full description on the dataset page: https://huggingface.co/datasets/livebench/data_analysis.
model_answer
huggingface.co
Updated Mar 31, 2025
Cite
LiveBench (2025). model_answer [Dataset]. https://huggingface.co/datasets/livebench/model_answer
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Mar 31, 2025
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/model_answer"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be… See the full description on the dataset page: https://huggingface.co/datasets/livebench/model_answer.
instruction_following
huggingface.co
Updated Nov 25, 2024
Cite
LiveBench (2024). instruction_following [Dataset]. https://huggingface.co/datasets/livebench/instruction_following
Dataset updated
Nov 25, 2024
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/instruction_following"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions… See the full description on the dataset page: https://huggingface.co/datasets/livebench/instruction_following.
reasoning
huggingface.co
Updated Mar 31, 2025
Cite
LiveBench (2025). reasoning [Dataset]. https://huggingface.co/datasets/livebench/reasoning
Dataset updated
Mar 31, 2025
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/reasoning"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/reasoning.
coding
huggingface.co
Updated Apr 2, 2025
Cite
LiveBench (2025). coding [Dataset]. https://huggingface.co/datasets/livebench/coding
Dataset updated
Apr 2, 2025
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/coding"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/coding.
model_judgment
huggingface.co
Updated Mar 31, 2025
Cite
LiveBench (2025). model_judgment [Dataset]. https://huggingface.co/datasets/livebench/model_judgment
Dataset updated
Mar 31, 2025
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/model_judgment"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be… See the full description on the dataset page: https://huggingface.co/datasets/livebench/model_judgment.
math
huggingface.co
Cite
LiveBench, math [Dataset]. https://huggingface.co/datasets/livebench/math
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/math"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/math.
language
huggingface.co
Cite
LiveBench, language [Dataset]. https://huggingface.co/datasets/livebench/language
Dataset authored and provided by
LiveBench
Description
Dataset Card for "livebench/language"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/language.