Dataset Card for "livebench/coding"
LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:
LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/coding.
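As a quick orientation, a minimal sketch of loading this card's dataset with the Hugging Face `datasets` library (the split name is an assumption; check the dataset page for the actual configuration):

```python
from datasets import load_dataset

# Assumed split name; the dataset page above lists the real configurations.
coding = load_dataset("livebench/coding", split="test")
print(coding[0])  # inspect one question record
```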
https://choosealicense.com/licenses/cc/
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
🏠 Home Page • 💻 GitHub Repository • 🏆 Leaderboard
LiveCodeBench is a "live", continuously updating benchmark for holistically evaluating the code-related capabilities of LLMs. In particular, it evaluates LLMs across a range of capabilities, including code generation, self-repair, test output prediction, and code execution. This is the code generation scenario of LiveCodeBench. It is also… See the full description on the dataset page: https://huggingface.co/datasets/livecodebench/test_generation.
minimario/livecodebench-execute dataset hosted on Hugging Face and contributed by the HF Datasets community
PrimeIntellect/LiveCodeBench-v5 dataset hosted on Hugging Face and contributed by the HF Datasets community
xin1997/livecodebench-code-generation_all_only_input dataset hosted on Hugging Face and contributed by the HF Datasets community
test-gen/livecodebench dataset hosted on Hugging Face and contributed by the HF Datasets community
QAQAQAQAQ/LiveCodeBench-Pro dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LiveCodeBench-CPP: An Extension of LiveCodeBench for Contamination Free Evaluation in C++
Overview
LiveCodeBench-CPP includes 279 problems from the release_v5 of LiveCodeBench, covering the period from October 2024 to January 2025. These problems are sourced from AtCoder (175 problems) and LeetCode (104 problems).
AtCoder Problems: These require generated solutions to read inputs from standard input (stdin) and write outputs to standard output (stdout). For unit testing… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/LiveCodeBench-CPP.
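To make the stdin/stdout requirement concrete, here is a minimal Python sketch of how a harness might unit-test a compiled C++ solution against such problems (the binary path, test-case shape, and whitespace handling are illustrative assumptions, not the dataset's actual harness):

```python
# Hedged sketch: stdin/stdout unit testing for AtCoder-style problems.
import subprocess

def run_stdio_test(binary_path: str, stdin_text: str, expected_stdout: str,
                   timeout_s: float = 10.0) -> bool:
    """Feed the test input to the compiled solution and compare outputs."""
    result = subprocess.run(
        [binary_path],          # assumed path to the compiled C++ binary
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # Compare ignoring trailing whitespace, a common judge convention.
    return result.stdout.strip() == expected_stdout.strip()

# Hypothetical test case:
# run_stdio_test("./solution", "5\n1 2 3 4 5\n", "15\n")
```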
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
We use the Stdio input/output format here. For example, for a task that computes the sum of a list, the input and output are in the following format: input = "5\n1 2 3 4 5\n", output = "15"
CodeContests and CodeForces use this format; however, MBPP and part of LiveCodeBench use a functional input/output format, such as assert sum_function([1, 2, 3, 4, 5]) == 15.
In this project, we have converted the functional format to the Stdio format to achieve consistency. Paper | Code… See the full description on the dataset page: https://huggingface.co/datasets/Gen-Verse/LiveCodeBench.
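To make the two formats concrete, a minimal sketch using the sum task from the example above (`sum_function` comes from the quoted assertion; the stdio wrapper is an illustration, not the project's actual conversion code):

```python
# Functional format: the solution is a function checked by assertions.
def sum_function(nums):
    return sum(nums)

assert sum_function([1, 2, 3, 4, 5]) == 15

# Stdio format: the same task reads from stdin and writes to stdout.
# Input "5\n1 2 3 4 5\n" yields output "15".
import sys

def main():
    data = sys.stdin.read().split()
    n = int(data[0])                        # first token: list length
    nums = [int(x) for x in data[1:1 + n]]  # next n tokens: the list
    print(sum(nums))

if __name__ == "__main__":
    main()
```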
Groq/LiveCodeBench-CodeGeneration dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Rano23/Livecodebench-subset-50 dataset hosted on Hugging Face and contributed by the HF Datasets community
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Version 2.0. Original dataset: https://huggingface.co/datasets/livecodebench/code_generation_lite. Translated to Thai by iApp Technology.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Synthia S1 27B LiveCodeBench Outputs
Done generating outputs. Evaluating now...
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Visualization of Code Generation Task Case Samples
Dataset samples can be visualized in the Dataset Viewer. The sampling procedure is guided by the Elo distribution introduced in our method, as sketched below. The original dataset is release_v5 of livecodebench/code_generation_lite from Hugging Face. Samples/origin: 879/880.
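A loose sketch of what weighted, distribution-guided sampling of this kind could look like (the per-problem weights and the helper below are purely illustrative assumptions; the method's actual procedure is described in the paper):

```python
import numpy as np

def sample_guided(problems, weights, k, seed=0):
    """Weighted sampling without replacement: each problem's weight is
    derived from a target (e.g. Elo-based) distribution. Illustrative only."""
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()  # normalize weights to a probability distribution
    idx = rng.choice(len(problems), size=k, replace=False, p=p)
    return [problems[i] for i in idx]

# Hypothetical usage: 880 problems, keep 879, weights from an assumed
# Elo-based target density.
# sampled = sample_guided(problems, elo_weights, k=879)
```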
License
This repository is licensed under the Apache License 2.0.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Data
Our training dataset consists of 24K problems paired with their test cases:
7.5K TACO Verified problems.
16K verified coding problems from PrimeIntellect's SYNTHETIC-1.
600 LiveCodeBench (v5) problems submitted between May 1, 2023 and July 31, 2024.
Our test dataset consists of:
LiveCodeBench (v5) problems between August 1, 2024 and February 1, 2025.
Codeforces problems from Qwen/CodeElo.
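A hedged sketch of how the LiveCodeBench date window above might be reproduced with the `datasets` library (the `version_tag` argument follows LiveCodeBench's documented loading pattern, but the `contest_date` field name and its date format are assumptions; verify against the actual schema):

```python
from datasets import load_dataset

# Assumption: release_v5 of the "lite" code-generation split carries a
# per-problem contest date we can window on.
lcb = load_dataset(
    "livecodebench/code_generation_lite",
    version_tag="release_v5",
    split="test",
)

def in_window(example, start="2024-08-01", end="2025-02-01"):
    # Assumed ISO-like date string; lexicographic comparison then works.
    d = str(example["contest_date"])[:10]
    return start <= d <= end

eval_problems = lcb.filter(in_window)
print(len(eval_problems), "problems in the evaluation window")
```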
Format
Each row in the dataset contains:
problem: The coding problem… See the full description on the dataset page: https://huggingface.co/datasets/agentica-org/DeepCoder-Preview-Dataset.
mlfoundations-dev/OpenThinker-7B_eval_03-11-25_18-35-31_0981
Precomputed model outputs for evaluation.
Evaluation Results
Summary
| Metric | LiveCodeBench | AIME24 | AIME25 | AMC23 | GPQADiamond | MATH500 |
|--------|---------------|--------|--------|-------|-------------|---------|
| Accuracy | 38.9 | 32.0 | 24.0 | 71.0 | 29.8 | 83.0 |
LiveCodeBench
Average Accuracy: 38.94% ± 0.69%
Number of Runs: 3

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 38.36% | 196 | 511 |
| 2 | 38.16% | 195 | 511 |
| 3 | 40.31% | 206 | 511 |
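The ± term is consistent with the standard error of the mean over the per-run accuracies above (sample standard deviation divided by √n); a quick check under that assumption:

```python
import statistics as st

runs = [38.36, 38.16, 40.31]  # per-run accuracies from the table above
mean = st.mean(runs)
stderr = st.stdev(runs) / len(runs) ** 0.5  # sample std / sqrt(n)
print(f"{mean:.2f}% ± {stderr:.2f}%")  # -> 38.94% ± 0.69%
```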
AIME24… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/OpenThinker-7B_eval_03-11-25_18-35-31_0981.
mlfoundations-dev/hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981
Precomputed model outputs for evaluation.
Evaluation Results
Summary
| Metric | LiveCodeBench | AIME24 | AIME25 | AMC23 | GPQADiamond | MATH500 |
|--------|---------------|--------|--------|-------|-------------|---------|
| Accuracy | 55.6 | 50.0 | 33.3 | 89.5 | 49.3 | 88.4 |
LiveCodeBench
Average Accuracy: 55.58% ± 0.79%
Number of Runs: 3

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 54.99% | 281 | 511 |
| 2 | 57.14% | 292 | 511 |
| 3 | 54.60% | 279 | 511 |

… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981.
mlfoundations-dev/a1_science_camel_biology_1744691454_eval_1331
Precomputed model outputs for evaluation.
Evaluation Results
Summary
| Metric | AIME24 | AMC23 | MATH500 | GPQADiamond | JEEBench | MMLUPro | LiveCodeBench | CodeElo |
|--------|--------|-------|---------|-------------|----------|---------|---------------|---------|
| Accuracy | 11.7 | 51.2 | 70.2 | 27.8 | 31.3 | 27.6 | 0.1 | 2.4 |
AIME24
Average Accuracy: 11.67% ± 1.51%
Number of Runs: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 16.67% | 5 | 30 |
| 2 | 3.33% | 1 | 30 |
| 3 | 13.33% | 4 | 30 |
| 4 | 6.67% | 2 | 30 |

… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/a1_science_camel_biology_1744691454_eval_1331.
mlfoundations-dev/herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981
Precomputed model outputs for evaluation.
Evaluation Results
Summary
| Metric | LiveCodeBench | AIME24 | AIME25 | AMC23 | GPQADiamond | MATH500 |
|--------|---------------|--------|--------|-------|-------------|---------|
| Accuracy | 43.1 | 46.0 | 28.0 | 80.5 | 44.4 | 87.0 |
LiveCodeBench
Average Accuracy: 43.12% ± 1.03%
Number of Runs: 3

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 45.01% | 230 | 511 |
| 2 | 41.49% | 212 | 511 |
| 3 | 42.86% | 219 | 511 |
AIME24… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981.
mlfoundations-dev/Light-R1-32B_1743569788_eval_0981
Precomputed model outputs for evaluation.
Evaluation Results
Summary
| Metric | AIME24 | AIME25 | AMC23 | MATH500 | GPQADiamond | LiveCodeBench |
|--------|--------|--------|-------|---------|-------------|---------------|
| Accuracy | 75.3 | 55.3 | 95.5 | 90.2 | 22.6 | 55.7 |
AIME24
Average Accuracy: 75.33% ± 2.42%
Number of Runs: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 83.33% | 25 | 30 |
| 2 | 76.67% | 23 | 30 |
| 3 | 73.33% | 22 | 30 |
| 4 | 66.67% | 20 | 30 |
| 5 | 76.67% | 23 | 30 |

… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/Light-R1-32B_1743569788_eval_0981.