100+ datasets found
  1. coding

    • huggingface.co
    Updated Apr 2, 2025
    Cite
    LiveBench (2025). coding [Dataset]. https://huggingface.co/datasets/livebench/coding
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    LiveBench
    Description

    Dataset Card for "livebench/coding"

    LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

    LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored… See the full description on the dataset page: https://huggingface.co/datasets/livebench/coding.

  2. test_generation

    • huggingface.co
    Updated Aug 20, 2023
    Cite
    Live Code Bench (2023). test_generation [Dataset]. https://huggingface.co/datasets/livecodebench/test_generation
    Dataset updated
    Aug 20, 2023
    Dataset authored and provided by
    Live Code Bench
    License

    https://choosealicense.com/licenses/cc/

    Description

    LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

    LiveCodeBench is a "live" updating benchmark for holistically evaluating the code-related capabilities of LLMs. In particular, it evaluates LLMs across a range of capabilities, including code generation, self-repair, test output prediction, and code execution. This is the code generation scenario of LiveCodeBench. It is also… See the full description on the dataset page: https://huggingface.co/datasets/livecodebench/test_generation.

  3. livecodebench-execute

    • huggingface.co
    Updated May 7, 2024
    Cite
    Alex Gu (2024). livecodebench-execute [Dataset]. https://huggingface.co/datasets/minimario/livecodebench-execute
    Dataset updated
    May 7, 2024
    Authors
    Alex Gu
    Description

    minimario/livecodebench-execute dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. LiveCodeBench-v5

    • huggingface.co
    Updated Jul 14, 2025
    Cite
    Prime Intellect (2025). LiveCodeBench-v5 [Dataset]. https://huggingface.co/datasets/PrimeIntellect/LiveCodeBench-v5
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    Prime Intellect
    Description

    PrimeIntellect/LiveCodeBench-v5 dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. livecodebench-code-generation_all_only_input

    • huggingface.co
    Cite
    xin1997, livecodebench-code-generation_all_only_input [Dataset]. https://huggingface.co/datasets/xin1997/livecodebench-code-generation_all_only_input
    Authors
    xin1997
    Description

    xin1997/livecodebench-code-generation_all_only_input dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. livecodebench

    • huggingface.co
    Updated Mar 2, 2024
    Cite
    Test Gen exp (2024). livecodebench [Dataset]. https://huggingface.co/datasets/test-gen/livecodebench
    Dataset updated
    Mar 2, 2024
    Dataset authored and provided by
    Test Gen exp
    Description

    test-gen/livecodebench dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. LiveCodeBench-Pro

    • huggingface.co
    Updated Mar 4, 2025
    Cite
    Zihan Zheng (2025). LiveCodeBench-Pro [Dataset]. https://huggingface.co/datasets/QAQAQAQAQ/LiveCodeBench-Pro
    Dataset updated
    Mar 4, 2025
    Authors
    Zihan Zheng
    Description

    QAQAQAQAQ/LiveCodeBench-Pro dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. LiveCodeBench-CPP

    • huggingface.co
    Updated May 11, 2025
    Cite
    NVIDIA (2025). LiveCodeBench-CPP [Dataset]. https://huggingface.co/datasets/nvidia/LiveCodeBench-CPP
    Dataset updated
    May 11, 2025
    Dataset provided by
    Nvidia: http://nvidia.com/
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LiveCodeBench-CPP: An Extension of LiveCodeBench for Contamination Free Evaluation in C++

    Overview

    LiveCodeBench-CPP includes 279 problems from the release_v5 of LiveCodeBench, covering the period from October 2024 to January 2025. These problems are sourced from AtCoder (175 problems) and LeetCode (104 problems).

    AtCoder Problems: These require generated solutions to read inputs from standard input (stdin) and write outputs to standard output (stdout). For unit testing… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/LiveCodeBench-CPP.

  9. LiveCodeBench

    • huggingface.co
    Updated May 26, 2025
    Cite
    GenX (2025). LiveCodeBench [Dataset]. https://huggingface.co/datasets/Gen-Verse/LiveCodeBench
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    GenX
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    We use the Stdio input/output format here. For example, for the task of calculating the sum of a list, the input and output are in the following format: input = "5 1 2 3 4 5 ", output = "15".

    CodeContests and CodeForces use this format; however, MBPP and part of LiveCodeBench use a functional input/output format, such as assert sum_function([1, 2, 3, 4, 5]) == 15.

    In this project, we have converted the functional format to the Stdio format to achieve consistency. Paper | Code… See the full description on the dataset page: https://huggingface.co/datasets/Gen-Verse/LiveCodeBench.
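
    The difference between the two formats can be sketched as follows. This is only an illustration, not the project's actual conversion code: `to_stdio_input` and `solve_stdio` are hypothetical helpers, and the serialization (count first, then the items, separated by whitespace) is an assumption chosen to reproduce the input string quoted in the description.

```python
import io


def sum_function(nums):
    """Functional-format solution: takes arguments and returns a value."""
    return sum(nums)


def to_stdio_input(nums):
    """Hypothetical converter: serialize a list as '<count> <items...> '."""
    return f"{len(nums)} " + " ".join(str(n) for n in nums) + " "


def solve_stdio(stdin):
    """Stdio-format solution: parse whitespace-separated tokens from a stream."""
    tokens = stdin.read().split()
    count = int(tokens[0])
    nums = [int(tok) for tok in tokens[1:1 + count]]
    return str(sum(nums))


stdin_text = to_stdio_input([1, 2, 3, 4, 5])
print(repr(stdin_text))                      # '5 1 2 3 4 5 '
print(solve_stdio(io.StringIO(stdin_text)))  # 15
```

    Splitting on arbitrary whitespace keeps the sketch indifferent to whether the original input separates the count from the items with a space or a newline, which the description does not make clear.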

  10. LiveCodeBench-CodeGeneration

    • huggingface.co
    Updated Mar 2, 2024
    Cite
    Groq (2024). LiveCodeBench-CodeGeneration [Dataset]. https://huggingface.co/datasets/Groq/LiveCodeBench-CodeGeneration
    Dataset updated
    Mar 2, 2024
    Dataset authored and provided by
    Groq: https://groq.com/
    Description

    Groq/LiveCodeBench-CodeGeneration dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Livecodebench-subset-50

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    Ranajoy Sadhukhan (2025). Livecodebench-subset-50 [Dataset]. https://huggingface.co/datasets/Rano23/Livecodebench-subset-50
    Dataset updated
    Jun 21, 2025
    Authors
    Ranajoy Sadhukhan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Rano23/Livecodebench-subset-50 dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. code_generation_lite-th

    • huggingface.co
    Updated Feb 8, 2025
    Cite
    iApp Technology (2025). code_generation_lite-th [Dataset]. https://huggingface.co/datasets/iapp/code_generation_lite-th
    Dataset updated
    Feb 8, 2025
    Dataset authored and provided by
    iApp Technology
    Description

    LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

    Version 2.0
    Original dataset: https://huggingface.co/datasets/livecodebench/code_generation_lite
    Translated to Thai by iApp Technology.

  13. Synthia-S1-LiveCodeBench-Eval

    • huggingface.co
    Cite
    Tesslate, Synthia-S1-LiveCodeBench-Eval [Dataset]. https://huggingface.co/datasets/Tesslate/Synthia-S1-LiveCodeBench-Eval
    Dataset authored and provided by
    Tesslate
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Synthia S1 27B LiveCodeBench Outputs

    Done generating outputs. Evaluating now...

  14. code_generation

    • huggingface.co
    Updated May 21, 2025
    Cite
    Tony Zhao (2025). code_generation [Dataset]. https://huggingface.co/datasets/ztony0712/code_generation
    Dataset updated
    May 21, 2025
    Authors
    Tony Zhao
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Visualization of Code Generation Task Cases Samples

    View the visualized dataset samples in the Dataset Viewer. The sampling procedure is guided by the Elo distribution introduced in our method. The original dataset is release_v5 of livecodebench/code_generation_lite from Hugging Face. samples/origin: 879/880

    License

    This repository is licensed under the Apache License 2.0

  15. DeepCoder-Preview-Dataset

    • huggingface.co
    Updated Apr 8, 2025
    Cite
    Agentica (2025). DeepCoder-Preview-Dataset [Dataset]. https://huggingface.co/datasets/agentica-org/DeepCoder-Preview-Dataset
    Dataset updated
    Apr 8, 2025
    Dataset authored and provided by
    Agentica
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data

    Our training dataset consists of 24K problems paired with their test cases:

    7.5K TACO Verified problems.
    16K verified coding problems from PrimeIntellect’s SYNTHETIC-1.
    600 LiveCodeBench (v5) problems submitted between May 1, 2023 and July 31, 2024.

    Our test dataset consists of:

    LiveCodeBench (v5) problems between August 1, 2024 and February 1, 2025.
    Codeforces problems from Qwen/CodeElo.

    Format

    Each row in the dataset contains:

    problem: The coding problem… See the full description on the dataset page: https://huggingface.co/datasets/agentica-org/DeepCoder-Preview-Dataset.
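
    The date-based train/test separation for the LiveCodeBench (v5) problems can be sketched as below. This is a minimal illustration rather than the authors' pipeline: `assign_split` is a hypothetical helper, and treating both ends of each date window as inclusive is an assumption the description does not confirm.

```python
from datetime import date

# Date windows quoted in the description; inclusive bounds are an assumption.
TRAIN_WINDOW = (date(2023, 5, 1), date(2024, 7, 31))
TEST_WINDOW = (date(2024, 8, 1), date(2025, 2, 1))


def assign_split(problem_date):
    """Return 'train', 'test', or None for a LiveCodeBench problem date."""
    if TRAIN_WINDOW[0] <= problem_date <= TRAIN_WINDOW[1]:
        return "train"
    if TEST_WINDOW[0] <= problem_date <= TEST_WINDOW[1]:
        return "test"
    return None  # outside both windows


print(assign_split(date(2024, 7, 31)))  # train
print(assign_split(date(2024, 8, 1)))   # test
```

    Because the two windows do not overlap, no problem can land in both splits, which is the property that keeps the test set free of training problems.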

  16. OpenThinker-7B_eval_03-11-25_18-35-31_0981

    • huggingface.co
    Updated Mar 11, 2025
    Cite
    ML Foundations Development (2025). OpenThinker-7B_eval_03-11-25_18-35-31_0981 [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/OpenThinker-7B_eval_03-11-25_18-35-31_0981
    Dataset updated
    Mar 11, 2025
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/OpenThinker-7B_eval_03-11-25_18-35-31_0981

    Precomputed model outputs for evaluation.

    Evaluation Results

    Summary

    Metric    LiveCodeBench  AIME24  AIME25  AMC23  GPQADiamond  MATH500
    Accuracy  38.9           32.0    24.0    71.0   29.8         83.0

    LiveCodeBench

    Average Accuracy: 38.94% ± 0.69%
    Number of Runs: 3

    Run  Accuracy  Questions Solved  Total Questions
    1    38.36%    196               511
    2    38.16%    195               511
    3    40.31%    206               511

    AIME24… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/OpenThinker-7B_eval_03-11-25_18-35-31_0981.
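
    The "Average Accuracy" line for this entry can be reproduced from the three LiveCodeBench runs listed. The sketch below assumes the ± figure is the standard error of the mean computed from the sample standard deviation; that reading is inferred from this entry's numbers, not stated by the source, and other entries may follow a different convention.

```python
import math

# (questions solved, total questions) per run, copied from the run table.
runs = [(196, 511), (195, 511), (206, 511)]


def mean_and_sem(runs):
    """Return mean accuracy and standard error of the mean (sample std, n - 1)."""
    accuracies = [solved / total for solved, total in runs]
    n = len(accuracies)
    mean = sum(accuracies) / n
    variance = sum((a - mean) ** 2 for a in accuracies) / (n - 1)
    return mean, math.sqrt(variance / n)


mean, sem = mean_and_sem(runs)
print(f"{mean:.2%} ± {sem:.2%}")  # 38.94% ± 0.69%
```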
    
  17. hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981

    • huggingface.co
    Updated Mar 18, 2025
    Cite
    ML Foundations Development (2025). hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981 [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981

    Precomputed model outputs for evaluation.

    Evaluation Results

    Summary

    Metric    LiveCodeBench  AIME24  AIME25  AMC23  GPQADiamond  MATH500
    Accuracy  55.6           50.0    33.3    89.5   49.3         88.4

    LiveCodeBench

    Average Accuracy: 55.58% ± 0.79%
    Number of Runs: 3

    Run  Accuracy  Questions Solved  Total Questions
    1    54.99%    281               511
    2    57.14%    292               511
    3    54.60%    279               511

    … See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/hero_run_2_fix_conversations_eval_03-18-25_01-58-28_0981.

  18. a1_science_camel_biology_1744691454_eval_1331

    • huggingface.co
    Cite
    ML Foundations Development, a1_science_camel_biology_1744691454_eval_1331 [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/a1_science_camel_biology_1744691454_eval_1331
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/a1_science_camel_biology_1744691454_eval_1331

    Precomputed model outputs for evaluation.

    Evaluation Results

    Summary

    Metric    AIME24  AMC23  MATH500  GPQADiamond  JEEBench  MMLUPro  LiveCodeBench  CodeElo
    Accuracy  11.7    51.2   70.2     27.8         31.3      27.6     0.1            2.4

    AIME24

    Average Accuracy: 11.67% ± 1.51%
    Number of Runs: 10

    Run  Accuracy  Questions Solved  Total Questions
    1    16.67%    5                 30
    2    3.33%     1                 30
    3    13.33%    4                 30
    4    6.67%     2                 30

    … See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/a1_science_camel_biology_1744691454_eval_1331.

  19. herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981

    • huggingface.co
    Updated Mar 18, 2025
    Cite
    ML Foundations Development (2025). herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981 [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981

    Precomputed model outputs for evaluation.

    Evaluation Results

    Summary

    Metric    LiveCodeBench  AIME24  AIME25  AMC23  GPQADiamond  MATH500
    Accuracy  43.1           46.0    28.0    80.5   44.4         87.0

    LiveCodeBench

    Average Accuracy: 43.12% ± 1.03%
    Number of Runs: 3

    Run  Accuracy  Questions Solved  Total Questions
    1    45.01%    230               511
    2    41.49%    212               511
    3    42.86%    219               511

    AIME24… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/herorun_1_1_3epoch_eval_03-18-25_00-30-48_0981.
    
  20. Light-R1-32B_1743569788_eval_0981

    • huggingface.co
    Cite
    ML Foundations Development, Light-R1-32B_1743569788_eval_0981 [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/Light-R1-32B_1743569788_eval_0981
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/Light-R1-32B_1743569788_eval_0981

    Precomputed model outputs for evaluation.

    Evaluation Results

    Summary

    Metric    AIME24  AIME25  AMC23  MATH500  GPQADiamond  LiveCodeBench
    Accuracy  75.3    55.3    95.5   90.2     22.6         55.7

    AIME24

    Average Accuracy: 75.33% ± 2.42%
    Number of Runs: 5

    Run  Accuracy  Questions Solved  Total Questions
    1    83.33%    25                30
    2    76.67%    23                30
    3    73.33%    22                30
    4    66.67%    20                30
    5    76.67%    23                30

    … See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations-dev/Light-R1-32B_1743569788_eval_0981.
