17 datasets found
  1. LiveMathBench

    • huggingface.co
    Updated Jan 6, 2025
    Cite
    OpenCompass (2025). LiveMathBench [Dataset]. https://huggingface.co/datasets/opencompass/LiveMathBench
    Dataset updated
    Jan 6, 2025
    Dataset authored and provided by
    OpenCompass
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "LiveMathBench"

    Homepage: https://open-compass.github.io/GPassK/
    Repository: https://github.com/open-compass/GPassK
    Paper: Are Your LLMs Capable of Stable Reasoning?

      Introduction
    

    LiveMathBench is a mathematical dataset designed to include the latest challenging question sets from various mathematical competitions, aiming to avoid the data contamination issues found in existing LLMs and public math benchmarks.

      Leaderboard
    

    The Latest… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/LiveMathBench.

  2. MMBench-Video

    • huggingface.co
    Updated Jul 30, 2024
    Cite
    OpenCompass (2024). MMBench-Video [Dataset]. https://huggingface.co/datasets/opencompass/MMBench-Video
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    OpenCompass
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

    Homepage: https://mmbench-video.github.io/
    Repository: https://huggingface.co/datasets/opencompass/MMBench-Video
    Paper: MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding.

      Introduction
    

    MMBench-Video is a quantitative benchmark designed to rigorously evaluate LVLMs' proficiency in video understanding. MMBench-Video incorporates approximately 600 web videos… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/MMBench-Video.

  3. CodeCompass

    • huggingface.co
    Updated Jul 9, 2025
    Cite
    OpenCompass (2025). CodeCompass [Dataset]. https://huggingface.co/datasets/opencompass/CodeCompass
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    OpenCompass
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CodeCompass: A Benchmark for Code Generation

    Paper: Rethinking Verification for LLM Code Generation: From Generation to Testing

      Description
    

    CodeCompass is a rigorous benchmark designed to evaluate the code generation capabilities of Large Language Models (LLMs). It comprises a comprehensive collection of programming problems sourced from competitive platforms, offering a standardized framework for assessing algorithmic reasoning, problem-solving, and code synthesis in a… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/CodeCompass.

  4. CriticBench

    • huggingface.co
    Updated Feb 21, 2024
    Cite
    OpenCompass (2024). CriticBench [Dataset]. https://huggingface.co/datasets/opencompass/CriticBench
    Dataset updated
    Feb 21, 2024
    Dataset authored and provided by
    OpenCompass
    Description

    CriticBench: Evaluating Large Language Model as Critic

    This repository is the official implementation of CriticBench, a comprehensive benchmark for evaluating critique ability of LLMs.

      Introduction
    

    CriticBench: Evaluating Large Language Model as Critic

    Tian Lan¹*, Wenwei Zhang²*, Chen Xu¹, Heyan Huang¹, Dahua Lin², Kai Chen²†, Xian-ling Mao¹† († Corresponding Author, * Equal Contribution). ¹ Beijing Institute of Technology, ² Shanghai AI Laboratory

    [Dataset on HF]… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/CriticBench.

  5. compass_academic_predictions

    • huggingface.co
    Updated Apr 10, 2025
    Cite
    OpenCompass (2025). compass_academic_predictions [Dataset]. https://huggingface.co/datasets/opencompass/compass_academic_predictions
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    OpenCompass
    Description

    Compass Academic Predictions

    This dataset stores most of the reusable evaluation results of OpenCompass, currently including the predictions of models on different datasets.

  6. VerifierBench

    • huggingface.co
    Updated Aug 7, 2025
    Cite
    OpenCompass (2025). VerifierBench [Dataset]. https://huggingface.co/datasets/opencompass/VerifierBench
    Dataset updated
    Aug 7, 2025
    Dataset authored and provided by
    OpenCompass
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for VerifierBench

      Dataset Description
    

    VerifierBench is a comprehensive benchmark for evaluating the verification capabilities of Large Language Models (LLMs). It demonstrates multi-domain competency spanning math, knowledge, science, and diverse reasoning tasks, with the capability to process various answer types, including multi-subproblems, formulas, and sequence answers, while effectively… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/VerifierBench.

  7. AIME2025

    • huggingface.co
    Updated Feb 8, 2025
    Cite
    OpenCompass (2025). AIME2025 [Dataset]. https://huggingface.co/datasets/opencompass/AIME2025
    Dataset updated
    Feb 8, 2025
    Dataset authored and provided by
    OpenCompass
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    AIME 2025 Dataset

      Dataset Description
    

    This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2025-I & II.

  8. THE-OPEN-COMPASS (Company) - Reverse Whois Lookup

    • whoisdatacenter.com
    csv
    Cite
    AllHeart Web Inc, THE-OPEN-COMPASS (Company) - Reverse Whois Lookup [Dataset]. https://whoisdatacenter.com/company/THE-OPEN-COMPASS/
    Available download formats: csv
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 19, 2025
    Description

    Uncover historical ownership history and changes over time by performing a reverse Whois lookup for the company THE-OPEN-COMPASS.

  9. NeedleBench

    • huggingface.co
    Updated Aug 2, 2024
    Cite
    OpenCompass (2024). NeedleBench [Dataset]. https://huggingface.co/datasets/opencompass/NeedleBench
    Dataset updated
    Aug 2, 2024
    Dataset authored and provided by
    OpenCompass
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

      Dataset Summary
    

    The NeedleBench dataset is a part of the OpenCompass project, designed to evaluate the capabilities of large language models (LLMs) in processing and understanding long documents. It includes a series of test scenarios that assess models' abilities in long text information extraction and reasoning. The dataset is structured to support tasks such as single-needle retrieval, multi-needle retrieval, multi-needle reasoning, and ancestral… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/NeedleBench.

  10. MMBench

    • huggingface.co
    Updated Oct 17, 2023
    + more versions
    Cite
    OpenCompass (2023). MMBench [Dataset]. https://huggingface.co/datasets/opencompass/MMBench
    Dataset updated
    Oct 17, 2023
    Dataset authored and provided by
    OpenCompass
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    opencompass/MMBench dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. CodeForce_SAGA

    • huggingface.co
    Updated Jul 11, 2025
    Cite
    OpenCompass (2025). CodeForce_SAGA [Dataset]. https://huggingface.co/datasets/opencompass/CodeForce_SAGA
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    OpenCompass
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CodeForce-SAGA: A Self-Correction-Augmented Code Generation Dataset

    CodeForce-SAGA is a large-scale, high-quality training dataset designed to enhance the code generation and problem-solving capabilities of Large Language Models (LLMs). All problems and solutions are sourced from the competitive programming platform Codeforces. This dataset is built upon the SAGA (Strategic Adversarial & Constraint-differential Generative workflow) framework, a novel human-LLM collaborative… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/CodeForce_SAGA.

  12. mmmlu_lite

    • huggingface.co
    Updated Nov 1, 2024
    Cite
    OpenCompass (2024). mmmlu_lite [Dataset]. https://huggingface.co/datasets/opencompass/mmmlu_lite
    Dataset updated
    Nov 1, 2024
    Dataset authored and provided by
    OpenCompass
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MMMLU-Lite

      Introduction
    

    A lite version of the MMMLU dataset, which is a community version of the MMMLU dataset by OpenCompass. Because the original dataset is large (about 200k questions), we created a lite version to make it easier to use. We sample 25 examples from each subject in each language of the original dataset, with a fixed seed to ensure reproducibility. This yields 19950 examples in the lite version of the dataset, which is about 10% of… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/mmmlu_lite.
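The sampling described above (25 examples per language and subject, drawn with a fixed seed) can be sketched roughly as follows. This is only an illustration, not the script used to build mmmlu_lite: the function name `sample_per_group`, the field names `language` and `subject`, the seed value, and the toy data are all assumptions. Note that 19950 / 25 = 798, so the lite version covers 798 language-subject groups.

```python
import random
from collections import defaultdict


def sample_per_group(rows, key, k=25, seed=0):
    """Draw up to k rows from each group, reproducibly (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    rng = random.Random(seed)  # fixed seed: the same input gives the same sample
    sampled = []
    for group_key in sorted(groups):  # sorted so groups are visited in a fixed order
        members = groups[group_key]
        sampled.extend(rng.sample(members, min(k, len(members))))
    return sampled


# Toy stand-in for the full dataset: 3 languages x 4 subjects x 100 questions.
# The language and subject names are placeholders, not real MMMLU identifiers.
rows = [
    {"language": lang, "subject": subj, "id": i}
    for lang in ("lang_a", "lang_b", "lang_c")
    for subj in ("subj_1", "subj_2", "subj_3", "subj_4")
    for i in range(100)
]

lite = sample_per_group(rows, key=lambda r: (r["language"], r["subject"]))
print(len(lite))  # 3 * 4 * 25 = 300
```

Because the generator is seeded and the groups are visited in sorted order, rerunning the sketch on the same rows returns the same subset.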

  13. REST

    • huggingface.co
    Updated May 29, 2025
    Cite
    anonymous0523 (2025). REST [Dataset]. https://huggingface.co/datasets/anonymous0523/REST
    Dataset updated
    May 29, 2025
    Authors
    anonymous0523
    Description

    Dataset Card for Dataset Name

    This dataset provides data for the REST benchmark. The data are identical to the original data of the corresponding benchmarks. REST combines multiple questions into one prompt by modifying the corresponding data loading method in OpenCompass.

      Data preparation
    

    REST constructs the multi-problem version when loading the datasets; this is implemented in the StressDataset class. Data preparation is therefore identical to the official OpenCompass practice.… See the full description on the dataset page: https://huggingface.co/datasets/anonymous0523/REST.
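As a rough illustration of combining several questions into one prompt at load time, the sketch below groups questions into fixed-size chunks and numbers them. It is not the StressDataset implementation: the function name `build_multi_question_prompts`, the `per_prompt` parameter, and the instruction wording are hypothetical.

```python
def build_multi_question_prompts(questions, per_prompt=3):
    """Pack consecutive questions into shared prompts (hypothetical helper)."""
    prompts = []
    for start in range(0, len(questions), per_prompt):
        chunk = questions[start:start + per_prompt]
        # Number the questions within each prompt, starting from Q1.
        numbered = [f"Q{i}: {q}" for i, q in enumerate(chunk, start=1)]
        prompts.append("Answer every question below, in order.\n" + "\n".join(numbered))
    return prompts


questions = ["What is 2 + 3?", "What is 7 * 6?", "What is 10 - 4?",
             "What is 9 / 3?", "What is 5 squared?"]
prompts = build_multi_question_prompts(questions, per_prompt=2)
print(len(prompts))  # 5 questions in chunks of 2 -> 3 prompts
print(prompts[0])
```

The underlying questions are left unchanged; only the way they are packed into prompts differs.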

  14. Sailcompass_data

    • huggingface.co
    Updated Jun 13, 2024
    Cite
    Sea AI Lab (2024). Sailcompass_data [Dataset]. https://huggingface.co/datasets/sail/Sailcompass_data
    Dataset updated
    Jun 13, 2024
    Dataset authored and provided by
    Sea AI Lab
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages

    This repository provides the dataset for evaluating SEA large language models.

    Project Website: sailorllm.github.io
    Codebase: https://github.com/sail-sg/sailcompass

      Acknowledgment
    

    Thanks to the contributors of OpenCompass.

      Citing this work
    

    If you use this repository or sailor models, please cite @misc{sailcompass, title={SailCompass: Towards Reproducible… See the full description on the dataset page: https://huggingface.co/datasets/sail/Sailcompass_data.

  15. StaticEmbodiedBench

    • huggingface.co
    Updated Jul 3, 2025
    Cite
    Xiao Jiahao (2025). StaticEmbodiedBench [Dataset]. https://huggingface.co/datasets/xiaojiahao/StaticEmbodiedBench
    Dataset updated
    Jul 3, 2025
    Authors
    Xiao Jiahao
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📘 Dataset Description

    StaticEmbodiedBench is a dataset for evaluating vision-language models on embodied intelligence tasks, as featured in the OpenCompass leaderboard. It covers three key capabilities:

    Macro Planning: Decomposing a complex task into a sequence of simpler subtasks.
    Micro Perception: Performing concrete simple tasks such as spatial understanding and fine-grained perception.
    Stage-wise Reasoning: Deciding the next action based on the agent’s current state and… See the full description on the dataset page: https://huggingface.co/datasets/xiaojiahao/StaticEmbodiedBench.

  16. CMB

    • huggingface.co
    Updated Oct 28, 2024
    + more versions
    Cite
    Fu Zichuan (2024). CMB [Dataset]. https://huggingface.co/datasets/fzkuji/CMB
    Dataset updated
    Oct 28, 2024
    Authors
    Fu Zichuan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CMB: A Comprehensive Medical Benchmark in Chinese

    🌐 Github • 🌐 Website • 🤗 HuggingFace

      🌈 Update
    

    [2024.02.21] The answers to the CMB-Exam test have been updated, and some errors caused by omissions in version management have been fixed.
    [2024.01.08] To facilitate testing, we disclose the answers to the CMB-Exam test.
    [2023.09.22] CMB is included in OpenCompass.
    [2023.08.21] Paper released.
    [2023.08.01] 🎉🎉🎉 CMB is published!🎉🎉🎉

      🌐… See the full description on the dataset page: https://huggingface.co/datasets/fzkuji/CMB.
    
  17. MMPR

    • huggingface.co
    Updated Nov 14, 2024
    Cite
    OpenGVLab (2024). MMPR [Dataset]. https://huggingface.co/datasets/OpenGVLab/MMPR
    Dataset updated
    Nov 14, 2024
    Dataset authored and provided by
    OpenGVLab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MMPR

    [📂 GitHub] [🆕 Blog] [📜 Paper] [📖 Documents]
    2025/04/11: We release a new version of MMPR (i.e., MMPR-v1.2), which greatly enhances the overall performance of InternVL3.
    2024/12/20: We release a new version of MMPR (i.e., MMPR-v1.1). Based on this dataset, InternVL2.5 outperforms its counterparts without MPO by an average of 2 points across all scales on the OpenCompass leaderboard.

      Introduction
    

    MMPR is a large-scale and high-quality multimodal reasoning… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/MMPR.

11 scholarly articles cite the LiveMathBench dataset, according to Google Scholar.