MIT License (https://opensource.org/licenses/MIT)
AIME 2025 Dataset
Dataset Description
This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2025-I & II.
Compass Academic Predictions
This dataset stores most of the reusable evaluation results of OpenCompass, currently including model predictions on different datasets.
[Dataset on HF] [Project Page] [Subjective LeaderBoard] [Objective LeaderBoard]
CriticBench is a novel benchmark designed to comprehensively and reliably evaluate the critique abilities of Large Language Models (LLMs). These critique abilities are crucial for scalable oversight and self-improvement of LLMs. While many recent studies explore how LLMs can judge and refine flaws in their generated outputs, the measurement of critique abilities remains under-explored.
Here are the key aspects of CriticBench:
Purpose: To assess LLMs' critique abilities across four dimensions:
Feedback: How well an LLM provides constructive feedback.
Comparison: The ability to compare and contrast different responses.
Refinement: How effectively an LLM can refine flawed or suboptimal outputs.
Meta-feedback: The LLM's ability to reflect on its own performance.
Tasks: CriticBench encompasses nine diverse tasks, each evaluating LLMs' critique abilities at varying levels of quality granularity.
Evaluation: The benchmark evaluates both open-source and closed-source LLMs, revealing intriguing relationships between critique abilities, response qualities, and model scales.
Resources: Datasets, resources, and an evaluation toolkit for CriticBench will be publicly released.
In summary, CriticBench aims to provide a comprehensive framework for assessing LLMs' critique and self-improvement capabilities, contributing to the advancement of large-scale language models in various applications.
MIT License (https://opensource.org/licenses/MIT)
Dataset Description
Dataset Summary
The NeedleBench dataset is a part of the OpenCompass project, designed to evaluate the capabilities of large language models (LLMs) in processing and understanding long documents. It includes a series of test scenarios that assess models' abilities in long-text information extraction and reasoning. The dataset is structured to support tasks such as single-needle retrieval, multi-needle retrieval, multi-needle reasoning, and ancestral… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/NeedleBench.
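To make the task format concrete, here is a minimal sketch of how a single-needle retrieval case can be constructed: a short "needle" fact is inserted at a chosen relative depth inside long filler text and paired with a retrieval question. The function name, field names, and prompt template below are illustrative assumptions, not NeedleBench's actual construction code.

```python
# Minimal sketch of a single-needle retrieval test case, in the spirit of
# NeedleBench-style long-context evaluations. Field names and the prompt
# template are assumptions, not the dataset's actual schema.

def build_single_needle_case(haystack: str, needle: str, depth: float, question: str) -> dict:
    """Insert `needle` into `haystack` at a relative depth in [0, 1]
    and return a prompt/reference pair for retrieval evaluation."""
    position = int(len(haystack) * depth)
    context = haystack[:position] + " " + needle + " " + haystack[position:]
    prompt = (
        "Read the following document and answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return {"prompt": prompt, "reference": needle, "depth": depth}


if __name__ == "__main__":
    filler = "The committee discussed routine matters. " * 200  # stand-in long document
    case = build_single_needle_case(
        haystack=filler,
        needle="The secret launch code is 7421.",
        depth=0.5,
        question="What is the secret launch code?",
    )
    print(case["prompt"][:300])
```

Multi-needle variants follow the same idea, inserting several facts at different depths and asking questions that require retrieving or reasoning over all of them.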
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The opencompass/MMBench dataset is hosted on Hugging Face and contributed by the HF Datasets community.
MIT License (https://opensource.org/licenses/MIT)
MMMLU-Lite
Introduction
A lite version of the MMMLU dataset, which is a community version of the MMMLU dataset by OpenCompass. Because the original dataset is large (about 200k questions), we created this lite version to make it easier to use. We sample 25 examples from each language subject in the original dataset with a fixed seed to ensure reproducibility, giving 19,950 examples in the lite version, which is about 10% of… See the full description on the dataset page: https://huggingface.co/datasets/opencompass/mmmlu_lite.
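For illustration, the following sketch shows fixed-seed, per-subject subsampling in the spirit of the description above. The record fields (language, subject), the seed value, and the grouping logic are assumptions, not the maintainers' actual script.

```python
import random
from collections import defaultdict

# Illustrative sketch of fixed-seed, per-group subsampling (25 examples per
# language/subject pair). Field names and the seed are assumptions.

def sample_lite(records: list[dict], per_group: int = 25, seed: int = 42) -> list[dict]:
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for rec in records:
        groups[(rec["language"], rec["subject"])].append(rec)

    rng = random.Random(seed)   # fixed seed => reproducible subset
    lite = []
    for key in sorted(groups):  # stable iteration order across runs
        pool = groups[key]
        lite.extend(rng.sample(pool, min(per_group, len(pool))))
    return lite
```

With 25 examples drawn per language-subject group, the resulting subset size depends only on the number of groups, which is why the lite split has a fixed, reproducible count.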
Dataset Card for REST
This dataset provides the data for the REST benchmark; it is identical to the original data of the corresponding benchmarks. REST combines multiple questions into one prompt by modifying the corresponding data-loading method in OpenCompass.
Data preparation
REST constructs the multi-problem version when loading the datasets, implemented in the StressDataset class, so data preparation is identical to the official practice of OpenCompass. … See the full description on the dataset page: https://huggingface.co/datasets/anonymous0523/REST.
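As a rough illustration of the idea, the sketch below groups several single-question items into one multi-problem prompt at load time. It is not the actual StressDataset implementation; the prompt template, field names, and chunk size are assumptions.

```python
# Illustrative sketch of combining several single-question items into one
# multi-problem prompt, in the spirit of REST's stress-style loading.
# Not the actual StressDataset code; template, fields, and k are assumptions.

def combine_questions(items: list[dict], k: int = 4) -> list[dict]:
    """Group `items` (each with a "question" and "answer") into prompts
    that each contain up to `k` numbered questions."""
    combined = []
    for start in range(0, len(items), k):
        chunk = items[start:start + k]
        prompt = "Answer each of the following questions in order.\n\n"
        prompt += "\n".join(
            f"Question {i + 1}: {item['question']}" for i, item in enumerate(chunk)
        )
        combined.append({
            "prompt": prompt,
            "references": [item["answer"] for item in chunk],
        })
    return combined
```

Because the combination happens in the loading step, the underlying benchmark files stay unchanged and only the prompts presented to the model differ.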
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages
This repository provides the datasets for evaluating Southeast Asian (SEA) large language models.
Project Website: sailorllm.github.io
Codebase: https://github.com/sail-sg/sailcompass
Acknowledgment
Thanks to the contributors of OpenCompass.
Citing this work
If you use this repository or the Sailor models, please cite: @misc{sailcompass, title={SailCompass: Towards Reproducible… See the full description on the dataset page: https://huggingface.co/datasets/sail/Sailcompass_data.
MIT License (https://opensource.org/licenses/MIT)
MMPR
[GitHub] [Blog] [Paper] [Documents]
2025/04/11: We release a new version of MMPR (i.e., MMPR-v1.2), which greatly enhances the overall performance of InternVL3.
2024/12/20: We release a new version of MMPR (i.e., MMPR-v1.1). Based on this dataset, InternVL2.5 outperforms its counterparts without MPO by an average of 2 points across all scales on the OpenCompass leaderboard.
Introduction
MMPR is a large-scale and high-quality multimodal reasoning… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/MMPR.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
CMB: A Comprehensive Medical Benchmark in Chinese
GitHub • Website • HuggingFace
Update
[2024.02.21] The answers to the CMB-Exam test have been updated, and some errors caused by omissions in version management have been fixed.
[2024.01.08] To facilitate testing, we disclose the answers to the CMB-Exam test.
[2023.09.22] CMB is included in OpenCompass.
[2023.08.21] Paper released.
[2023.08.01] CMB is published!
… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/CMB.