2 datasets found

SAS-Bench
huggingface.co
Updated Jun 17, 2025
Cite
Peichao Lai (2025). SAS-Bench [Dataset]. https://huggingface.co/datasets/aleversn/SAS-Bench
Dataset updated
Jun 17, 2025
Authors
Peichao Lai
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models

🔍 Overview

SAS-Bench represents the first specialized benchmark for evaluating Large Language Models (LLMs) on Short Answer Scoring (SAS) tasks. Utilizing authentic questions from China's National College Entrance Examination (Gaokao)… See the full description on the dataset page: https://huggingface.co/datasets/aleversn/SAS-Bench.
SAS-Bench Dataset
paperswithcode.com
Updated May 11, 2025
Cite
Peichao Lai; Kexuan Zhang; Yi Lin; Linyihan Zhang; Feiyang Ye; Jinhao Yan; Yanwei Xu; Conghui He; Yilei Wang; Wentao Zhang; Bin Cui (2025). SAS-Bench Dataset [Dataset]. https://paperswithcode.com/dataset/sas-bench
Dataset updated
May 11, 2025
Authors
Peichao Lai; Kexuan Zhang; Yi Lin; Linyihan Zhang; Feiyang Ye; Jinhao Yan; Yanwei Xu; Conghui He; Yilei Wang; Wentao Zhang; Bin Cui
Description
SAS-Bench represents the first specialized benchmark for evaluating Large Language Models (LLMs) on Short Answer Scoring (SAS) tasks. Utilizing authentic questions from China's National College Entrance Examination (Gaokao), our benchmark offers:

1,030 questions spanning 9 academic disciplines
4,109 expert-annotated student responses
Step-wise scoring with step-wise error analysis
Multi-dimensional evaluation (holistic scoring, step-wise scoring, and error diagnosis consistency)