2 datasets found
  1. SAS-Bench

    • huggingface.co
    Updated Jun 17, 2025
    Citation: Peichao Lai (2025). SAS-Bench [Dataset]. https://huggingface.co/datasets/aleversn/SAS-Bench
    Authors: Peichao Lai
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically.

    Description

    SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models


      πŸ” Overview
    

    SAS-Bench is the first specialized benchmark for evaluating Large Language Models (LLMs) on Short Answer Scoring (SAS) tasks. Utilizing authentic questions from China's National College Entrance Examination (Gaokao)… See the full description on the dataset page: https://huggingface.co/datasets/aleversn/SAS-Bench.

  2. SAS-Bench Dataset

    • paperswithcode.com
    Updated May 11, 2025
    Citation: Peichao Lai; Kexuan Zhang; Yi Lin; Linyihan Zhang; Feiyang Ye; Jinhao Yan; Yanwei Xu; Conghui He; Yilei Wang; Wentao Zhang; Bin Cui (2025). SAS-Bench Dataset [Dataset]. https://paperswithcode.com/dataset/sas-bench
    Authors: Peichao Lai; Kexuan Zhang; Yi Lin; Linyihan Zhang; Feiyang Ye; Jinhao Yan; Yanwei Xu; Conghui He; Yilei Wang; Wentao Zhang; Bin Cui
    Description

    SAS-Bench is the first specialized benchmark for evaluating Large Language Models (LLMs) on Short Answer Scoring (SAS) tasks. Built from authentic questions from China's National College Entrance Examination (Gaokao), our benchmark offers:

    • 1,030 questions spanning 9 academic disciplines
    • 4,109 expert-annotated student responses
    • Step-wise scoring with step-wise error analysis
    • Multi-dimensional evaluation (holistic scoring, step-wise scoring, and error diagnosis consistency)
