Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
🌐 Project Page: https://longbench2.github.io 💻 Github Repo: https://github.com/THUDM/LongBench 📚 Arxiv Paper: https://arxiv.org/abs/2412.15204 LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. LongBench v2 has the following features: (1) Length: Context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/JamesBegin/LongBench-v2-Pause1.
Tongyi-Zhiwen/longbench dataset hosted on Hugging Face and contributed by the HF Datasets community
giulio98/LongBench-512 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
🌐 Project Page: https://longbench2.github.io 💻 Github Repo: https://github.com/THUDM/LongBench 📚 Arxiv Paper: https://arxiv.org/abs/2412.15204 LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. LongBench v2 has the following features: (1) Length: Context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongBench-v2.
Although large language models (LLMs) demonstrate impressive performance on many language tasks, most can only handle texts a few thousand tokens long, limiting their application to longer inputs such as books, reports, and codebases. Recent works have proposed methods to improve LLMs' long-context capabilities through extended context windows and more sophisticated memory mechanisms. However, comprehensive benchmarks tailored to evaluating long-context understanding are lacking. In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long-context understanding, enabling more rigorous evaluation. LongBench comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese). These tasks cover key long-text application areas, including single-doc QA, multi-doc QA, summarization, few-shot learning, synthetic tasks, and code completion. All datasets in LongBench are standardized into a unified format, allowing for effortless automatic evaluation of LLMs. From a comprehensive evaluation of 8 LLMs on LongBench, we find that: (1) the commercial model (GPT-3.5-Turbo-16k) outperforms the open-source models, but still struggles on longer contexts; (2) scaled position embeddings and fine-tuning on longer sequences lead to substantial improvements in long-context understanding; (3) context compression techniques such as retrieval improve models that are weak on long contexts, but their performance still lags behind models with strong long-context understanding. The code and datasets are available at https://github.com/THUDM/LongBench.
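The unified record format and automatic QA evaluation mentioned above can be sketched as follows. The field names (`input`, `context`, `answers`, `dataset`, `language`) follow LongBench's documented schema; the token-level F1 scorer is a simplified illustration of its QA-style scoring, not the repository's actual evaluation code, and the sample record contents are invented.

```python
# Minimal sketch of a LongBench-style unified record and a token-overlap F1
# scorer for QA answers. Field names follow the unified format described
# above; the record contents and the scoring details are illustrative only.
from collections import Counter

# One hypothetical example in the unified format.
record = {
    "input": "Who founded the company described in the report?",
    "context": "...full long document text (thousands of words)...",
    "answers": ["Jane Doe"],
    "dataset": "hotpotqa",
    "language": "en",
}

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model prediction and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Score a prediction against the best-matching reference answer.
prediction = "The company was founded by Jane Doe"
score = max(token_f1(prediction, ans) for ans in record["answers"])
print(round(score, 3))  # prints 0.444
```

Because every dataset shares this schema, one scoring loop can iterate over all 21 tasks, dispatching on the `dataset` field to pick the appropriate metric.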
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
LongBench is the first benchmark for bilingual, multitask, and comprehensive assessment of the long-context understanding capabilities of large language models. LongBench includes two languages (Chinese and English) to provide a more comprehensive evaluation of large models' multilingual capabilities on long contexts. In addition, LongBench comprises six major categories and twenty-one different tasks, covering key long-text application scenarios such as single-document QA, multi-document QA, summarization, few-shot learning, synthetic tasks, and code completion.
LongBench is a comprehensive multilingual, multi-task benchmark, with the goal of fully measuring and evaluating the ability of pre-trained language models to understand long text. The dataset consists of twenty-one different tasks, covering key long-text application scenarios such as multi-document QA, single-document QA, summarization, few-shot learning, synthetic tasks, and code completion.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
LongBench-T2I
LongBench-T2I is a benchmark dataset introduced in the paper Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation. It is a standalone dataset designed specifically for evaluating text-to-image (T2I) generation models under long and compositionally rich prompts.
📦 Dataset Summary
This dataset contains 500 samples, each composed of:
A long-form instruction (complex natural language prompt). A… See the full description on the dataset page: https://huggingface.co/datasets/YCZhou/LongBench-T2I.
figuremout/LongBench-2k dataset hosted on Hugging Face and contributed by the HF Datasets community
minghuiliu/longbench dataset hosted on Hugging Face and contributed by the HF Datasets community
zwhe99/LongBench-v2-reformatted dataset hosted on Hugging Face and contributed by the HF Datasets community
xy21593/longbench-results dataset hosted on Hugging Face and contributed by the HF Datasets community
sheryc/LongBench-hotpotqa-with-evidence-label dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
📚 Filtered Synthetic QA Dataset (L1 Questions, LongBench-v2)
This is a synthetic L1 QA dataset focused on simple, context-dependent questions.
Source documents are from LongBench-v2, covering Single/Multi-Document QA across Finance, Legal, and Government domains. Documents are split into 10k-token chunks. Each chunk is passed to DeepSeek-R1, which extracts/generates multiple L1-level QA pairs that are strictly grounded in that chunk. Questions target information retrieval: facts… See the full description on the dataset page: https://huggingface.co/datasets/mmilunovic/qna-l1-synthetic.
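The chunking step described above can be sketched as follows. This is a minimal illustration, not the dataset authors' pipeline: whitespace splitting stands in for a real tokenizer (a production pipeline would count tokens with the generator model's tokenizer), and the 10k limit is exposed as a parameter.

```python
# Sketch of splitting a long document into fixed-size token chunks, as in the
# pipeline described above. Whitespace tokens stand in for real tokenizer
# tokens for illustration purposes.
def chunk_document(text: str, max_tokens: int = 10_000) -> list[str]:
    """Split `text` into consecutive chunks of at most `max_tokens` tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

doc = " ".join(f"word{i}" for i in range(25_000))
chunks = chunk_document(doc)
print(len(chunks))  # 25,000 tokens at 10k per chunk -> prints 3
```

Each resulting chunk can then be sent to the QA-generation model independently, which is what keeps the generated questions strictly grounded in a single chunk.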
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Below is a structured, professional-tone description of the "QA Increasing Context Length" dataset. You can use this text as a README, a data card, or incorporate it directly into documentation.
QA Increasing Context Length Dataset
1. Overview
The QA Increasing Context Length dataset is designed to facilitate benchmarking and research on question-answering (QA) systems as the size of the input context grows. It compiles QA examples drawn from multiple LongBench subsets… See the full description on the dataset page: https://huggingface.co/datasets/slinusc/ContextStretchQA.
LongAlign-10k
🤗 [LongAlign Dataset] • 💻 [Github Repo] • 📃 [LongAlign Paper]
LongAlign is the first full recipe for LLM alignment on long contexts. We propose the LongAlign-10k dataset, containing 10,000 long instruction examples of 8k-64k tokens in length. We investigate training strategies, namely packing (with loss weighting) and sorted batching, all of which are implemented in our code. For real-world long-context evaluation, we introduce LongBench-Chat, which evaluates the… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongAlign-10k.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Marathon
Release
[2024/05/15] 🔥 Marathon is accepted by ACL 2024 Main Conference.
Dataset Summary
The Marathon benchmark is a new long-context multiple-choice benchmark, mainly based on LooGLE, with some original data from LongBench. The context length can reach 200K+ tokens. Marathon comprises six tasks: Comprehension and Reasoning, Multiple Information Retrieval, Timeline Reorder, Computation, Passage Retrieval, and Short… See the full description on the dataset page: https://huggingface.co/datasets/Lemoncoke/Marathon.