17 datasets found
  1. LongBench-v2-Pause1

    • huggingface.co
    Updated Dec 20, 2024
    Cite
    James Begin (2024). LongBench-v2-Pause1 [Dataset]. https://huggingface.co/datasets/JamesBegin/LongBench-v2-Pause1
    Authors
    James Begin
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

    🌐 Project Page: https://longbench2.github.io 💻 Github Repo: https://github.com/THUDM/LongBench 📚 Arxiv Paper: https://arxiv.org/abs/2412.15204 LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. LongBench v2 has the following features: (1) Length: Context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/JamesBegin/LongBench-v2-Pause1.

  2. longbench

    • huggingface.co
    Updated May 27, 2025
    Cite
    Tongyi-Zhiwen (2025). longbench [Dataset]. https://huggingface.co/datasets/Tongyi-Zhiwen/longbench
    Dataset authored and provided by
    Tongyi-Zhiwen
    Description

    Tongyi-Zhiwen/longbench dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. LongBench-512

    • huggingface.co
    Updated Jul 4, 2025
    Cite
    Giulio Corallo (2025). LongBench-512 [Dataset]. https://huggingface.co/datasets/giulio98/LongBench-512
    Authors
    Giulio Corallo
    Description

    giulio98/LongBench-512 dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. LongBench-v2

    • huggingface.co
    Updated Dec 20, 2024
    Cite
    Z.ai (2024). LongBench-v2 [Dataset]. https://huggingface.co/datasets/zai-org/LongBench-v2
    Dataset provided by
    Z.ai
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

    🌐 Project Page: https://longbench2.github.io 💻 Github Repo: https://github.com/THUDM/LongBench 📚 Arxiv Paper: https://arxiv.org/abs/2412.15204 LongBench v2 is designed to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across real-world multitasks. LongBench v2 has the following features: (1) Length: Context length ranging from 8k to… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongBench-v2.

  5. L-Eval Dataset

    • library.toponeai.link
    Updated Aug 29, 2023
    Cite
    Chenxin An; Shansan Gong; Ming Zhong; Xingjian Zhao; Mukai Li; Jun Zhang; Lingpeng Kong; Xipeng Qiu (2023). L-Eval Dataset [Dataset]. https://library.toponeai.link/dataset/l-eval
    Authors
    Chenxin An; Shansan Gong; Ming Zhong; Xingjian Zhao; Mukai Li; Jun Zhang; Lingpeng Kong; Xipeng Qiu
    Description

    Although large language models (LLMs) demonstrate impressive performance on many language tasks, most of them can only handle texts a few thousand tokens long, limiting their application to longer inputs such as books, reports, and codebases. Recent works have proposed methods to improve LLMs' long-context capabilities by extending context windows and building more sophisticated memory mechanisms. However, comprehensive benchmarks tailored for evaluating long-context understanding have been lacking. In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long-context understanding, enabling a more rigorous evaluation. LongBench comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese). These tasks cover key long-text application areas including single-doc QA, multi-doc QA, summarization, few-shot learning, synthetic tasks, and code completion. All datasets in LongBench are standardized into a unified format, allowing for effortless automatic evaluation of LLMs. Upon comprehensive evaluation of 8 LLMs on LongBench, we find that: (1) the commercial model (GPT-3.5-Turbo-16k) outperforms the open-sourced models, but still struggles on longer contexts; (2) scaled position embeddings and fine-tuning on longer sequences lead to substantial improvements in long-context understanding; (3) context compression techniques such as retrieval improve models with weak long-context ability, but their performance still lags behind models with strong long-context understanding. The code and datasets are available at https://github.com/THUDM/LongBench.
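    The unified format mentioned above is what makes automatic evaluation straightforward. As an illustrative sketch only (not LongBench's actual evaluation code, whose metrics vary by task), a token-level F1 score of the kind commonly used for extractive QA can be computed like this:

    ```python
    from collections import Counter

    def token_f1(prediction: str, reference: str) -> float:
        """Token-level F1 between a predicted and a reference answer,
        as commonly used for extractive QA evaluation."""
        pred_tokens = prediction.lower().split()
        ref_tokens = reference.lower().split()
        if not pred_tokens or not ref_tokens:
            return float(pred_tokens == ref_tokens)
        # Multiset intersection counts each shared token at most
        # as often as it appears in both strings.
        common = Counter(pred_tokens) & Counter(ref_tokens)
        overlap = sum(common.values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)
    ```

    With every task's answers stored in one schema, a harness can loop over subsets and apply the appropriate per-task metric without bespoke parsing.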

  6. LongBench

    • opendatalab.com
    Updated Aug 1, 2023
    Available download formats: zip
    Cite
    Tsinghua University (2023). LongBench [Dataset]. https://opendatalab.com/OpenDataLab/LongBench
    Dataset provided by
    Tsinghua University
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    LongBench is the first benchmark for the bilingual, multitask, and comprehensive assessment of the long-context understanding capabilities of large language models. LongBench covers two languages (Chinese and English) to provide a more comprehensive evaluation of large models' multilingual capabilities on long contexts. In addition, LongBench comprises six major categories and twenty-one tasks, covering key long-text application scenarios such as single-document QA, multi-document QA, summarization, few-shot learning, synthetic tasks, and code completion.

  7. LongBench

    • huggingface.co
    Updated Jul 31, 2023
    Cite
    Z.ai (2023). LongBench [Dataset]. https://huggingface.co/datasets/zai-org/LongBench
    Dataset provided by
    Z.ai
    Description

    LongBench is a comprehensive multilingual, multi-task benchmark whose goal is to fully measure and evaluate the ability of pre-trained language models to understand long text. The dataset consists of twenty different tasks, covering key long-text application scenarios such as multi-document QA, single-document QA, summarization, few-shot learning, synthetic tasks, and code completion.

  8. LongBench-T2I

    • huggingface.co
    Updated Jun 6, 2025
    Cite
    Yucheng Zhou (2025). LongBench-T2I [Dataset]. https://huggingface.co/datasets/YCZhou/LongBench-T2I
    Authors
    Yucheng Zhou
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    LongBench-T2I

    LongBench-T2I is a benchmark dataset introduced in the paper Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation. It is a standalone dataset designed specifically for evaluating text-to-image (T2I) generation models under long and compositionally rich prompts.

      📦 Dataset Summary
    

    This dataset contains 500 samples, each composed of:

    A long-form instruction (complex natural language prompt). A… See the full description on the dataset page: https://huggingface.co/datasets/YCZhou/LongBench-T2I.

  9. LongBench-2k

    • huggingface.co
    Updated Jul 4, 2025
    Cite
    Z (2025). LongBench-2k [Dataset]. https://huggingface.co/datasets/figuremout/LongBench-2k
    Authors
    Z
    Description

    figuremout/LongBench-2k dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. longbench

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Minghui Liu (2025). longbench [Dataset]. https://huggingface.co/datasets/minghuiliu/longbench
    Authors
    Minghui Liu
    Description

    minghuiliu/longbench dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. LongBench-v2-reformatted

    • huggingface.co
    Updated May 11, 2025
    Cite
    Zhiwei He (2025). LongBench-v2-reformatted [Dataset]. https://huggingface.co/datasets/zwhe99/LongBench-v2-reformatted
    Authors
    Zhiwei He
    Description

    zwhe99/LongBench-v2-reformatted dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. longbench-results

    • huggingface.co
    Cite
    Peter Schmidt, longbench-results [Dataset]. https://huggingface.co/datasets/xy21593/longbench-results
    Authors
    Peter Schmidt
    Description

    xy21593/longbench-results dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. LongBench-hotpotqa-with-evidence-label

    • huggingface.co
    Updated May 11, 2025
    Cite
    Suyuchen Wang (2025). LongBench-hotpotqa-with-evidence-label [Dataset]. https://huggingface.co/datasets/sheryc/LongBench-hotpotqa-with-evidence-label
    Authors
    Suyuchen Wang
    Description

    sheryc/LongBench-hotpotqa-with-evidence-label dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. qna-l1-synthetic

    • huggingface.co
    Cite
    Milos Milunovic, qna-l1-synthetic [Dataset]. https://huggingface.co/datasets/mmilunovic/qna-l1-synthetic
    Authors
    Milos Milunovic
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📚 Filtered Synthetic QA Dataset (L1 Questions, LongBench-v2)

    This is a synthetic L1 QA dataset focused on simple, context-dependent questions.

    Source documents are from LongBench-v2, covering Single/Multi-Document QA across Finance, Legal, and Government domains. Documents are split into 10k-token chunks. Each chunk is passed to DeepSeek-R1, which extracts/generates multiple L1-level QA pairs that are strictly grounded in that chunk. Questions target information retrieval: facts… See the full description on the dataset page: https://huggingface.co/datasets/mmilunovic/qna-l1-synthetic.
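    The chunking step described above can be sketched as follows. This is a hypothetical illustration that uses whitespace tokens as a stand-in for a real model tokenizer, not the dataset authors' pipeline:

    ```python
    def split_into_chunks(text: str, chunk_size: int = 10_000) -> list[str]:
        """Split a document into consecutive chunks of at most `chunk_size`
        tokens (whitespace tokens here; the dataset described above uses
        10k-token windows from a model tokenizer)."""
        tokens = text.split()
        return [
            " ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)
        ]
    ```

    Each resulting chunk would then be passed independently to the generator model, which keeps every produced QA pair grounded in a single bounded context.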

  15. ContextStretchQA

    • huggingface.co
    Updated Jun 8, 2025
    Cite
    Linus Stuhlmann (2025). ContextStretchQA [Dataset]. https://huggingface.co/datasets/slinusc/ContextStretchQA
    Authors
    Linus Stuhlmann
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Below is a structured, professional‐tone description of the “QA Increasing Context Length” dataset. You can use this text as a README, a data card, or incorporate it directly into documentation.

      QA Increasing Context Length Dataset

      1. Overview

    The QA Increasing Context Length dataset is designed to facilitate benchmarking and research on question‐answering (QA) systems as the size of the input context grows. It compiles QA examples drawn from multiple LongBench subsets… See the full description on the dataset page: https://huggingface.co/datasets/slinusc/ContextStretchQA.
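    One common way to scale a QA example's context length, sketched hypothetically here (the dataset's actual construction may differ), is to pad the gold passage with distractor passages until a target token budget is reached:

    ```python
    import itertools

    def build_context(gold: str, distractors: list[str], target_tokens: int) -> str:
        """Grow a QA context toward `target_tokens` (counted as whitespace
        tokens) by appending distractor passages after the gold passage."""
        parts = [gold]
        count = len(gold.split())
        # Cycle through distractors until the token budget is met.
        for d in itertools.cycle(distractors):
            if count >= target_tokens:
                break
            parts.append(d)
            count += len(d.split())
        return "\n\n".join(parts)
    ```

    Repeating this with increasing budgets (say 2k, 8k, 32k tokens) yields variants of the same question whose only difference is context length, isolating that variable for benchmarking.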

  16. LongAlign-10k

    • huggingface.co
    Updated Feb 1, 2024
    Cite
    Z.ai (2024). LongAlign-10k [Dataset]. https://huggingface.co/datasets/zai-org/LongAlign-10k
    Dataset provided by
    Z.ai
    Description

    LongAlign-10k

    🤗 [LongAlign Dataset] • 💻 [Github Repo] • 📃 [LongAlign Paper]

    LongAlign is the first full recipe for LLM alignment on long contexts. We propose the LongAlign-10k dataset, containing 10,000 long instruction examples of 8k-64k tokens in length. We investigate training strategies, namely packing (with loss weighting) and sorted batching, which are all implemented in our code. For real-world long-context evaluation, we introduce LongBench-Chat, which evaluates the… See the full description on the dataset page: https://huggingface.co/datasets/zai-org/LongAlign-10k.
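    Sorted batching, one of the training strategies named above, groups examples of similar length so that padding each batch to its longest member wastes fewer tokens. A minimal sketch under that idea (not the LongAlign implementation):

    ```python
    def sorted_batches(lengths: list[int], batch_size: int) -> list[list[int]]:
        """Return batches of example indices, sorted by sequence length so
        each batch contains similarly sized examples."""
        order = sorted(range(len(lengths)), key=lambda i: lengths[i])
        return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

    def padding_waste(lengths: list[int], batches: list[list[int]]) -> int:
        """Total padding tokens if each batch is padded to its longest member."""
        return sum(
            max(lengths[i] for i in b) * len(b) - sum(lengths[i] for i in b)
            for b in batches
        )
    ```

    The trade-off is that length-sorted batches are no longer i.i.d. samples of the data, which is why such recipes typically pair the strategy with shuffling of the batch order or with loss weighting.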

  17. Marathon

    • huggingface.co
    Cite
    Lei Zhang, Marathon [Dataset]. https://huggingface.co/datasets/Lemoncoke/Marathon
    Authors
    Lei Zhang
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for Marathon

      Release
    

    [2024/05/15] 🔥 Marathon is accepted by ACL 2024 Main Conference.

      Dataset Summary
    

    Marathon benchmark is a new long-context multiple-choice benchmark, mainly based on LooGLE, with some original data from LongBench. The context length can reach up to 200K+. Marathon benchmark comprises six tasks: Comprehension and Reasoning, Multiple Information Retrieval, Timeline Reorder, Computation, Passage Retrieval, and Short… See the full description on the dataset page: https://huggingface.co/datasets/Lemoncoke/Marathon.
