2 datasets found
  1. DOVE_Lite

    • huggingface.co
    Updated Mar 2, 2025
    Cite
    nlphuji (2025). DOVE_Lite [Dataset]. https://huggingface.co/datasets/nlphuji/DOVE_Lite
    Dataset updated: Mar 2, 2025
    Dataset authored and provided by: nlphuji
    License: https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

    🌐 Project Website | 📄 Read our paper

    Updates 📅

    2025-06-11: Added Llama 70B evaluations with ~5,700 MMLU examples across 100 different prompt variations (= 570K new predictions!), based on data from ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments.
    2025-04-12: Added MMLU predictions from dozens of models including OpenAI, Qwen, Mistral, Gemini…

    See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE_Lite.

  2. DOVE

    • huggingface.co
    Updated Mar 2, 2025
    Cite
    nlphuji (2025). DOVE [Dataset]. https://huggingface.co/datasets/nlphuji/DOVE
    Dataset updated: Mar 2, 2025
    Dataset authored and provided by: nlphuji
    License: https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

    🌐 Project Website | 📄 Read our paper

    Updates 📅

    2025-06-11: Added Llama 70B evaluations with ~5,700 MMLU examples across 100 different prompt variations (= 570K new predictions!), based on data from ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments.
    2025-04-12: Added MMLU predictions from dozens of models including OpenAI, Qwen, Mistral, Gemini…

    See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE.



DOVE_Lite

nlphuji/DOVE_Lite

DOVE: A Multi-Dimensional Predictions Dataset for LLM Evaluation

