2 datasets found

DOVE_Lite
huggingface.co
Updated Mar 2, 2025
Cite
nlphuji (2025). DOVE_Lite [Dataset]. https://huggingface.co/datasets/nlphuji/DOVE_Lite
Dataset updated
Mar 2, 2025
Dataset authored and provided by
nlphuji
License
https://choosealicense.com/licenses/cdla-permissive-2.0/
Description
🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

🌐 Project Website | 📄 Read our paper

Updates 📅

2025-06-11: Added Llama 70B evaluations with ~5,700 MMLU examples across 100 different prompt variations (= 570K new predictions!), based on data from ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

2025-04-12: Added MMLU predictions from dozens of models including OpenAI, Qwen, Mistral, Gemini…

See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE_Lite

DOVE
huggingface.co
Updated Mar 2, 2025
Cite
nlphuji (2025). DOVE [Dataset]. https://huggingface.co/datasets/nlphuji/DOVE
Dataset updated
Mar 2, 2025
Dataset authored and provided by
nlphuji
License
https://choosealicense.com/licenses/cdla-permissive-2.0/
Description
🕊️ DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

🌐 Project Website | 📄 Read our paper

Updates 📅

2025-06-11: Added Llama 70B evaluations with ~5,700 MMLU examples across 100 different prompt variations (= 570K new predictions!), based on data from ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

2025-04-12: Added MMLU predictions from dozens of models including OpenAI, Qwen, Mistral, Gemini…

See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE