License: https://choosealicense.com/licenses/cdla-permissive-2.0/
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
Project Website | Read our paper
Updates
2025-06-11: Added Llama 70B evaluations with ~5,700 MMLU examples across 100 different prompt variations (= 570K new predictions!), based on data from ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments.
2025-04-12: Added MMLU predictions from dozens of models including OpenAI, Qwen, Mistral, Gemini…
See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE_Lite.
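The 570K figure follows from running every example under every prompt variation. Below is a minimal sketch of that count. It assumes exactly 5,700 examples (the note says ~5,700) and one prediction per (example, prompt variation) pair. The names `NUM_EXAMPLES`, `NUM_VARIATIONS` and `count_predictions` are hypothetical illustrations, not part of the dataset or its tooling.

```python
from itertools import product

# Hypothetical constants: the update note gives ~5,700 examples, so 5,700 is an approximation.
NUM_EXAMPLES = 5_700    # MMLU examples evaluated
NUM_VARIATIONS = 100    # prompt variations applied to each example


def count_predictions(num_examples: int, num_variations: int) -> int:
    """Count predictions, assuming each example is run once under every prompt variation."""
    # Enumerate the full (example, variation) grid explicitly to make the pairing visible;
    # the result equals num_examples * num_variations.
    return sum(1 for _ in product(range(num_examples), range(num_variations)))


total = count_predictions(NUM_EXAMPLES, NUM_VARIATIONS)
print(f"{total:,} predictions")  # 570,000 predictions
```

Enumerating the grid instead of multiplying is deliberate. It shows that each prediction corresponds to one (example, variation) cell, which is the structure that allows per-example variance across prompts to be studied.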
License: https://choosealicense.com/licenses/cdla-permissive-2.0/
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
Project Website | Read our paper
Updates
2025-06-11: Added Llama 70B evaluations with ~5,700 MMLU examples across 100 different prompt variations (= 570K new predictions!), based on data from ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments.
2025-04-12: Added MMLU predictions from dozens of models including OpenAI, Qwen, Mistral, Gemini…
See the full description on the dataset page: https://huggingface.co/datasets/nlphuji/DOVE.