MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
SimpleQA
A factuality benchmark called SimpleQA that measures the ability of language models to answer short, fact-seeking questions.
Sources
openai/simple-evals | Introducing SimpleQA | Measuring short-form factuality in large language models
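For orientation, here is a minimal sketch of iterating over the benchmark's question/answer pairs. The basicv8vc/SimpleQA Hugging Face mirror (referenced later in this catalog), its "test" split, and the "problem"/"answer" column names are all assumptions; the official harness in openai/simple-evals grades responses with an LLM grader as correct, incorrect, or not attempted.

```python
# Minimal sketch: iterating over SimpleQA question/answer pairs.
# Assumptions: the basicv8vc/SimpleQA mirror, a "test" split, and
# "problem"/"answer" column names. The official openai/simple-evals
# harness uses an LLM grader (correct / incorrect / not attempted),
# not anything shown here.
from datasets import load_dataset

ds = load_dataset("basicv8vc/SimpleQA", split="test")
for row in ds.select(range(3)):
    print(row["problem"], "->", row["answer"])
```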
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
SimpleQuestions is a dataset for simple QA, consisting of a total of 108,442 questions written in natural language by human English-speaking annotators, each paired with a corresponding fact, formatted as (subject, relationship, object), that provides the answer and also a complete explanation. Facts have been extracted from the knowledge base Freebase (freebase.com). We randomly shuffle these questions and use 70% of them (75,910) as the training set, 10% (10,845) as the validation set, and the remaining 20% (21,687) as the test set.
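A minimal sketch of that 70/10/20 split, using the published counts; the shuffle seed is an assumption, since the original ordering is not documented here.

```python
# Sketch of the 70/10/20 split described above, using the published counts.
# The seed is an assumption; the original shuffle order is not specified.
import random

def split_simplequestions(examples, seed=0):
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_train, n_valid = 75_910, 10_845   # 70% and 10% of 108,442
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remaining 20% (21,687)
    return train, valid, test
```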
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
oidlabs/simpleQA dataset hosted on Hugging Face and contributed by the HF Datasets community
cmriat/simpleqa dataset hosted on Hugging Face and contributed by the HF Datasets community
andreuka18/SimpleQA-1000 dataset hosted on Hugging Face and contributed by the HF Datasets community
hamishivi/SimpleQA-RLVR-noprompt dataset hosted on Hugging Face and contributed by the HF Datasets community
ringos/output_Llama-3.1-8B-simpleqa-0_1000-m_generation-n_128-t_1.0-k_50-p_0.95-l_128 dataset hosted on Hugging Face and contributed by the HF Datasets community
SimpleQuestions is a large-scale factoid question answering dataset. It consists of 108,442 natural language questions, each paired with a corresponding fact from the Freebase knowledge base. Each fact is a triple (subject, relation, object), and the answer to the question is always the object. The dataset is divided into training, validation, and test sets with 75,910, 10,845, and 21,687 questions respectively.
Lambent/synthetic-rag-simple-qa-4th-to-6th dataset hosted on Hugging Face and contributed by the HF Datasets community
ringos/output_Mistral-Nemo-Base-2407-simpleqa-0_1000-m_generation-n_32-t_1.0-k_40-p_0.9-l_128 dataset hosted on Hugging Face and contributed by the HF Datasets community
Synthetic oracle datastore for SimpleQA. The oracle document is generated from the problem and answer. This data was generated by Llama-3.3-70B-Instruct using the following prompt template:
template = f"""
You are a helpful assistant that can synthesize a Wikipedia document from a question and an answer. The document should be an actual Wikipedia article that can be helpful for answering the question. Do not directly include the question in the document. The document should contain around 150 words.
Question: {{question}} … See the full description on the dataset page: https://huggingface.co/datasets/rulins/SimpleQA-synthetic-datastore-Llama3.3-70B-Instruct.
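The generation step implied by that template might look like the sketch below; `generate` stands in for a hypothetical Llama-3.3-70B-Instruct endpoint, and the "Answer:" line is an assumption, since the card's template is truncated after the question.

```python
# Sketch of the synthetic-datastore generation described above.
# `generate` is a hypothetical wrapper around a Llama-3.3-70B-Instruct
# endpoint; the "Answer:" line is an assumption, since the template
# shown on the card is truncated after the question.
def build_prompt(question: str, answer: str) -> str:
    return (
        "You are a helpful assistant that can synthesize a Wikipedia document "
        "from a question and an answer. The document should be an actual "
        "Wikipedia article that can be helpful for answering the question. "
        "Do not directly include the question in the document. "
        "The document should contain around 150 words.\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )

def synthesize_document(generate, question: str, answer: str) -> str:
    """Produce one ~150-word oracle document for a (question, answer) pair."""
    return generate(build_prompt(question, answer))
```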
Lambent/synthetic-rag-hermes-simple-qa-1st-ic dataset hosted on Hugging Face and contributed by the HF Datasets community
piotr-rybak/simple-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
Model Card: SimpleQA Benchmark
Information from OpenAI blog post.
Model Card for SimpleQA
Version: v1.0
Date: October 30, 2024
Authors: Jason Wei, Karina Nguyen, Hyung Won Chung, Joy Jiao, Spencer Papay, Mia Glaese, John Schulman, Liam Fedus
Acknowledgements: Adam Tauman Kalai
Model Overview
SimpleQA is a factuality benchmark designed to evaluate the accuracy and reliability of language models in responding to short, fact-seeking questions. Aimed at assessing models'… See the full description on the dataset page: https://huggingface.co/datasets/MAISAAI/openai_simple_qa_test_set.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
ACG-SimpleQA
Website • Hugging Face
中文 | English
ACG-SimpleQA is an objective knowledge question-answering dataset focused on the Chinese ACG (Animation, Comic, Game) domain, containing 4,242 automatically generated, carefully designed QA samples. This benchmark aims to evaluate large language models' factual capabilities in the ACG culture domain, and features Chinese-language content, diversity, high quality, static answers, and easy evaluation.
Latest Updates… See the full description on the dataset page: https://huggingface.co/datasets/Papersnake/ACG-SimpleQA.
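Because the answers are static, evaluation can in principle reduce to string matching; here is a minimal scorer sketch, assuming hypothetical "question"/"answer" field names that may differ from the actual release.

```python
# Minimal exact-match scorer sketch for a static-answer benchmark like
# ACG-SimpleQA. The "question"/"answer" field names are assumptions;
# the actual release may normalize answers or use graded matching.
def exact_match_accuracy(samples, predict):
    hits = 0
    for sample in samples:
        prediction = predict(sample["question"])
        hits += prediction.strip() == sample["answer"].strip()
    return hits / len(samples)
```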
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ArXiv QA
(TBD) Automated ArXiv question answering via large language models. GitHub | Homepage | Simple QA - Hugging Face Space
Automated Question Answering with ArXiv Papers
Latest 25 Papers
LIME: Localized Image Editing via Attention Regularization in Diffusion Models - [Arxiv] [QA]
Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization - [Arxiv] [QA]
VL-GPT: A Generative Pre-trained Transformer for Vision and… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/arxiv_qa.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
kesitt/Wikipedia-Turkish-SimpleQA dataset hosted on Hugging Face and contributed by the HF Datasets community
Together-Search-Bench Dataset
This dataset is used for development and evaluation of Together Open Deep Research. The data is composed of 50 samples from each of the following datasets:
simpleqa: basicv8vc/SimpleQA
frames: google/frames-benchmark
hotpotqa: hotpotqa/hotpot_qa
License Information
Part of the data is derived from hotpotqa by Yang et al., licensed under CC BY-SA 4.0; modifications include subsampling the full dataset. Part of the data is derived from frames by… See the full description on the dataset page: https://huggingface.co/datasets/togethercomputer/together-search-bench.
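The 50-per-source composition could be reproduced roughly as below; the split names, seed, and the hotpot_qa config are assumptions, since each source repository has its own schema.

```python
# Sketch of composing Together-Search-Bench from its three sources.
# Split names, the seed, and the hotpot_qa config are assumptions;
# a real build step would also normalize each source's columns.
from datasets import load_dataset

def take_50(ds, seed=0):
    return ds.shuffle(seed=seed).select(range(50))

simpleqa = take_50(load_dataset("basicv8vc/SimpleQA", split="test"))
frames = take_50(load_dataset("google/frames-benchmark", split="test"))
hotpotqa = take_50(
    load_dataset("hotpotqa/hotpot_qa", "distractor", split="validation")
)
```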