19 datasets found

M-BEIR
huggingface.co
Updated Dec 7, 2023
TIGER-Lab (2023). M-BEIR [Dataset]. https://huggingface.co/datasets/TIGER-Lab/M-BEIR
Dataset updated
Dec 7, 2023
Dataset authored and provided by
TIGER-Lab
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers (ECCV 2024)


🔔News

🔥[2023-12-21]: Our M-BEIR Benchmark is now available for use.

Dataset Summary

M-BEIR, the Multimodal BEnchmark for Instructed Retrieval, is a comprehensive large-scale retrieval benchmark designed to train and evaluate unified multimodal retrieval… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/M-BEIR.
M-BEIR_DEV
huggingface.co
Updated Jun 24, 2022
Multimodal Benchmarking IR (2022). M-BEIR_DEV [Dataset]. https://huggingface.co/datasets/MBEIR/M-BEIR_DEV
Dataset updated
Jun 24, 2022
Dataset authored and provided by
Multimodal Benchmarking IR
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
MBEIR/M-BEIR_DEV dataset hosted on Hugging Face and contributed by the HF Datasets community
hotpotqa
huggingface.co
Updated Aug 24, 2022
BEIR (2022). hotpotqa [Dataset]. https://huggingface.co/datasets/BeIR/hotpotqa
Dataset updated
Aug 24, 2022
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/hotpotqa.
prebuilt-indexes-mbeir
huggingface.co
Updated Aug 14, 2025
Castorini (2025). prebuilt-indexes-mbeir [Dataset]. https://huggingface.co/datasets/castorini/prebuilt-indexes-mbeir
Dataset updated
Aug 14, 2025
Dataset authored and provided by
Castorini
Description
castorini/prebuilt-indexes-mbeir dataset hosted on Hugging Face and contributed by the HF Datasets community
trec-news-generated-queries
huggingface.co
Updated Aug 20, 2022
BEIR (2022). trec-news-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/trec-news-generated-queries
Dataset updated
Aug 20, 2022
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/trec-news-generated-queries.
fever
huggingface.co
Updated Aug 16, 2023
BEIR (2023). fever [Dataset]. https://huggingface.co/datasets/BeIR/fever
Dataset updated
Aug 16, 2023
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/fever.
climate-fever
huggingface.co
Updated Aug 16, 2023
BEIR (2023). climate-fever [Dataset]. https://huggingface.co/datasets/BeIR/climate-fever
Dataset updated
Aug 16, 2023
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/climate-fever.
cqadupstack-generated-queries
huggingface.co
Updated Aug 11, 2022
BEIR (2022). cqadupstack-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/cqadupstack-generated-queries
Dataset updated
Aug 11, 2022
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/cqadupstack-generated-queries.
dbpedia-entity
huggingface.co
opendatalab.com
Updated Aug 16, 2023
BEIR (2023). dbpedia-entity [Dataset]. https://huggingface.co/datasets/BeIR/dbpedia-entity
Dataset updated
Aug 16, 2023
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/dbpedia-entity.
beir-nl-nq
huggingface.co
Updated Feb 10, 2025
CLiPS (2025). beir-nl-nq [Dataset]. https://huggingface.co/datasets/clips/beir-nl-nq
Dataset updated
Feb 10, 2025
Dataset authored and provided by
CLiPS
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR-NL Benchmark

Dataset Summary

BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018…

See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-nq.
mbeir-fashion-passage
huggingface.co
Michael Norman, mbeir-fashion-passage [Dataset]. https://huggingface.co/datasets/michael-norman/mbeir-fashion-passage
Authors
Michael Norman
Description
michael-norman/mbeir-fashion-passage dataset hosted on Hugging Face and contributed by the HF Datasets community
beir-embed-english-v3
huggingface.co
Cohere, beir-embed-english-v3 [Dataset]. https://huggingface.co/datasets/Cohere/beir-embed-english-v3
Dataset authored and provided by
Cohere (https://cohere.com/)
Description
BEIR embeddings with Cohere embed-english-v3.0 model

This dataset contains all query and document embeddings for BEIR, embedded with the Cohere embed-english-v3.0 embedding model.

Overview of datasets

This repository hosts all 18 datasets from BEIR, including query and document embeddings. The following table gives an overview of the available datasets. See the next section for how to load the individual datasets.

Dataset    nDCG@10    Documents
arguana    53.98      8,674…

See the full description on the dataset page: https://huggingface.co/datasets/Cohere/beir-embed-english-v3.
webis-touche2020-generated-queries
huggingface.co
BEIR, webis-touche2020-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/webis-touche2020-generated-queries
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet…

See the full description on the dataset page: https://huggingface.co/datasets/BeIR/webis-touche2020-generated-queries.
beir-nl-hotpotqa
huggingface.co
Updated Feb 10, 2025
CLiPS (2025). beir-nl-hotpotqa [Dataset]. https://huggingface.co/datasets/clips/beir-nl-hotpotqa
Dataset updated
Feb 10, 2025
Dataset authored and provided by
CLiPS
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR-NL Benchmark

Dataset Summary

BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018…

See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-hotpotqa.
beir-nl-climate-fever
huggingface.co
Updated Feb 10, 2025
CLiPS (2025). beir-nl-climate-fever [Dataset]. https://huggingface.co/datasets/clips/beir-nl-climate-fever
Dataset updated
Feb 10, 2025
Dataset authored and provided by
CLiPS
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for BEIR-NL Benchmark

Dataset Summary

BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018…

See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-climate-fever.
arxiv-beir-500k-generated-queries
huggingface.co
Updated Sep 5, 2024
Algorithmic Research Group (2024). arxiv-beir-500k-generated-queries [Dataset]. https://huggingface.co/datasets/AlgorithmicResearchGroup/arxiv-beir-500k-generated-queries
Dataset updated
Sep 5, 2024
Dataset authored and provided by
Algorithmic Research Group
Description
Dataset Summary

A BEIR style dataset derived from ArXiv

Languages

All tasks are in English (en).

Dataset Structure

The dataset contains a corpus, queries and qrels (relevance judgments file). They must be in the following format:

corpus file: a .jsonl file (jsonlines) that contains a list of dictionaries, each with three fields: _id (unique document identifier), title (document title, optional), and text (document paragraph or passage). For example:…

See the full description on the dataset page: https://huggingface.co/datasets/AlgorithmicResearchGroup/arxiv-beir-500k-generated-queries.
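The corpus layout described above (one JSON object per line, with _id, an optional title, and text) could be read with a sketch like the following. The function name parse_corpus_lines and the two sample records are hypothetical and invented for illustration; they are not taken from the dataset or its tooling.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_corpus_lines(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one document dict per non-empty JSON line (hypothetical helper)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # _id and text are required by the described format; title is optional.
        if "_id" not in record or "text" not in record:
            raise ValueError(f"line {line_number}: missing _id or text")
        yield {
            "_id": str(record["_id"]),
            "title": record.get("title", ""),
            "text": record["text"],
        }


# Sample records invented for illustration only.
sample = [
    '{"_id": "doc1", "title": "A title", "text": "First passage."}',
    '{"_id": "doc2", "text": "Second passage without a title."}',
]
docs = list(parse_corpus_lines(sample))
print(len(docs), docs[1]["title"] == "")
```

To read a real file, the same function can be given an open file handle, since iterating a text file yields its lines.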
bioasq-top-20-gen-queries
huggingface.co
Updated Mar 7, 2023
INCOME (2023). bioasq-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/bioasq-top-20-gen-queries
Dataset updated
Mar 7, 2023
Dataset authored and provided by
INCOME
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
NFCorpus: 20 generated queries (BEIR Benchmark)

This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py

The old dataset card for the BEIR benchmark follows.

Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/bioasq-top-20-gen-queries.
fever-top-20-gen-queries
huggingface.co
Updated Mar 6, 2023
INCOME (2023). fever-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/fever-top-20-gen-queries
Dataset updated
Mar 6, 2023
Dataset authored and provided by
INCOME
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
NFCorpus: 20 generated queries (BEIR Benchmark)

This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py

The old dataset card for the BEIR benchmark follows.

Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/fever-top-20-gen-queries.
fever_ft
huggingface.co
Sep Zeighami, fever_ft [Dataset]. https://huggingface.co/datasets/sepz/fever_ft
Authors
Sep Zeighami
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The dataset contains random 0.7/0.1/0.2 train/dev/test splits of the FEVER dataset from BEIR (https://github.com/beir-cellar/beir) for benchmarking embedding model fine-tuning.
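A random 0.7/0.1/0.2 split of this kind might be produced along the following lines. This is only a sketch under assumptions: the listing does not say which items were split (queries or documents), which seed was used, or how rounding was handled, so the function split_ids, the seed, and the example identifiers are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_ids(
    ids: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle ids and cut them into 70% train, 10% dev, 20% test (hypothetical helper)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 7 // 10      # integer arithmetic avoids float rounding
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]      # the remainder, roughly 20%
    return train, dev, test


# Identifiers invented for illustration only.
example_ids = [f"q{i}" for i in range(100)]
train, dev, test = split_ids(example_ids)
print(len(train), len(dev), len(test))
```

Assigning the remainder to the test split keeps every item in exactly one split even when the total is not divisible by ten.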