BEIR (Benchmarking IR) is a heterogeneous benchmark containing different information retrieval (IR) tasks. Through BEIR, it is possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches.
The benchmark covers a total of 9 information retrieval tasks (Fact Checking, Citation Prediction, Duplicate Question Retrieval, Argument Retrieval, News Retrieval, Question Answering, Tweet Retrieval, Biomedical IR, Entity Retrieval) drawn from 19 different datasets:
MS MARCO, TREC-COVID, NFCorpus, BioASQ, Natural Questions, HotpotQA, FiQA-2018, Signal-1M, TREC-News, ArguAna, Touche 2020, CQADupStack, Quora Question Pairs, DBPedia, SciDocs, FEVER, Climate-FEVER, SciFact, Robust04
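Each BEIR dataset is distributed in a common layout: a corpus and queries as JSON lines, plus tab-separated relevance judgments (qrels). The sketch below illustrates that layout with made-up ids and text, assuming the standard BEIR field names (`_id`, `title`, `text`, as in the `corpus.jsonl` files referenced elsewhere on this page); exact fields can vary slightly per dataset.

```python
import json

# One corpus entry per line in corpus.jsonl (illustrative values).
corpus_line = json.dumps({
    "_id": "doc1",
    "title": "Example title",
    "text": "Example passage text.",
})

# queries.jsonl uses the same JSON-lines layout.
query_line = json.dumps({"_id": "q1", "text": "example query"})

# qrels rows are tab-separated: query id, corpus id, relevance score.
qrels_row = "q1\tdoc1\t1"

doc = json.loads(corpus_line)
query = json.loads(query_line)
qid, did, rel = qrels_row.split("\t")
print(doc["_id"], query["_id"], qid, did, rel)
```

Because every dataset shares this schema, the same loader and evaluation code can be reused across all 19 datasets unchanged.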
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/scidocs.
MIT License: https://opensource.org/licenses/MIT
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers (ECCV 2024)
🌐 Homepage | 🤗 Model (UniIR Checkpoints) | 🤗 Paper | 📖 arXiv | GitHub
How to download the M-BEIR Dataset
🔔News
🔥[2023-12-21]: Our M-BEIR Benchmark is now available for use.
Dataset Summary
M-BEIR, the Multimodal BEnchmark for Instructed Retrieval, is a comprehensive large-scale retrieval benchmark designed to train and evaluate unified multimodal retrieval… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/M-BEIR.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/fever.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/arguana.
The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. CoIR includes 10 curated code datasets, covering 8 retrieval tasks across 7 domains, and encompasses two million documents in total. It also provides a simple Python framework, installable via pip, that shares the same data schema as benchmarks such as MTEB and BEIR for easy cross-benchmark evaluation.
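Benchmarks built on this shared schema are typically scored per query with nDCG@10 against the qrels (nDCG@10 is the headline metric reported by BEIR). A minimal self-contained sketch of that metric, not any benchmark's actual implementation:

```python
import math

def ndcg_at_k(ranking, qrels, k=10):
    """Compute nDCG@k for a single query.

    ranking: list of doc ids, best first.
    qrels: dict mapping doc id -> graded relevance.
    """
    # Discounted cumulative gain of the system ranking.
    dcg = sum(
        qrels.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranking[:k])
    )
    # Ideal DCG: relevant docs sorted by decreasing relevance.
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Relevant docs d1 and d2; the retriever ranks d1 first but d2 only third.
print(ndcg_at_k(["d1", "d3", "d2"], {"d1": 1, "d2": 1}))
```

The benchmark-level score is simply this value averaged over all queries of a dataset.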
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet Retrieval: Signal-1M
Entity Retrieval: DBPedia
All these datasets have been preprocessed and can be used for your experiments.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/trec-news-generated-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/dbpedia-entity.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Climate-FEVER: 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the Climate-FEVER dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in Climate-FEVER in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/climate-fever-top-20-gen-queries.
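The usual way to use such docT5query-generated queries (the technique implied by the Anserini generation script named above) is document expansion: append the synthetic queries to the passage text before building a lexical index such as BM25. Generating queries yourself requires downloading the BeIR/query-gen-msmarco-t5-base-v1 model, so the sketch below shows only the offline expansion step, with an illustrative passage and queries that are not taken from the dataset.

```python
# docT5query-style document expansion: append the generated queries to the
# passage text so a lexical (BM25) index can match their vocabulary.
def expand_document(text, generated_queries):
    return text + " " + " ".join(generated_queries)

passage = "Ocean temperatures have risen steadily since 1900."
queries = [
    "have ocean temperatures increased",
    "when did ocean warming start",
]  # the real dataset stores 20 such queries per corpus.jsonl id

expanded = expand_document(passage, queries)
print(expanded)
```

Indexing the expanded text instead of the raw passage lets a term-matching retriever answer questions whose wording never appears in the original document.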
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
SciDocs: 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the SciDocs dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in SciDocs in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/scidocs-top-20-gen-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
TREC-News: 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the TREC-News dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in TREC-News in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/trec-news-top-20-gen-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/cqadupstack-generated-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
ArguAna: 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the ArguAna dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in ArguAna in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/arguana-top-20-gen-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/webis-touche2020.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
CQADupStack (Gaming): 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the CQADupStack (Gaming) dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in CQADupStack (Gaming) in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/cqadupstack-gaming-top-20-gen-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Webis-Touche2020: 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the Webis-Touche2020 dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in Webis-Touche2020 in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/webis-touche2020-top-20-gen-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
CQADupStack (Physics): 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the CQADupStack (Physics) dataset of the BEIR benchmark.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in CQADupStack (Physics) in the BEIR benchmark (corpus.jsonl).
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/cqadupstack-physics-top-20-gen-queries.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR-NL Benchmark
Dataset Summary
BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018… See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-dbpedia-entity.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for BEIR-NL Benchmark
Dataset Summary
BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018… See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-scifact.