52 datasets found
  1. BEIR Dataset

    • paperswithcode.com
    Updated Dec 8, 2024
    Nandan Thakur; Nils Reimers; Andreas Rücklé; Abhishek Srivastava; Iryna Gurevych (2024). BEIR Dataset [Dataset]. https://paperswithcode.com/dataset/beir
    Authors
    Nandan Thakur; Nils Reimers; Andreas Rücklé; Abhishek Srivastava; Iryna Gurevych
    Description

    BEIR (Benchmarking IR) is a heterogeneous benchmark containing different information retrieval (IR) tasks. Through BEIR, it is possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches.

    The benchmark contains a total of 9 information retrieval tasks (Fact Checking, Citation Prediction, Duplicate Question Retrieval, Argument Retrieval, News Retrieval, Question Answering, Tweet Retrieval, Biomedical IR, Entity Retrieval) from 19 different datasets:

    MS MARCO, TREC-COVID, NFCorpus, BioASQ, Natural Questions, HotpotQA, FiQA-2018, Signal-1M, TREC-News, ArguAna, Touche 2020, CQADupStack, Quora Question Pairs, DBPedia, SciDocs, FEVER, Climate-FEVER, SciFact, Robust04

  2. scidocs

    • huggingface.co
    Updated Aug 16, 2023
    BEIR (2023). scidocs [Dataset]. https://huggingface.co/datasets/BeIR/scidocs
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/scidocs.

  3. M-BEIR

    • huggingface.co
    Updated Dec 7, 2023
    TIGER-Lab (2023). M-BEIR [Dataset]. https://huggingface.co/datasets/TIGER-Lab/M-BEIR
    Dataset authored and provided by
    TIGER-Lab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    UniIR: Training and Benchmarking Universal Multimodal Information Retrievers (ECCV 2024)


      🔔News
    

    🔥[2023-12-21]: Our M-BEIR Benchmark is now available for use.

      Dataset Summary
    

    M-BEIR, the Multimodal BEnchmark for Instructed Retrieval, is a comprehensive large-scale retrieval benchmark designed to train and evaluate unified multimodal retrieval… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/M-BEIR.

  4. fever

    • huggingface.co
    Updated Aug 16, 2023
    BEIR (2023). fever [Dataset]. https://huggingface.co/datasets/BeIR/fever
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/fever.

  5. arguana

    • huggingface.co
    Updated Nov 21, 2013
    BEIR (2013). arguana [Dataset]. https://huggingface.co/datasets/BeIR/arguana
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/arguana.

  6. CoIR Dataset

    • paperswithcode.com
    Updated Aug 20, 2024
    Xiangyang Li; Kuicai Dong; Yi Quan Lee; Wei Xia; Hao Zhang; Xinyi Dai; Yasheng Wang; Ruiming Tang (2024). CoIR Dataset [Dataset]. https://paperswithcode.com/dataset/coir
    Authors
    Xiangyang Li; Kuicai Dong; Yi Quan Lee; Wei Xia; Hao Zhang; Xinyi Dai; Yasheng Wang; Ruiming Tang
    Description

    The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. It includes 10 curated code datasets covering 8 retrieval tasks across 7 domains, with two million documents in total. It also provides a simple Python framework, installable via pip, and shares its data schema with benchmarks such as MTEB and BEIR, which makes cross-benchmark evaluation straightforward.
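
The shared data schema mentioned above can be illustrated with a minimal sketch. It assumes the commonly used BEIR-style layout: a `corpus.jsonl` and a `queries.jsonl` whose records carry an `_id` key, plus a tab-separated relevance file with `query-id`, `corpus-id`, and `score` columns. The listing does not spell out these field names, so treat them as assumptions, and note that all records below are invented for illustration.

```python
import csv
import io
import json

# Tiny in-memory stand-ins for corpus.jsonl, queries.jsonl and a qrels TSV.
# Field names follow the assumed BEIR-style layout; the records are invented.
CORPUS_JSONL = """\
{"_id": "d1", "title": "Binary search", "text": "def bsearch(xs, x): ..."}
{"_id": "d2", "title": "Quick sort", "text": "def qsort(xs): ..."}
"""
QUERIES_JSONL = """\
{"_id": "q1", "text": "find an item in a sorted list"}
"""
QRELS_TSV = "query-id\tcorpus-id\tscore\nq1\td1\t1\n"


def load_jsonl(raw):
    """Parse JSON Lines text into a dict keyed by each record's "_id"."""
    records = {}
    for line in raw.splitlines():
        if line.strip():
            rec = json.loads(line)
            records[rec["_id"]] = rec
    return records


def load_qrels(raw):
    """Parse tab-separated judgments into {query_id: {doc_id: score}}."""
    qrels = {}
    reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
    for row in reader:
        qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


corpus = load_jsonl(CORPUS_JSONL)
queries = load_jsonl(QUERIES_JSONL)
qrels = load_qrels(QRELS_TSV)
```

Because every benchmark that follows this layout yields the same three dictionaries, one evaluation loop can be reused across them, which is presumably the point of sharing the schema.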

  7. trec-covid

    • opendatalab.com
    • huggingface.co
    zip
    Updated Dec 29, 2023
    Ubiquitous Knowledge Processing Lab (UKP Lab) (2023). trec-covid [Dataset]. https://opendatalab.com/OpenDataLab/trec-covid
    Dataset provided by
    Ubiquitous Knowledge Processing Lab (UKP Lab)
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet Retrieval: Signal-1M
    Entity Retrieval: DBPedia

    All these datasets have been preprocessed and can be used for your experiments.
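
For experiments that group results by task, the task-to-dataset listing in this description can be written as a small lookup table. This is only a sketch: the names `BEIR_TASKS` and `task_of` are hypothetical helpers, and the dataset spellings are copied from the description rather than from any official identifier list.

```python
# Task-to-dataset grouping transcribed from the description above.
BEIR_TASKS = {
    "Fact-checking": ["FEVER", "Climate-FEVER", "SciFact"],
    "Question-Answering": ["NQ", "HotpotQA", "FiQA-2018"],
    "Bio-Medical IR": ["TREC-COVID", "BioASQ", "NFCorpus"],
    "News Retrieval": ["TREC-NEWS", "Robust04"],
    "Argument Retrieval": ["Touche-2020", "ArguAna"],
    "Duplicate Question Retrieval": ["Quora", "CqaDupstack"],
    "Citation-Prediction": ["SCIDOCS"],
    "Tweet Retrieval": ["Signal-1M"],
    "Entity Retrieval": ["DBPedia"],
}


def task_of(dataset):
    """Return the task a dataset belongs to (case-insensitive match)."""
    wanted = dataset.lower()
    for task, names in BEIR_TASKS.items():
        if wanted in (name.lower() for name in names):
            return task
    raise KeyError(dataset)


# Sanity check against the counts stated in the description.
total_datasets = sum(len(names) for names in BEIR_TASKS.values())
```

With this table, `len(BEIR_TASKS)` and `total_datasets` reproduce the 9 tasks and 18 datasets stated in the description.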

  8. trec-news-generated-queries

    • huggingface.co
    Updated Aug 20, 2022
    BEIR (2022). trec-news-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/trec-news-generated-queries
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/trec-news-generated-queries.

  9. dbpedia-entity

    • huggingface.co
    • opendatalab.com
    Updated Aug 16, 2023
    BEIR (2023). dbpedia-entity [Dataset]. https://huggingface.co/datasets/BeIR/dbpedia-entity
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/dbpedia-entity.

  10. climate-fever-top-20-gen-queries

    • huggingface.co
    Updated Mar 6, 2023
    INCOME (2023). climate-fever-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/climate-fever-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py
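
One common use of such generated queries is docT5query-style document expansion, where the synthetic queries are appended to the passage text before indexing. The sketch below illustrates that idea only; the dataset card does not say how the queries are meant to be consumed. The key names `_id`, `text`, and `queries`, the helper `expand_document`, and both sample records are assumptions invented for illustration.

```python
import json

# Hypothetical records: one corpus.jsonl line and one generated-queries line.
# The key names "_id", "text" and "queries" are assumptions, not confirmed
# by the dataset card.
CORPUS_LINE = '{"_id": "doc-1", "text": "Statins and survival."}'
GENERATED_LINE = (
    '{"_id": "doc-1", "queries": '
    '["do statins help survival", "statin study results"]}'
)


def expand_document(doc, generated, max_queries=20):
    """Append up to max_queries generated queries to the passage text."""
    if doc["_id"] != generated["_id"]:
        raise ValueError("document and generated queries have different ids")
    extra = generated["queries"][:max_queries]
    return " ".join([doc["text"], *extra])


doc = json.loads(CORPUS_LINE)
generated = json.loads(GENERATED_LINE)
expanded = expand_document(doc, generated)
```

The `max_queries=20` default mirrors the 20 questions generated per passage; a smaller value trades recall for a shorter indexed document.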

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/climate-fever-top-20-gen-queries.
    
  11. scidocs-top-20-gen-queries

    • huggingface.co
    Updated Apr 1, 2023
    INCOME (2023). scidocs-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/scidocs-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/scidocs-top-20-gen-queries.
    
  12. trec-news-top-20-gen-queries

    • huggingface.co
    Updated Mar 13, 2023
    INCOME (2023). trec-news-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/trec-news-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/trec-news-top-20-gen-queries.
    
  13. cqadupstack-generated-queries

    • huggingface.co
    Updated Aug 11, 2022
    BEIR (2022). cqadupstack-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/cqadupstack-generated-queries
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/cqadupstack-generated-queries.

  14. arguana-top-20-gen-queries

    • huggingface.co
    Updated Mar 8, 2023
    INCOME (2023). arguana-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/arguana-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/arguana-top-20-gen-queries.
    
  15. webis-touche2020

    • huggingface.co
    Updated Aug 16, 2023
    BEIR (2023). webis-touche2020 [Dataset]. https://huggingface.co/datasets/BeIR/webis-touche2020
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018
    Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    News Retrieval: TREC-NEWS, Robust04
    Argument Retrieval: Touche-2020, ArguAna
    Duplicate Question Retrieval: Quora, CqaDupstack
    Citation-Prediction: SCIDOCS
    Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/webis-touche2020.

  16. cqadupstack-gaming-top-20-gen-queries

    • huggingface.co
    Updated Apr 1, 2023
    INCOME (2023). cqadupstack-gaming-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/cqadupstack-gaming-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/cqadupstack-gaming-top-20-gen-queries.
    
  17. webis-touche2020-top-20-gen-queries

    • huggingface.co
    webis-touche2020-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/webis-touche2020-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/webis-touche2020-top-20-gen-queries.
    
  18. cqadupstack-physics-top-20-gen-queries

    • huggingface.co
    Updated Jan 24, 2023
    INCOME (2023). cqadupstack-physics-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/cqadupstack-physics-top-20-gen-queries
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/cqadupstack-physics-top-20-gen-queries.
    
  19. beir-nl-dbpedia-entity

    • huggingface.co
    Updated Feb 10, 2025
    CLiPS (2025). beir-nl-dbpedia-entity [Dataset]. https://huggingface.co/datasets/clips/beir-nl-dbpedia-entity
    Dataset authored and provided by
    CLiPS
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR-NL Benchmark

      Dataset Summary
    

    BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018…

    See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-dbpedia-entity.

  20. beir-nl-scifact

    • huggingface.co
    Updated Feb 10, 2025
    CLiPS (2025). beir-nl-scifact [Dataset]. https://huggingface.co/datasets/clips/beir-nl-scifact
    Dataset authored and provided by
    CLiPS
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR-NL Benchmark

      Dataset Summary
    

    BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact
    Question-Answering: NQ, HotpotQA, FiQA-2018…

    See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-scifact.
