46 datasets found
  1. scifact

    • huggingface.co
    Updated Aug 16, 2023
    BEIR (2023). scifact [Dataset]. https://huggingface.co/datasets/BeIR/scifact
    Dataset updated
    Aug 16, 2023
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact; Question-Answering: NQ, HotpotQA, FiQA-2018; Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus; News Retrieval: TREC-NEWS, Robust04; Argument Retrieval: Touche-2020, ArguAna; Duplicate Question Retrieval: Quora, CqaDupstack; Citation-Prediction: SCIDOCS; Tweet…

    See the full description on the dataset page: https://huggingface.co/datasets/BeIR/scifact.
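
    As a minimal sketch of the data a BEIR dataset like this one ships, the snippet below parses BEIR-style files, assuming the standard BEIR layout: a `corpus.jsonl` with `_id`, `title`, `text`; a `queries.jsonl` with `_id`, `text`; and a tab-separated qrels file with `query-id`, `corpus-id`, `score` columns. The in-memory records are invented placeholders, not SciFact data, and the helpers `load_jsonl` and `load_qrels` are hypothetical.

```python
import csv
import io
import json

# Toy stand-ins for BEIR's corpus.jsonl, queries.jsonl and a qrels TSV
# (assumed layout; the records are invented placeholders, not SciFact data).
CORPUS_JSONL = """{"_id": "d1", "title": "Doc one", "text": "First abstract."}
{"_id": "d2", "title": "Doc two", "text": "Second abstract."}
"""
QUERIES_JSONL = """{"_id": "q1", "text": "A claim to verify."}
"""
QRELS_TSV = "query-id\tcorpus-id\tscore\nq1\td2\t1\n"


def load_jsonl(text: str) -> dict[str, dict]:
    """Map each JSON record's _id to the record itself, skipping blank lines."""
    records = {}
    for line in io.StringIO(text):
        line = line.strip()
        if line:
            rec = json.loads(line)
            records[rec["_id"]] = rec
    return records


def load_qrels(text: str) -> dict[str, dict[str, int]]:
    """Build a nested dict: query id -> {document id: relevance score}."""
    qrels: dict[str, dict[str, int]] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


corpus = load_jsonl(CORPUS_JSONL)
queries = load_jsonl(QUERIES_JSONL)
qrels = load_qrels(QRELS_TSV)
print(len(corpus), len(queries), qrels)
```

    Swapping the `io.StringIO` sources for opened files yields the same query → {document: relevance} shape that evaluation tools such as pytrec_eval accept.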

  2. scifact

    • huggingface.co
    Updated Jul 7, 2025
    Massive Text Embedding Benchmark (2025). scifact [Dataset]. https://huggingface.co/datasets/mteb/scifact
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Unknown: https://choosealicense.com/licenses/unknown/

    Description

    SciFact: an MTEB (Massive Text Embedding Benchmark) dataset.

    SciFact verifies scientific claims using evidence from a research-literature corpus of scientific paper abstracts.

    Task category: t2t (text to text)

    Domains: Academic, Medical, Written

    Reference: https://github.com/allenai/scifact

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the following code:

        import mteb

        task = mteb.get_tasks(["SciFact"])
        evaluator = mteb.MTEB(task)…

    See the full description on the dataset page: https://huggingface.co/datasets/mteb/scifact.
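
    At its core, evaluating an embedding model on this retrieval task means encoding each claim and each abstract, then ranking abstracts by similarity to the claim. The sketch below shows only that ranking step, using cosine similarity over hypothetical three-dimensional vectors that stand in for a real encoder's output. It illustrates the idea and is not MTEB's evaluation code; `rank_abstracts` is a hypothetical helper.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_abstracts(claim_vec: list[float], abstract_vecs: dict[str, list[float]], k: int = 10) -> list[str]:
    """Return the ids of the k abstracts most similar to the claim, best first."""
    scored = sorted(abstract_vecs.items(), key=lambda kv: cosine(claim_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Hypothetical embeddings standing in for a real encoder's output.
claim = [1.0, 0.0, 1.0]
abstracts = {
    "a1": [0.9, 0.1, 0.8],
    "a2": [0.0, 1.0, 0.0],
    "a3": [0.5, 0.5, 0.5],
}
print(rank_abstracts(claim, abstracts, k=2))  # ['a1', 'a3']
```

    The resulting ranked lists are what retrieval metrics such as nDCG@10 are then computed on.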

  3. scifact

    • huggingface.co
    Updated Apr 20, 2023
    BigScience Biomedical Datasets (2023). scifact [Dataset]. https://huggingface.co/datasets/bigbio/scifact
    Dataset updated
    Apr 20, 2023
    Dataset authored and provided by
    BigScience Biomedical Datasets
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0): https://creativecommons.org/licenses/by-nc/2.0/
    License information was derived automatically

    Description

    This config connects the claims to the evidence and doc IDs.

  4. scifact

    • huggingface.co
    Updated Apr 16, 2024
    Pavel Shkunov (2024). scifact [Dataset]. https://huggingface.co/datasets/pa-shk/scifact
    Dataset updated
    Apr 16, 2024
    Authors
    Pavel Shkunov
    Description

    pa-shk/scifact dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. scifact

    • huggingface.co
    Updated May 28, 2024
    Cocktail (2024). scifact [Dataset]. https://huggingface.co/datasets/IR-Cocktail/scifact
    Dataset updated
    May 28, 2024
    Dataset authored and provided by
    Cocktail
    Description

    Data Description

    Homepage: https://github.com/KID-22/Cocktail
    Repository: https://github.com/KID-22/Cocktail
    Paper: [Needs More Information]

      Dataset Summary
    

    All 16 benchmarked datasets in Cocktail are listed in the following table.

    Dataset | Raw Website | Cocktail Website | Cocktail-Name | md5 for Processed Data | Domain | Relevancy | Test Query | Corpus
    MS MARCO | Homepage | Homepage | msmarco | 985926f3e906fadf0dc6249f23ed850f | Misc. | Binary | 6,979 | 542,203
    DL19…

    See the full description on the dataset page: https://huggingface.co/datasets/IR-Cocktail/scifact.
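
    The table lists an md5 checksum for each processed dataset. A minimal sketch of checking a downloaded file against such a checksum with Python's `hashlib`, streaming the file in chunks so large corpora need not fit in memory. The helper names `md5_of_file` and `verify` are hypothetical, and the demo hashes a throwaway temporary file rather than an actual Cocktail download.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_md5: str) -> bool:
    """True if the file's MD5 matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demo on a temporary file; b"abc" has a well-known MD5 (RFC 1321 test suite).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_md5 = md5_of_file(sample)
    sample_ok = verify(sample, "900150983CD24FB0D6963F7D28E17F72")
print(sample_md5, sample_ok)  # 900150983cd24fb0d6963f7d28e17f72 True
```

    For the MS MARCO row, one would call `verify(path, "985926f3e906fadf0dc6249f23ed850f")`, where `path` points at wherever the processed file was saved (its file name is not given here).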

  6. scifact.pisa

    • huggingface.co
    Updated Oct 8, 2024
    PyTerrier (2024). scifact.pisa [Dataset]. https://huggingface.co/datasets/pyterrier/scifact.pisa
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    PyTerrier
    Description

    scifact.pisa

      Description
    

    A PISA index for the SciFact dataset

      Usage
    

    Load the artifact

        import pyterrier as pt

        index = pt.Artifact.from_hf('pyterrier/scifact.pisa')
        index.bm25()  # returns a BM25 retriever
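
    `index.bm25()` returns a BM25 retriever backed by the PISA index. To show what that scoring function computes, here is a from-scratch Okapi BM25 sketch over a toy corpus. It assumes whitespace tokenization, the Lucene-style non-negative IDF, and the common textbook parameters k1 = 1.2, b = 0.75; PISA's own tokenization, stemming, and parameter defaults may differ, and the documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of every document for a whitespace-tokenized query."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


docs = {
    "d1": "vitamin d deficiency increases fracture risk",
    "d2": "statins lower cholesterol levels",
    "d3": "vitamin c has no effect on the common cold",
}
scores = bm25_scores("vitamin d fracture", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d1', 'd3', 'd2']
```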

      Benchmarks
    

    name | nDCG@10 | R@1000
    bm25 | 0.6776  | 0.9733
    dph  | 0.6735  | 0.97
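
    The benchmark table reports nDCG@10 and R@1000. A minimal per-query sketch of both metrics, assuming linear gains (identical to exponential gains when relevance labels are binary) and a `qrels` dict mapping document IDs to relevance for one query. Reported figures are means over all test queries; the document IDs below are invented.

```python
import math


def ndcg_at_k(ranked: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query: linear gains, discount log2(rank + 1) with 1-based rank."""
    dcg = sum(qrels.get(doc_id, 0) / math.log2(i + 2) for i, doc_id in enumerate(ranked[:k]))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked: list[str], qrels: dict[str, int], k: int = 1000) -> float:
    """Fraction of the query's relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked[:k])) / len(relevant)


# Invented example: two relevant documents, one of them retrieved at rank 2.
qrels = {"d2": 1, "d5": 1}
ranked = ["d1", "d2", "d3"]
print(round(ndcg_at_k(ranked, qrels), 4), recall_at_k(ranked, qrels))
```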

      Reproduction
    

        import pyterrier as pt
        from tqdm import tqdm
        import ir_datasets
        from pyterrier_pisa import PisaIndex

        index = PisaIndex("scifact.pisa"…

    See the full description on the dataset page: https://huggingface.co/datasets/pyterrier/scifact.pisa.

  7. dummy-scifact

    • huggingface.co
    Updated Feb 12, 2025
    Khurana (2025). dummy-scifact [Dataset]. https://huggingface.co/datasets/gurnoor-ctx/dummy-scifact
    Dataset updated
    Feb 12, 2025
    Authors
    Khurana
    Description

    gurnoor-ctx/dummy-scifact dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. scifact-open

    • huggingface.co
    Updated Apr 10, 2025
    UMBC SciFy Team (2025). scifact-open [Dataset]. https://huggingface.co/datasets/umbc-scify/scifact-open
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    UMBC SciFy Team
    Description

    Data Stats

    206 claims; 500k distractors

      Data Structure

      Test

    claim
    evidence: ground-truth (GT) evidence
    evidence_id: GT evidence id
    label: GT label
    evidences: list of all evidences
    evidence_ids: list of all evidence ids
    labels: list of all labels

      Distractors
    

    evidence
    evidence_id

      Process Code
    

        import pandas as pd
        from datasets import Dataset

        claims = pd.read_csv("./scifact_open_retriever_test.csv")
        claims.head()

        docs =…

    See the full description on the dataset page: https://huggingface.co/datasets/umbc-scify/scifact-open.
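
    A minimal sketch of merging the two parts described above (test claims with their ground-truth evidence, plus distractors) into a single retrieval corpus. It assumes CSV files whose headers match the field names on the card (`claim`, `evidence`, `evidence_id`, `label` for claims; `evidence`, `evidence_id` for distractors) and uses only the standard library instead of the card's pandas/datasets code. The rows and the helper `build_corpus` are hypothetical; an `evidence_id` that appears in both parts is kept once.

```python
import csv
import io

# Hypothetical CSV contents using the field names listed on the card;
# the rows are invented placeholders, not SciFact-Open data.
CLAIMS_CSV = """claim,evidence,evidence_id,label
Claim A,Abstract about A,101,SUPPORT
Claim B,Abstract about B,102,CONTRADICT
"""
DISTRACTORS_CSV = """evidence,evidence_id
Unrelated abstract,900
Abstract about A,101
"""


def build_corpus(claims_csv: str, distractors_csv: str) -> dict[str, str]:
    """Merge ground-truth evidence and distractors into one evidence_id -> text corpus."""
    corpus: dict[str, str] = {}
    for source in (claims_csv, distractors_csv):
        for row in csv.DictReader(io.StringIO(source)):
            # setdefault keeps the first text seen for a repeated evidence_id.
            corpus.setdefault(row["evidence_id"], row["evidence"])
    return corpus


claims = list(csv.DictReader(io.StringIO(CLAIMS_CSV)))
corpus = build_corpus(CLAIMS_CSV, DISTRACTORS_CSV)
print(len(claims), len(corpus))  # 2 3
```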

  9. scifact-vn

    • huggingface.co
    Updated Jul 31, 2025
    GreenNode.ai (2025). scifact-vn [Dataset]. https://huggingface.co/datasets/GreenNode/scifact-vn
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    GreenNode.ai
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    How to evaluate on this task

    You can evaluate an embedding model on this dataset using the following code:

        import mteb

        task = mteb.get_tasks(["Scifact-VN"])
        evaluator = mteb.MTEB(task)

        model = mteb.get_model(YOUR_MODEL)
        evaluator.run(model)

    To learn more about how to run models on MTEB tasks, check out the GitHub repository.

      Citation
    

    If you use this dataset, please cite the dataset as well as mteb, as this dataset likely includes additional processing as a part of…

    See the full description on the dataset page: https://huggingface.co/datasets/GreenNode/scifact-vn.

  10. scifact-fa-v2

    • huggingface.co
    Updated Jul 27, 2025
    MCINext (2025). scifact-fa-v2 [Dataset]. https://huggingface.co/datasets/MCINext/scifact-fa-v2
    Dataset updated
    Jul 27, 2025
    Dataset authored and provided by
    MCINext
    Description

    MCINext/scifact-fa-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. scifact-tr

    • huggingface.co
    Updated May 29, 2025
    Turkish Massive Text Embedding Benchmark (2025). scifact-tr [Dataset]. https://huggingface.co/datasets/trmteb/scifact-tr
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    Turkish Massive Text Embedding Benchmark
    Description

    trmteb/scifact-tr dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. SciFact

    • huggingface.co
    Francielle Vargas, SciFact [Dataset]. https://huggingface.co/datasets/franciellevargas/SciFact
    Authors
    Francielle Vargas
    Description

    franciellevargas/SciFact dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. rus-scifact

    • huggingface.co
    Updated Feb 20, 2025
    Grigory Kovalev (2025). rus-scifact [Dataset]. https://huggingface.co/datasets/kaengreg/rus-scifact
    Dataset updated
    Feb 20, 2025
    Authors
    Grigory Kovalev
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    kaengreg/rus-scifact dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. splade-scifact-train-retrievals

    • huggingface.co
    Updated Apr 6, 2024
    Jasper Xian (2024). splade-scifact-train-retrievals [Dataset]. https://huggingface.co/datasets/jasper-xian/splade-scifact-train-retrievals
    Dataset updated
    Apr 6, 2024
    Authors
    Jasper Xian
    Description

    jasper-xian/splade-scifact-train-retrievals dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. beir_scifact

    • huggingface.co
    Updated Aug 5, 2023
    ir-datasets (2023). beir_scifact [Dataset]. https://huggingface.co/datasets/irds/beir_scifact
    Dataset updated
    Aug 5, 2023
    Dataset authored and provided by
    ir-datasets
    Description

    Dataset Card for beir/scifact

    The beir/scifact dataset, provided by the ir-datasets package. For more information about the dataset, see the documentation.

      Data
    

    This dataset provides:

    docs (documents, i.e., the corpus); count=5,183
    queries (i.e., topics); count=1,109

    This dataset is used by: beir_scifact_test, beir_scifact_train

      Usage
    

        from datasets import load_dataset

        docs = load_dataset('irds/beir_scifact', 'docs')
        for record in docs:
            record #…

    See the full description on the dataset page: https://huggingface.co/datasets/irds/beir_scifact.

  16. gpl-scifact

    • huggingface.co
    Updated Mar 31, 2025
    Nandan Thakur (2025). gpl-scifact [Dataset]. https://huggingface.co/datasets/nthakur/gpl-scifact
    Dataset updated
    Mar 31, 2025
    Authors
    Nandan Thakur
    Description

    nthakur/gpl-scifact dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. scifact.splade-v3.cache

    • huggingface.co
    Updated Oct 15, 2024
    PyTerrier (2024). scifact.splade-v3.cache [Dataset]. https://huggingface.co/datasets/pyterrier/scifact.splade-v3.cache
    Dataset updated
    Oct 15, 2024
    Dataset authored and provided by
    PyTerrier
    Description

    scifact.splade-v3.cache

      Description
    

    TODO: What is the artifact?

      Usage
    

    Load the artifact

        import pyterrier_alpha as pta

        artifact = pta.Artifact.from_hf('pyterrier/scifact.splade-v3.cache')

    TODO: Show how you use the artifact

      Benchmarks
    

    TODO: Provide benchmarks for the artifact.

      Reproduction
    

    TODO: Show how you constructed the artifact.

      Metadata
    

    { "type": "indexer_cache", "format": "lz4pickle", "package_hint":…

    See the full description on the dataset page: https://huggingface.co/datasets/pyterrier/scifact.splade-v3.cache.

  18. beir-nl-scifact

    • huggingface.co
    Updated Feb 10, 2025
    CLiPS (2025). beir-nl-scifact [Dataset]. https://huggingface.co/datasets/clips/beir-nl-scifact
    Dataset updated
    Feb 10, 2025
    Dataset authored and provided by
    CLiPS
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR-NL Benchmark

      Dataset Summary
    

    BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:

    Fact-checking: FEVER, Climate-FEVER, SciFact; Question-Answering: NQ, HotpotQA, FiQA-2018…

    See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-scifact.

  19. scifact-top-20-gen-queries

    • huggingface.co
    Updated Apr 1, 2023
    INCOME (2023). scifact-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/scifact-top-20-gen-queries
    Dataset updated
    Apr 1, 2023
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR…

    See the full description on the dataset page: https://huggingface.co/datasets/income/scifact-top-20-gen-queries.
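
    The card describes 20 docT5query-generated queries per passage. One common use of such synthetic queries is document expansion: appending them to each passage before building a lexical (e.g. BM25) index. The sketch below assumes the queries are available as `(doc_id, query)` pairs; that pairing, the helper `expand_documents`, and the sample text are illustrative assumptions, not this dataset's documented schema.

```python
from collections import defaultdict


def expand_documents(docs: dict[str, str], generated: list[tuple[str, str]], top_n: int = 20) -> dict[str, str]:
    """Append up to top_n generated queries to each document (docT5query-style expansion)."""
    by_doc: dict[str, list[str]] = defaultdict(list)
    for doc_id, query in generated:
        # Keep only the first top_n queries seen for each document.
        if len(by_doc[doc_id]) < top_n:
            by_doc[doc_id].append(query)
    # Documents without generated queries are returned unchanged.
    return {doc_id: " ".join([text, *by_doc.get(doc_id, [])]) for doc_id, text in docs.items()}


docs = {
    "d1": "Aspirin reduces platelet aggregation.",
    "d2": "Statins lower LDL cholesterol.",
}
generated = [
    ("d1", "does aspirin affect platelets"),
    ("d1", "aspirin and blood clotting"),
]
expanded = expand_documents(docs, generated, top_n=1)
print(expanded["d1"])  # Aspirin reduces platelet aggregation. does aspirin affect platelets
```

    The expanded text would then be indexed in place of the original passage, while queries are left untouched at search time.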
    
  20. sheared-llama-scifact-results-new

    • huggingface.co
    Updated Dec 21, 2024
    Vaibhav Adlakha (2024). sheared-llama-scifact-results-new [Dataset]. https://huggingface.co/datasets/vaibhavad/sheared-llama-scifact-results-new
    Dataset updated
    Dec 21, 2024
    Authors
    Vaibhav Adlakha
    Description

    vaibhavad/sheared-llama-scifact-results-new dataset hosted on Hugging Face and contributed by the HF Datasets community
