Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/scifact.
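As a quick orientation, the SciFact split of BEIR can be pulled straight from the Hub; a minimal sketch, assuming the usual BeIR config names `corpus` and `queries`:

```python
from datasets import load_dataset

# Corpus (scientific paper abstracts) and queries (claims) for BEIR's SciFact split;
# the "corpus"/"queries" config names follow the standard BeIR layout.
corpus = load_dataset("BeIR/scifact", "corpus")
queries = load_dataset("BeIR/scifact", "queries")

print(corpus)
print(queries)
```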
https://choosealicense.com/licenses/unknown/
SciFact: an MTEB dataset (Massive Text Embedding Benchmark)
SciFact verifies scientific claims against evidence from the research literature, drawn from a corpus of scientific paper abstracts.
Task category: t2t
Domains: Academic, Medical, Written
Reference: https://github.com/allenai/scifact
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

```python
import mteb

tasks = mteb.get_tasks(tasks=["SciFact"])
evaluator = mteb.MTEB(tasks=tasks)
```

… See the full description on the dataset page: https://huggingface.co/datasets/mteb/scifact.
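Since the card's snippet is cut off, here is a hedged end-to-end sketch; the model name is only an illustrative choice, not part of the card:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Illustrative embedding model; any model mteb supports works here.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

tasks = mteb.get_tasks(tasks=["SciFact"])
evaluator = mteb.MTEB(tasks=tasks)
results = evaluator.run(model, output_folder="results/scifact")
```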
Attribution-NonCommercial 2.0 (CC BY-NC 2.0): https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
This config connects the claims to the evidence and document ids.
pa-shk/scifact dataset hosted on Hugging Face and contributed by the HF Datasets community
Data Description
Homepage: https://github.com/KID-22/Cocktail
Repository: https://github.com/KID-22/Cocktail
Paper: [Needs More Information]
Dataset Summary
All 16 benchmarked datasets in Cocktail are listed in the following table.

| Dataset | Raw Website | Cocktail Website | Cocktail-Name | md5 for Processed Data | Domain | Relevancy | #Queries | #Corpus |
|---|---|---|---|---|---|---|---|---|
| MS MARCO | Homepage | Homepage | msmarco | 985926f3e906fadf0dc6249f23ed850f | Misc. | Binary | 6,979 | 542,203 |
| DL19 | … |

See the full description on the dataset page: https://huggingface.co/datasets/IR-Cocktail/scifact.
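Since the table publishes an md5 checksum for each processed dump, a small verification helper may be useful; a minimal sketch, with a hypothetical local file name:

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large dumps need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical download path; compare against the checksum from the table.
assert md5sum("msmarco.zip") == "985926f3e906fadf0dc6249f23ed850f"
```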
scifact.pisa
Description
A PISA index for the SciFact dataset
Usage
```python
import pyterrier as pt

index = pt.Artifact.from_hf('pyterrier/scifact.pisa')
index.bm25()  # returns a BM25 retriever
```
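A short usage sketch on top of that retriever; the query string is illustrative, and `.search()` is the standard PyTerrier transformer helper:

```python
import pyterrier as pt

index = pt.Artifact.from_hf('pyterrier/scifact.pisa')
bm25 = index.bm25()

# Returns a ranked DataFrame of docnos with BM25 scores for one query.
results = bm25.search("vitamin D deficiency and bone health")
print(results.head())
```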
Benchmarks
| name | nDCG@10 | R@1000 |
|------|---------|--------|
| bm25 | 0.6776 | 0.9733 |
| dph | 0.6735 | 0.97 |
Reproduction
```python
import pyterrier as pt
from tqdm import tqdm
import ir_datasets
from pyterrier_pisa import PisaIndex

index = PisaIndex("scifact.pisa"…
```

See the full description on the dataset page: https://huggingface.co/datasets/pyterrier/scifact.pisa.
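The reproduction snippet above is truncated on the card; a hedged reconstruction of the usual pyterrier_pisa indexing flow follows (the docno/text field mapping is an assumption, not the card's verbatim code):

```python
import ir_datasets
from pyterrier_pisa import PisaIndex

index = PisaIndex("scifact.pisa")
dataset = ir_datasets.load("beir/scifact")

# Map ir_datasets records into the docno/text dicts PisaIndex expects;
# concatenating title and abstract is a common (assumed) choice.
index.index(
    {"docno": d.doc_id, "text": f"{d.title} {d.text}"}
    for d in dataset.docs_iter()
)
```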
gurnoor-ctx/dummy-scifact dataset hosted on Hugging Face and contributed by the HF Datasets community
Data Stats
206 claims
500k distractors
Data Structure
Test
claim
evidence: GT evidence
evidence_id: GT evidence id
label: GT label
evidences: list of all evidences
evidence_ids: list of all evidence ids
labels: list of all labels
Distractors
evidence
evidence_id
Process Code
```python
import pandas as pd
from datasets import Dataset

claims = pd.read_csv("./scifact_open_retriever_test.csv")
claims.head()

docs = …
```

See the full description on the dataset page: https://huggingface.co/datasets/umbc-scify/scifact-open.
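The processing code is cut off; a hedged sketch of how the pieces plausibly fit together, where the distractor file name and its columns are hypothetical (mirroring the Distractors structure above):

```python
import pandas as pd
from datasets import Dataset

claims = pd.read_csv("./scifact_open_retriever_test.csv")

# Hypothetical distractor dump with `evidence` / `evidence_id` columns.
docs = pd.read_csv("./scifact_open_distractors.csv")

test_ds = Dataset.from_pandas(claims)
distractor_ds = Dataset.from_pandas(docs)
print(test_ds)
print(distractor_ds)
```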
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

```python
import mteb

tasks = mteb.get_tasks(tasks=["Scifact-VN"])
evaluator = mteb.MTEB(tasks=tasks)

model = mteb.get_model(YOUR_MODEL)
evaluator.run(model)
```
To learn more about how to run models on MTEB tasks, check out the GitHub repository.
Citation
If you use this dataset, please cite the dataset as well as mteb, as this dataset likely includes additional processing as a part of… See the full description on the dataset page: https://huggingface.co/datasets/GreenNode/scifact-vn.
MCINext/scifact-fa-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
trmteb/scifact-tr dataset hosted on Hugging Face and contributed by the HF Datasets community
franciellevargas/SciFact dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
kaengreg/rus-scifact dataset hosted on Hugging Face and contributed by the HF Datasets community
jasper-xian/splade-scifact-train-retrievals dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for beir/scifact
The beir/scifact dataset, provided by the ir-datasets package. For more information about the dataset, see the documentation.
Data
This dataset provides:
docs (documents, i.e., the corpus); count=5,183
queries (i.e., topics); count=1,109
This dataset is used by: beir_scifact_test, beir_scifact_train
Usage
```python
from datasets import load_dataset

docs = load_dataset('irds/beir_scifact', 'docs')
for record in docs:
    record # …
```

See the full description on the dataset page: https://huggingface.co/datasets/irds/beir_scifact.
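A hedged sketch of inspecting both configs; the field names follow the usual ir-datasets BEIR schema and are an assumption here:

```python
from datasets import load_dataset

docs = load_dataset('irds/beir_scifact', 'docs')
queries = load_dataset('irds/beir_scifact', 'queries')

# Peek at one record from each config (doc_id/title and query_id/text are assumed fields).
for record in docs:
    print(record['doc_id'], record['title'][:60])
    break
for record in queries:
    print(record['query_id'], record['text'][:60])
    break
```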
nthakur/gpl-scifact dataset hosted on Hugging Face and contributed by the HF Datasets community
scifact.splade-v3.cache
Description
An indexer cache (lz4pickle format, per the metadata below) for SPLADE-v3 on the SciFact dataset.
Usage
```python
import pyterrier_alpha as pta

artifact = pta.Artifact.from_hf('pyterrier/scifact.splade-v3.cache')
```
Benchmarks
TODO: Provide benchmarks for the artifact.
Reproduction
Metadata
{ "type": "indexer_cache", "format": "lz4pickle", "package_hint":โฆ See the full description on the dataset page: https://huggingface.co/datasets/pyterrier/scifact.splade-v3.cache.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for BEIR-NL Benchmark
Dataset Summary
BEIR-NL is a Dutch-translated version of the BEIR benchmark, a diverse and heterogeneous collection of datasets covering various domains from biomedical and financial texts to general web content. Our benchmark is integrated into the Massive Multilingual Text Embedding Benchmark (MMTEB). BEIR-NL contains the following tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
… See the full description on the dataset page: https://huggingface.co/datasets/clips/beir-nl-scifact.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SciFact: 20 generated queries (BEIR Benchmark)
This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.
DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
id (str): unique document id in SciFact in the BEIR benchmark (corpus.jsonl)
Questions generated: 20
Code used for generation: evaluate_anserini_docT5query_parallel.py
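For intuition, a hedged sketch of how top-20 queries are typically sampled from this docT5query checkpoint; the passage and generation parameters are illustrative, not the exact settings of the linked script:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "BeIR/query-gen-msmarco-t5-base-v1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

passage = "Vitamin D supplementation reduces fracture risk in older adults."  # illustrative
inputs = tokenizer(passage, return_tensors="pt", truncation=True, max_length=512)

# Sample 20 synthetic queries per passage, matching "Questions generated: 20".
with torch.no_grad():
    outputs = model.generate(
        **inputs, max_length=64, do_sample=True, top_p=0.95, num_return_sequences=20
    )
queries = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```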
Below is the old dataset card for the BEIR benchmark.
Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/scifact-top-20-gen-queries.
vaibhavad/sheared-llama-scifact-results-new dataset hosted on Hugging Face and contributed by the HF Datasets community