BEIR (Benchmarking IR) is a heterogeneous benchmark containing different information retrieval (IR) tasks. Through BEIR, it is possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches.
The benchmark contains a total of 9 information retrieval tasks (Fact Checking, Citation Prediction, Duplicate Question Retrieval, Argument Retrieval, News Retrieval, Question Answering, Tweet Retrieval, Biomedical IR, Entity Retrieval) from 19 different datasets:
MS MARCO, TREC-COVID, NFCorpus, BioASQ, Natural Questions, HotpotQA, FiQA-2018, Signal-1M, TREC-News, ArguAna, Touche 2020, CQADupStack, Quora Question Pairs, DBPedia, SciDocs, FEVER, Climate-FEVER, SciFact, Robust04
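For reference, here is a minimal sketch of loading one of these datasets with the `beir` Python package (`pip install beir`); the SciFact download URL below reflects the commonly used BEIR hosting location and may change.

```python
# Minimal sketch: load a BEIR dataset (here SciFact) with the beir package.
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip the dataset archive (hosting URL is an assumption that may change).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: {doc_id: {"title": ..., "text": ...}}, queries: {qid: text},
# qrels: {qid: {doc_id: relevance}}
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
print(len(corpus), len(queries), len(qrels))
```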
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of information retrieval benchmarks covering 15 corpora (1.9 billion documents) on which 32 well-known shared tasks are based. We filled the leaderboards with Docker images of 50 standard retrieval approaches. Within this setup, we were able to automatically run and evaluate the 50 approaches on the 32 tasks (1,600 runs). All benchmarks are added as training datasets because their qrels are already publicly available. Please find a detailed tutorial on how to submit approaches on GitHub.
View on TIRA: https://tira.io/task-overview/ir-benchmarks
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.
Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving distinct purposes.
The Query dataset contains 100 human-crafted complex queries spanning five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying that no further breakdown is required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.
The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011 and 2021 and sourced from arXiv. Each entry includes the title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.
The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (0, 1, or 2) representing ChatGPT's final decision.
Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (0, 1, or 2).
The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:
ada_embedding_for_DORIS-MAE_v1.pickle
├── "Query"
│   ├── query_id_1 (Embedding of query_1)
│   ├── query_id_2 (Embedding of query_2)
│   ├── query_id_3 (Embedding of query_3)
│   └── ...
└── "Corpus"
    ├── corpus_id_1 (Embedding of abstract_1)
    ├── corpus_id_2 (Embedding of abstract_2)
    ├── corpus_id_3 (Embedding of abstract_3)
    └── ...
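A minimal sketch for reading this pickle file in Python; the key layout follows the tree above, while the exact id strings are placeholders.

```python
# Minimal sketch: inspect the ada-002 embedding file described above.
import pickle

with open("ada_embedding_for_DORIS-MAE_v1.pickle", "rb") as f:
    embeddings = pickle.load(f)

query_embeddings = embeddings["Query"]    # {query_id: embedding vector}
corpus_embeddings = embeddings["Corpus"]  # {corpus_id: embedding vector}

# Peek at one entry; ada-002 embeddings are 1536-dimensional vectors.
some_query_id = next(iter(query_embeddings))
print(some_query_id, len(query_embeddings[some_query_id]))
```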
CoIR (Code Information Retrieval) is a benchmark designed to evaluate code retrieval capabilities. CoIR includes 10 curated code datasets, covering 8 retrieval tasks across 7 domains, and encompasses two million documents in total. It also provides a common, easy-to-use Python framework, installable via pip, and shares the same data schema as benchmarks like MTEB and BEIR for easy cross-benchmark evaluation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is derived from the GermanQuAD dataset: it takes the test set and represents it as qrels in the BEIR information retrieval benchmark format. Corpus and query IDs have been added. The corresponding corpus can be found here. Full credit for the original dataset goes to the authors of the GermanQuAD dataset. The original dataset is licensed under CC BY-SA 4.0. Citation for the original dataset: @misc{möller2021germanquad, title={GermanQuAD and GermanDPR: Improving… See the full description on the dataset page: https://huggingface.co/datasets/mteb/germanquad-retrieval-qrels.
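A hedged sketch of pulling these qrels from the Hugging Face Hub with the `datasets` library; the available splits and column names should be verified against the dataset page, as the usual BEIR qrels layout (query-id, corpus-id, score) is assumed here.

```python
# Hedged sketch: load the qrels repository referenced above and inspect its splits/columns.
from datasets import load_dataset

qrels = load_dataset("mteb/germanquad-retrieval-qrels")
print(qrels)
```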
Retrieval Question-Answering (ReQA) benchmark tests a model’s ability to retrieve relevant answers efficiently from a large set of documents.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
French Taxation Embedding Benchmark (retrieval)
This dataset is designed for the task of retrieving relevant tax articles or content based on queries in the French language. It can be used for benchmarking information retrieval systems, particularly in the legal and financial domains.
Massive Text Embedding Benchmark for French Taxation
In this notebook, we will explore the process of adding a new task to the Massive Text Embedding Benchmark (MTEB). The MTEB is an… See the full description on the dataset page: https://huggingface.co/datasets/louisbrulenaudet/tax-retrieval-benchmark.
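As a rough illustration of how such a task would be run once registered, here is a hedged sketch using the `mteb` package; the task name "TaxRetrievalBenchmark" and the model identifier are placeholders, not names confirmed by the dataset page.

```python
# Hedged sketch: evaluate an embedding model on an MTEB retrieval task.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder model; any sentence-transformers model with French coverage could be used.
model = SentenceTransformer("intfloat/multilingual-e5-small")

# "TaxRetrievalBenchmark" is a hypothetical task name standing in for the
# custom French-taxation retrieval task described above.
evaluation = MTEB(tasks=["TaxRetrievalBenchmark"])
evaluation.run(model, output_folder="results")
```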
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Legal Case Passage Extraction and Retrieval benchmark is an information retrieval collection for court-case passage retrieval. Specifically, it is a collection for evaluating Cited Case Passage Retrieval (CCPR) and contains case passages from the Austrian building regulations domain (source: RIS). The following files are included in the dataset (a loading sketch follows the file descriptions):
A tab-separated file containing the passage texts of court cases from the building regulations domain. Column 1 contains the ID of the passage, Column 2 the passage text, and Column 3 the case ID (Geschäftszahl) of the passage's origin case.
A tab-separated file containing the queries/topics for which relevance assessments exist in this collection. Column 1 contains the ID of the query, Column 2 the query passage text, and Column 3 the case ID (Geschäftszahl) of the cited case. For the CCPR task, results are intended to be additionally filtered by exact matches of the case ID: for each query, relevance assessments only exist for passages that match the case ID in Column 3.
Contains relevance assessments for each query. In this dictionary, a passage from the full collection is relevant for a query if qrel[
A conversion of the qrel.json file to be compatible with trec_eval.
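The sketch below shows one way to read the collection and apply the case-ID filtering described above; the file names ("passages.tsv", "queries.tsv", "qrel.json") are assumptions, while the column layout follows the descriptions (ID, text, Geschäftszahl).

```python
# Hedged sketch: read the passage/query TSV files and restrict candidates by case ID.
import csv
import json

def read_tsv(path):
    with open(path, encoding="utf-8") as f:
        return [row for row in csv.reader(f, delimiter="\t")]

passages = read_tsv("passages.tsv")  # rows: [passage_id, passage_text, case_id]
queries = read_tsv("queries.tsv")    # rows: [query_id, query_text, cited_case_id]

with open("qrel.json", encoding="utf-8") as f:
    qrels = json.load(f)

# For CCPR, candidate passages are restricted to the cited case of each query.
query_id, _, cited_case = queries[0]
candidates = [p for p in passages if p[2] == cited_case]
print(query_id, len(candidates), "candidate passages")
```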
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Music recommender systems can offer users personalized and contextualized recommendations and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based, or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset that contains a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, and the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and others, like hashtags, give clues to the emotional state of a user underlying an LE. In this paper, we provide some statistics to give insight into the dataset, and some directions in which the dataset can be used for music recommendation. We also provide standardized training and test sets for experimentation, and some baseline results obtained by using factorization machines.
The dataset contains three files:
Please also find the training and test splits for the dataset in this repo. Also, prototypical implementations of a context-aware recommender system based on the dataset can be found at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.
If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:
@inproceedings{smc18,
title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems},
author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang},
url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf},
year = {2018},
date = {2018-07-04},
booktitle = {Proceedings of the 15th Sound & Music Computing Conference},
address = {Limassol, Cyprus},
note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM},
tppubtype = {inproceedings}
}
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/trec-news-generated-queries.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BEIR (Benchmarking IR) is a heterogeneous benchmark containing different information retrieval (IR) tasks. Through BEIR, it is possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches. The benchmark contains a total of 9 information retrieval tasks (Fact Checking, Citation Prediction, Duplicate Question Retrieval, Argument Retrieval, News Retrieval, Question Answering, Tweet Retrieval, Biomedical IR, Entity Retrieval) from 18 different datasets: MS MARCO, TREC-COVID, NFCorpus, BioASQ, Natural Questions, HotpotQA, FiQA-2018, Signal-1M, TREC-News, ArguAna, Touche 2020, CQADupStack, Quora Question Pairs, DBPedia, SciDocs, FEVER, Climate-FEVER, SciFact
NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We provide large-scale multi-domain benchmark datasets for Personalized Search.
Further information can be found here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Linked lists represent a countable number of ordered values, and are among the most important abstract data types in computer science. With the advent of RDF as a highly expressive knowledge representation language for the Web, various implementations for RDF lists have been proposed. Yet, there is no benchmark so far dedicated to evaluate the performance of triple stores and SPARQL query engines on dealing with ordered linked data. Moreover, essential tasks for evaluating RDF lists, like generating datasets containing RDF lists of various sizes, or generating the same RDF list using different modelling choices, are cumbersome and unprincipled. In this paper, we propose List.MID, a systematic benchmark for evaluating systems serving RDF lists. List.MID consists of a dataset generator, which creates RDF list data in various models and of different sizes; and a set of SPARQL queries. The RDF list data is coherently generated from a large, community-curated base collection of Web MIDI files, rich in lists of musical events of arbitrary length. We describe the List.MID benchmark, and discuss its impact and adoption, reusability, design, and availability.
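To make the RDF list modelling choices concrete, here is a hedged Python/rdflib sketch that builds a short ordered sequence in the classic rdf:first/rdf:rest (rdf:List) model; the namespace, property name, and note values are illustrative only and are not taken from the List.MID generator.

```python
# Hedged sketch: an ordered sequence modelled as an rdf:first/rdf:rest chain with rdflib.
from rdflib import BNode, Graph, Literal, Namespace
from rdflib.collection import Collection

EX = Namespace("http://example.org/")
g = Graph()

# A short ordered list of MIDI note numbers as an rdf:List.
head = BNode()
Collection(g, head, [Literal(60), Literal(62), Literal(64)])
g.add((EX.track1, EX.hasNoteSequence, head))

print(g.serialize(format="turtle"))
```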
CLIRMatrix is a large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval. It includes:
BI-139: a bilingual dataset of queries in one language matched with relevant documents in another language, covering 139×138 = 19,182 language pairs; and MULTI-8: a multilingual dataset of queries and documents jointly aligned in 8 different languages.
In total, 49 million unique queries and 34 billion (query, document, label) triplets were mined, making CLIRMatrix the largest and most comprehensive CLIR dataset to date.
Multi-EuP is a new multilingual benchmark dataset comprising 22K multilingual documents collected from the European Parliament and spanning 24 languages. It is designed to investigate fairness in a multilingual information retrieval (IR) setting and to analyze both language and demographic bias in a ranking context. It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages, as well as cross-lingual relevance judgments. Furthermore, it offers rich demographic information associated with its documents, facilitating the study of demographic bias.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/arguana-qrels.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Personalization in Information Retrieval is a topic that has been studied for a long time. Nevertheless, there is still a lack of high-quality, real-world datasets to conduct large-scale experiments and evaluate models for personalized search. This paper contributes to filling this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new resource to design and evaluate personalized models for two community Question Answering (cQA) tasks. The contributed dataset includes more than 1 million queries and 2 million answers, annotated with a rich set of features modeling the social interactions among the users of a popular cQA platform. We describe the characteristics of SE-PQA and detail the features associated with both questions and answers. We also provide reproducible baseline methods for the cQA task based on the resource, including deep learning models and personalization approaches. The results of the preliminary experiments conducted show the appropriateness of SE-PQA for training effective cQA models; they also show that personalization remarkably improves the effectiveness of all the methods tested. Furthermore, we show the benefits, in terms of robustness and generalization, of combining data from multiple communities for personalization purposes.
Performance on all communities separately:
| Community | Model (BM25 +) | P@1 | NDCG@3 | NDCG@10 | R@100 | MAP@100 | λ |
|---|---|---|---|---|---|---|---|
| Academia | MiniLM | 0.438 | 0.382 | 0.395 | 0.489 | 0.344 | (.1,.9) |
| | MiniLM + TAG | 0.453 | 0.392 | 0.403 | 0.489 | 0.352 | (.1,.8,.1) |
| Anime | MiniLM + TAG | 0.650 | 0.682 | 0.714 | 0.856 | 0.683 | (.1,.9,.0) |
| Apple | MiniLM | 0.327 | 0.351 | 0.381 | 0.514 | 0.349 | (.1,.9) |
| | MiniLM + TAG | 0.335 | 0.361 | 0.389 | 0.514 | 0.357 | (.1,.8,.1) |
| Bicycles | MiniLM | 0.405 | 0.380 | 0.421 | 0.600 | 0.365 | (.1,.9) |
| | MiniLM + TAG | 0.436 | 0.405 | 0.441 | 0.600 | 0.386 | (.1,.8,.1) |
| Boardgames | MiniLM | 0.681 | 0.694 | 0.728 | 0.866 | 0.692 | (.1,.9) |
| | MiniLM + TAG | 0.696 | 0.702 | 0.736 | 0.866 | 0.699 | (.1,.8,.1) |
| Buddhism | MiniLM + TAG | 0.490 | 0.387 | 0.397 | 0.544 | 0.334 | (.3,.7,.0) |
| Christianity | MiniLM | 0.534 | 0.505 | 0.555 | 0.783 | 0.497 | (.2,.8) |
| | MiniLM + TAG | 0.549 | 0.521 | 0.564 | 0.783 | 0.507 | (.1,.8,.1) |
| Cooking | MiniLM | 0.600 | 0.567 | 0.600 | 0.719 | 0.553 | (.1,.9) |
| | MiniLM + TAG | 0.619 | 0.583 | 0.614 | 0.719 | 0.568 | (.1,.8,.1) |
| DIY | MiniLM | 0.323 | 0.313 | 0.346 | 0.501 | 0.302 | (.1,.9) |
| | MiniLM + TAG | 0.335 | 0.324 | 0.356 | 0.501 | 0.312 | (.1,.8,.1) |
| Expatriates | MiniLM + TAG | 0.596 | 0.653 | 0.682 | 0.832 | 0.645 | (.1,.9,.0) |
| Fitness | MiniLM + TAG | 0.568 | 0.575 | 0.613 | 0.760 | 0.567 | (.2,.8,.0) |
| Freelancing | MiniLM + TAG | 0.513 | 0.472 | 0.506 | 0.654 | 0.457 | (.1,.9,.0) |
| Gaming | MiniLM | 0.510 | 0.534 | 0.562 | 0.686 | 0.532 | (.1,.9) |
| | MiniLM + TAG | 0.519 | 0.547 | 0.571 | 0.686 | 0.541 | (.1,.8,.1) |
| Gardening | MiniLM | 0.344 | 0.362 | 0.396 | 0.520 | 0.359 | (.1,.9) |
| | MiniLM + TAG | 0.345 | 0.369 | 0.399 | 0.520 | 0.363 | (.1,.8,.1) |
| Genealogy | MiniLM + TAG | 0.592 | 0.605 | 0.631 | 0.779 | 0.594 | (.3,.7,.0) |
| Health | MiniLM + TAG | 0.718 | 0.765 | 0.797 | 0.934 | 0.765 | (.2,.8,.0) |
| Hermeneutics | MiniLM | 0.589 | 0.538 | 0.593 | 0.828 | 0.526 | (.2,.8) |
| | MiniLM + TAG | 0.632 | 0.570 | 0.617 | 0.828 | 0.552 | (.1,.8,.1) |
| Hinduism | MiniLM | 0.388 | 0.415 | 0.459 | 0.686 | 0.416 | (.2,.8) |
| | MiniLM + TAG | 0.382 | 0.410 | 0.457 | 0.686 | 0.412 | (.1,.8,.1) |
| History | MiniLM + TAG | 0.740 | 0.735 | 0.764 | 0.862 | 0.730 | (.2,.8,.0) |
| Hsm | MiniLM + TAG | 0.666 | 0.707 | 0.737 | 0.870 | 0.690 | (.2,.8,.0) |
| Interpersonal | MiniLM + TAG | 0.663 | 0.617 | 0.653 | 0.739 | 0.604 | (.2,.8,.0) |
| Islam | MiniLM | 0.382 | 0.412 | 0.453 | 0.642 | 0.410 | (.1,.9) |
| | MiniLM + TAG | 0.395 | 0.427 | 0.464 | 0.642 | 0.421 | (.1,.8,.1) |
| Judaism | MiniLM + TAG | 0.363 | 0.387 | 0.432 | 0.649 | 0.388 | (.2,.8,.0) |
| Law | MiniLM | 0.663 | 0.647 | 0.678 | 0.803 | 0.639 | (.2,.8) |
| | MiniLM + TAG | 0.677 | 0.657 | 0.687 | 0.803 | 0.649 | (.1,.8,.1) |
| Lifehacks | MiniLM | 0.714 | 0.601 | 0.617 | 0.703 | 0.553 | (.1,.9) |
| | MiniLM + TAG | 0.714 | 0.621 | 0.631 | 0.703 | 0.568 | (.1,.8,.1) |
| Linguistics | MiniLM + TAG | 0.584 | 0.588 | 0.630 | 0.794 | 0.587 | (.2,.8,.0) |
| Literature | MiniLM + TAG | 0.871 | 0.878 | 0.889 | 0.934 | 0.876 | (.3,.7,.0) |
| Martialarts | MiniLM | 0.630 | 0.599 | 0.645 | 0.796 | 0.596 | (.1,.9) |
| | MiniLM + TAG | 0.640 | 0.628 | 0.660 | 0.796 | 0.612 | (.1,.8,.1) |
| Money | MiniLM | 0.545 | 0.535 | 0.563 | 0.706 | 0.515 | (.2,.8) |
| | MiniLM + TAG | 0.559 | 0.542 | 0.571 | 0.706 | 0.523 | (.1,.8,.1) |
| Movies | MiniLM | 0.713 | 0.722 | 0.753 | 0.865 | 0.724 | (.1,.9) |
| | MiniLM + TAG | 0.728 | 0.735 | 0.762 | 0.865 | 0.735 | (.1,.8,.1) |
| Music | MiniLM | 0.508 | 0.447 | 0.476 | 0.602 | 0.418 | (.2,.8) |
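The λ column above suggests a weighted combination of ranking signals. Below is a hedged sketch of that idea, reading tuples such as (.1,.9) or (.1,.8,.1) as convex weights over min-max-normalized BM25, MiniLM, and TAG-based personalization scores; this is an illustration of score fusion in general, not the authors' exact implementation, and the document/score values are made up.

```python
# Hedged sketch: linear interpolation of normalized retrieval scores.

def normalize(scores):
    """Min-max normalize a {doc_id: score} dict into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in scores.items()}

def fuse(weights, *score_dicts):
    """Combine several score dicts with the given interpolation weights."""
    normed = [normalize(s) for s in score_dicts]
    docs = set().union(*score_dicts)
    return {d: sum(w * s.get(d, 0.0) for w, s in zip(weights, normed)) for d in docs}

# Toy scores for three candidate answers (illustrative values only).
bm25 = {"a1": 12.3, "a2": 9.8, "a3": 4.1}
minilm = {"a1": 0.61, "a2": 0.72, "a3": 0.40}
tag = {"a1": 0.20, "a2": 0.55, "a3": 0.10}

ranked = sorted(fuse((0.1, 0.8, 0.1), bm25, minilm, tag).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked)
```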
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The image queries were used in the following studies:
* Y. Bitirim, S. Bitirim, D. Ç. Ertuğrul and Ö. Toygar, “An Evaluation of Reverse Image Search Performance of Google”, 2020 IEEE 44th Annual Computer Software and Applications Conference (COMPSAC), pp. 1368-1372, IEEE, Madrid, Spain, July 2020. (DOI: 10.1109/COMPSAC48688.2020.00-65)
* Y. Bitirim, “Retrieval Effectiveness of Google on Reverse Image Search”, Journal of Imaging Science and Technology, Vol. 66, No. 1, pp. 010505-1-010505-6, January 2022. (DOI: 10.2352/J.ImagingSci.Technol.2022.66.1.010505)
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for BEIR Benchmark
Dataset Summary
BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:
Fact-checking: FEVER, Climate-FEVER, SciFact
Question-Answering: NQ, HotpotQA, FiQA-2018
Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
News Retrieval: TREC-NEWS, Robust04
Argument Retrieval: Touche-2020, ArguAna
Duplicate Question Retrieval: Quora, CqaDupstack
Citation-Prediction: SCIDOCS
Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/scifact.