100+ datasets found
  1. BEIR Dataset

    • paperswithcode.com
    • library.toponeai.link
    Cite
    Nandan Thakur; Nils Reimers; Andreas Rücklé; Abhishek Srivastava; Iryna Gurevych, BEIR Dataset [Dataset]. https://paperswithcode.com/dataset/beir
    Authors
    Nandan Thakur; Nils Reimers; Andreas Rücklé; Abhishek Srivastava; Iryna Gurevych
    Description

    BEIR (Benchmarking IR) is a heterogeneous benchmark containing different information retrieval (IR) tasks. Through BEIR, it is possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches.

    The benchmark contains a total of 9 information retrieval tasks (Fact Checking, Citation Prediction, Duplicate Question Retrieval, Argument Retrieval, News Retrieval, Question Answering, Tweet Retrieval, Biomedical IR, Entity Retrieval) from 19 different datasets:

    MS MARCO, TREC-COVID, NFCorpus, BioASQ, Natural Questions, HotpotQA, FiQA-2018, Signal-1M, TREC-News, ArguAna, Touche 2020, CQADupStack, Quora Question Pairs, DBPedia, SciDocs, FEVER, Climate-FEVER, SciFact, Robust04.
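
    A minimal sketch of loading one of these datasets with the authors' beir Python package (pip install beir), following the quickstart documented in the BEIR repository; the download URL pattern and the SciFact test split are taken from that documentation:

      # pip install beir
      from beir import util
      from beir.datasets.data_loader import GenericDataLoader

      # Download and unzip one BEIR dataset (SciFact here), then load the
      # corpus, queries, and relevance judgments for the test split.
      url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
      data_path = util.download_and_unzip(url, "datasets")
      corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
      print(len(corpus), "documents,", len(queries), "queries")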

  2. IR Benchmarks

    • webis.de
    • anthology.aicmu.ac.cn
    Updated 2023
    Cite
    Maik Fröbe; Simon Reich; Niklas Deckers; Janek Bevendorff; Benno Stein; Matthias Hagen; Martin Potthast (2023). IR Benchmarks [Dataset]. https://webis.de/data/ir-benchmarks.html
    Dataset updated
    2023
    Dataset provided by
    Friedrich Schiller University Jena
    Bauhaus-Universität Weimar and Leipzig University
    The Web Technology & Information Systems Network
    University of Kassel and hessian.AI
    University of Kassel, hessian.AI, and ScaDS.AI
    Bauhaus-Universität Weimar
    Authors
    Maik Fröbe; Simon Reich; Niklas Deckers; Janek Bevendorff; Benno Stein; Matthias Hagen; Martin Potthast
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of information retrieval benchmarks covering 15 corpora (1.9 billion documents) on which 32 well-known shared tasks are based. We filled the leaderboards with Docker images of 50 standard retrieval approaches, which allowed us to automatically run and evaluate all 50 approaches on all 32 tasks (1,600 runs). All benchmarks are added as training datasets because their qrels are already publicly available. A detailed tutorial on how to submit approaches is available on GitHub.

    View on TIRA: https://tira.io/task-overview/ir-benchmarks
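
    Submission to the leaderboards goes through TIRA's Docker-based workflow (see the tutorial mentioned above). Purely for illustration, a sketch of inspecting a comparable shared-task collection locally with the ir_datasets package; ir_datasets is not part of this resource, but it exposes many of the same TREC-style corpora:

      # pip install ir-datasets
      import ir_datasets

      dataset = ir_datasets.load("vaswani")  # a small classic IR test collection
      for doc in dataset.docs_iter()[:3]:    # docs_iter supports slicing
          print(doc.doc_id, doc.text[:60])
      for qrel in dataset.qrels_iter():      # (query_id, doc_id, relevance, ...)
          print(qrel.query_id, qrel.doc_id, qrel.relevance)
          break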

  3. DORIS-MAE-v1

    • zenodo.org
    • data.niaid.nih.gov
    bin, json
    Updated Oct 17, 2023
    Cite
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi (2023). DORIS-MAE-v1 [Dataset]. http://doi.org/10.5281/zenodo.8299749
    Available download formats: bin, json
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

    Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving a distinct purpose.

    The Query dataset contains 100 human-crafted complex queries spanning five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying that no further breakdown is required). For each query, a corresponding candidate pool of 99 to 138 relevant paper abstracts is provided.

    The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011 and 2021 and sourced from arXiv. Each entry includes the title, original abstract, URL, primary and secondary categories, and citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.

    The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (0, 1, or 2) representing ChatGPT's final decision.

    Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (0, 1, or 2).

    The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

    ada_embedding_for_DORIS-MAE_v1.pickle
    ├── "Query"
    │   ├── query_id_1 (embedding of query_1)
    │   ├── query_id_2 (embedding of query_2)
    │   ├── query_id_3 (embedding of query_3)
    │   └── ...
    └── "Corpus"
        ├── corpus_id_1 (embedding of abstract_1)
        ├── corpus_id_2 (embedding of abstract_2)
        ├── corpus_id_3 (embedding of abstract_3)
        └── ...
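
    A minimal sketch of reading these embeddings, assuming (per the tree above) that the pickle holds a mapping with "Query" and "Corpus" keys from ids to ada-002 vectors:

      import pickle

      # Load the ada-002 embeddings that accompany DORIS-MAE v1.
      with open("ada_embedding_for_DORIS-MAE_v1.pickle", "rb") as f:
          embeddings = pickle.load(f)

      query_embeddings = embeddings["Query"]    # query_id -> embedding
      corpus_embeddings = embeddings["Corpus"]  # corpus_id -> embedding
      print(len(query_embeddings), "query embeddings,", len(corpus_embeddings), "corpus embeddings")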

  4. CoIR Dataset

    • paperswithcode.com
    Updated Nov 30, 2024
    Cite
    Xiangyang Li; Kuicai Dong; Yi Quan Lee; Wei Xia; Hao Zhang; Xinyi Dai; Yasheng Wang; Ruiming Tang (2024). CoIR Dataset [Dataset]. https://paperswithcode.com/dataset/coir
    Dataset updated
    Nov 30, 2024
    Authors
    Xiangyang Li; Kuicai Dong; Yi Quan Lee; Wei Xia; Hao Zhang; Xinyi Dai; Yasheng Wang; Ruiming Tang
    Description

    The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. CoIR includes 10 curated code datasets, covering 8 retrieval tasks across 7 domains, and encompasses two million documents in total. It also provides a common, easy-to-use Python framework, installable via pip, and shares the same data schema as benchmarks like MTEB and BEIR for easy cross-benchmark evaluation.
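
    Because CoIR shares the BEIR data schema, its datasets can be read with generic BEIR-style loading code; a sketch under that assumption, with file names following the BEIR conventions (corpus.jsonl, queries.jsonl, qrels/test.tsv):

      import csv
      import json

      corpus = {}
      with open("corpus.jsonl") as f:
          for line in f:
              doc = json.loads(line)
              corpus[doc["_id"]] = doc  # fields: _id, title, text

      queries = {}
      with open("queries.jsonl") as f:
          for line in f:
              q = json.loads(line)
              queries[q["_id"]] = q["text"]

      qrels = {}  # query_id -> {doc_id: relevance}
      with open("qrels/test.tsv") as f:
          for row in csv.DictReader(f, delimiter="\t"):  # columns: query-id, corpus-id, score
              qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])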

  5. germanquad-retrieval-qrels

    • huggingface.co
    Updated Jan 15, 2024
    Cite
    Massive Text Embedding Benchmark (2024). germanquad-retrieval-qrels [Dataset]. https://huggingface.co/datasets/mteb/germanquad-retrieval-qrels
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is derived from the GermanQuAD dataset. It takes the test set and represents it as qrels in the BEIR information retrieval benchmark format. Corpus and query ids have been added. The corresponding corpus can be found here. Full credit for the original dataset goes to the authors of the GermanQuAD dataset. The original dataset is licensed under CC BY-SA 4.0. Citation for the original dataset: @misc{möller2021germanquad, title={GermanQuAD and GermanDPR: Improving… See the full description on the dataset page: https://huggingface.co/datasets/mteb/germanquad-retrieval-qrels.
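
    A short sketch of pulling these qrels from the Hugging Face Hub with the datasets library; the exact split and column names are an assumption based on the BEIR qrels format the card references:

      # pip install datasets
      from datasets import load_dataset

      qrels = load_dataset("mteb/germanquad-retrieval-qrels")
      print(qrels)  # splits and columns as published on the Hub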

  6. ReQA Dataset

    • paperswithcode.com
    Updated Nov 15, 2021
    Cite
    Amin Ahmad; Noah Constant; Yinfei Yang; Daniel Cer (2021). ReQA Dataset [Dataset]. https://paperswithcode.com/dataset/reqa
    Dataset updated
    Nov 15, 2021
    Authors
    Amin Ahmad; Noah Constant; Yinfei Yang; Daniel Cer
    Description

    The Retrieval Question-Answering (ReQA) benchmark tests a model's ability to retrieve relevant answers efficiently from a large set of documents.

  7. tax-retrieval-benchmark

    • huggingface.co
    Updated Oct 17, 2023
    Cite
    Louis Brulé Naudet (2023). tax-retrieval-benchmark [Dataset]. https://huggingface.co/datasets/louisbrulenaudet/tax-retrieval-benchmark
    Dataset updated
    Oct 17, 2023
    Authors
    Louis Brulé Naudet
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    French Taxation Embedding Benchmark (retrieval)

    This dataset is designed for the task of retrieving relevant tax articles or content based on queries in the French language. It can be used for benchmarking information retrieval systems, particularly in the legal and financial domains.

      Massive Text Embedding Benchmark for French Taxation
    

    In this notebook, we will explore the process of adding a new task to the Massive Text Embedding Benchmark (MTEB). The MTEB is an… See the full description on the dataset page: https://huggingface.co/datasets/louisbrulenaudet/tax-retrieval-benchmark.
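
    For context, a hedged sketch of running an MTEB evaluation with the mteb package, following that package's classic quickstart; the model and the built-in task name here are stand-ins, and a custom task like the one described above would first have to be registered by subclassing mteb's retrieval task class:

      # pip install mteb sentence-transformers
      from mteb import MTEB
      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
      evaluation = MTEB(tasks=["SciFact"])  # stand-in built-in retrieval task
      evaluation.run(model, output_folder="results")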

  8. Legal CaPER Benchmark

    • researchdata.tuwien.at
    json, tsv, txt
    Updated Jun 25, 2024
    Cite
    Tobias Fink (2024). Legal CaPER Benchmark [Dataset]. http://doi.org/10.48436/5caar-3r468
    Available download formats: txt, tsv, json
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Tobias Fink
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Legal Case Passage Extraction and Retrieval (CaPER) benchmark is an information retrieval benchmark collection for court case passage retrieval. Specifically, it is a collection for evaluating Cited Case Passage Retrieval (CCPR) and contains case passages from the Austrian building regulations domain (source: RIS). The following files are included in the dataset:

    • full_collection.tsv

    A tab-separated file containing the passage texts of court cases from the building regulations domain. Column 1 contains the passage ID, Column 2 the passage text, and Column 3 the case ID (Geschäftszahl) of the passage's origin case.

    • queries.tsv

    A tab-separated file containing the queries/topics for which relevance assessments exist in this collection. Column 1 contains the query ID, Column 2 the query passage text, and Column 3 the case ID (Geschäftszahl) of the cited case. For the CCPR task, results are intended to be additionally filtered by exact matches of the case ID: for each query, relevance assessments exist only for passages that match the case ID in Column 3.

    • qrel.json

    Contains relevance assessments for each query. In this dictionary, a passage from the full collection is relevant for a query if qrel[

    • qrel.json.txt

    A conversion of the qrel.json file to be compatible with trec_eval.
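
    A minimal sketch of the candidate filtering the task description above prescribes, assuming the TSV files have no header row and exactly the three columns listed:

      import csv

      with open("full_collection.tsv", newline="") as f:
          passages = [row for row in csv.reader(f, delimiter="\t") if len(row) == 3]

      with open("queries.tsv", newline="") as f:
          queries = [row for row in csv.reader(f, delimiter="\t") if len(row) == 3]

      for query_id, query_text, cited_case_id in queries:
          # For CCPR, restrict candidates to passages whose case ID matches
          # the cited case (exact match on column 3).
          candidates = [p for p in passages if p[2] == cited_case_id]
          # ...rank `candidates` against `query_text` with a retrieval model...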

  9. Data from: #nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 22, 2024
    Cite
    Asmita Poddar; Eva Zangerle; Yi-Hsuan Yang (2024). #nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems [Dataset]. http://doi.org/10.5281/zenodo.3242238
    Available download formats: zip, pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Asmita Poddar; Eva Zangerle; Yi-Hsuan Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Music recommender systems can offer users personalized and contextualized recommendations and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based, or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset that contains a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, along with the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and some others, like hashtags, give clues to the emotional state of a user underlying an LE. In this paper, we provide some statistics to give insight into the dataset, and some directions in which the dataset can be used for music recommendation. We also provide standardized training and test sets for experimentation, and some baseline results obtained by using factorization machines.

    The dataset contains three files:

    • user_track_hashtag_timestamp.csv: contains basic information about each listening event. For each listening event, we provide an id, the user_id, track_id, hashtag, and created_at.
    • context_content_features.csv: contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
    • sentiment_values.csv: contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, SentiStrength Lexicon, and VADER. For each of these dictionaries we list the minimum, maximum, sum, and average of all sentiments of the tokens of the hashtag (if available; else we list empty values). However, as most hashtags consist of only a single token, these values are equal in most cases. Please note that the lexica are rather diverse and are therefore able to resolve very different terms against a score. Hence, the resulting csv is rather sparse.

    Please also find the training and test-splits for the dataset in this repo. Also, prototypical implementations of a context-aware recommender system based on the dataset can be found at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.
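
    A hedged pandas sketch of joining the three files; the join keys follow the column descriptions above, and the exact column names in the released CSVs may differ:

      import pandas as pd

      events = pd.read_csv("user_track_hashtag_timestamp.csv")
      features = pd.read_csv("context_content_features.csv")
      sentiment = pd.read_csv("sentiment_values.csv")

      # Attach content/context features to each listening event, then the
      # (sparse) hashtag sentiment scores.
      le = events.merge(features, on=["id", "user_id", "track_id"], how="left")
      le = le.merge(sentiment, on="hashtag", how="left")
      print(le.head())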

    If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:

    @inproceedings{smc18,
    title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems},
    author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang},
    url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf},
    year = {2018},
    date = {2018-07-04},
    booktitle = {Proceedings of the 15th Sound & Music Computing Conference},
    address = {Limassol, Cyprus},
    note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM},
    tppubtype = {inproceedings}
    }

  10. trec-news-generated-queries

    • huggingface.co
    Updated Aug 20, 2022
    Cite
    BEIR (2022). trec-news-generated-queries [Dataset]. https://huggingface.co/datasets/BeIR/trec-news-generated-queries
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2022
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    • Fact-checking: FEVER, Climate-FEVER, SciFact
    • Question-Answering: NQ, HotpotQA, FiQA-2018
    • Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    • News Retrieval: TREC-NEWS, Robust04
    • Argument Retrieval: Touche-2020, ArguAna
    • Duplicate Question Retrieval: Quora, CqaDupstack
    • Citation-Prediction: SCIDOCS
    • Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/trec-news-generated-queries.

  11. BEIR (Benchmarking IR)

    • opendatalab.com
    zip
    Updated Sep 29, 2022
    Cite
    Technical University of Darmstadt (2022). BEIR (Benchmarking IR) [Dataset]. https://opendatalab.com/OpenDataLab/BEIR
    Available download formats: zip
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    Technical University of Darmstadt
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BEIR (Benchmarking IR) is a heterogeneous benchmark containing different information retrieval (IR) tasks. Through BEIR, it is possible to systematically study the zero-shot generalization capabilities of multiple neural retrieval approaches. The benchmark contains a total of 9 information retrieval tasks (Fact Checking, Citation Prediction, Duplicate Question Retrieval, Argument Retrieval, News Retrieval, Question Answering, Tweet Retrieval, Biomedical IR, Entity Retrieval) from 18 different datasets: MS MARCO, TREC-COVID, NFCorpus, BioASQ, Natural Questions, HotpotQA, FiQA-2018, Signal-1M, TREC-News, ArguAna, Touche 2020, CQADupStack, Quora Question Pairs, DBPedia, SciDocs, FEVER, Climate-FEVER, SciFact.

  12. NFCorpus Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    NFCorpus Dataset [Dataset]. https://paperswithcode.com/dataset/nfcorpus
    Description

    NFCorpus is a full-text English retrieval dataset for medical information retrieval. It contains a total of 3,244 natural-language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in complex, terminology-heavy language), mostly from PubMed.

  13. Data from: A Multi-domain Benchmark for Personalized Search Evaluation

    • zenodo.org
    zip
    Updated Jun 13, 2022
    Cite
    Elias Bassani; Pranav Kasela; Alessandro Raganato; Gabriella Pasi (2022). A Multi-domain Benchmark for Personalized Search Evaluation [Dataset]. http://doi.org/10.5281/zenodo.6606557
    Available download formats: zip
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Elias Bassani; Pranav Kasela; Alessandro Raganato; Gabriella Pasi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide large-scale multi-domain benchmark datasets for Personalized Search.

    Further information can be found here.

  14. Data from: List.MID: A MIDI-Based Benchmark for Evaluating RDF Lists

    • figshare.com
    • data.niaid.nih.gov
    zip
    Updated Jul 2, 2019
    Cite
    Albert Meroño-Peñuela; Enrico Daga (2019). List.MID: A MIDI-Based Benchmark for Evaluating RDF Lists [Dataset]. http://doi.org/10.6084/m9.figshare.8426912.v1
    Available download formats: zip
    Dataset updated
    Jul 2, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Albert Meroño-Peñuela; Enrico Daga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Linked lists represent a countable number of ordered values and are among the most important abstract data types in computer science. With the advent of RDF as a highly expressive knowledge representation language for the Web, various implementations for RDF lists have been proposed. Yet, there is so far no benchmark dedicated to evaluating the performance of triple stores and SPARQL query engines on ordered linked data. Moreover, essential tasks for evaluating RDF lists, like generating datasets containing RDF lists of various sizes, or generating the same RDF list using different modelling choices, are cumbersome and unprincipled. In this paper, we propose List.MID, a systematic benchmark for evaluating systems serving RDF lists. List.MID consists of a dataset generator, which creates RDF list data in various models and of different sizes, and a set of SPARQL queries. The RDF list data is coherently generated from a large, community-curated base collection of Web MIDI files, rich in lists of musical events of arbitrary length. We describe the List.MID benchmark and discuss its impact and adoption, reusability, design, and availability.
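
    The queries List.MID benchmarks traverse RDF lists; as a small illustration (names and data here are invented, and rdflib stands in for a full triple store), a SPARQL 1.1 property path that walks an rdf:List:

      # pip install rdflib
      from rdflib import Graph, Namespace, BNode, Literal
      from rdflib.collection import Collection

      EX = Namespace("http://example.org/")
      g = Graph()

      # Model a short sequence of MIDI-like events as an rdf:List.
      head = BNode()
      Collection(g, head, [Literal("note_on C4"), Literal("note_off C4"), Literal("note_on E4")])
      g.add((EX.track1, EX.hasEvents, head))

      # rdf:rest*/rdf:first reaches every list member; note that SPARQL
      # gives no ordering guarantee when a list is flattened this way.
      q = """
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ex: <http://example.org/>
      SELECT ?event WHERE { ex:track1 ex:hasEvents/rdf:rest*/rdf:first ?event }
      """
      for row in g.query(q):
          print(row.event)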

  15. CLIRMatrix Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Shuo Sun; Kevin Duh, CLIRMatrix Dataset [Dataset]. https://paperswithcode.com/dataset/clirmatrix
    Authors
    Shuo Sun; Kevin Duh
    Description

    CLIRMatrix is a large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval. It includes:

    • BI-139: a bilingual dataset of queries in one language matched with relevant documents in another language, covering 139 × 138 = 19,182 language pairs.
    • MULTI-8: a multilingual dataset of queries and documents jointly aligned in 8 different languages.

    In total, 49 million unique queries and 34 billion (query, document, label) triplets were mined, making CLIRMatrix the largest and most comprehensive CLIR dataset to date.

  16. Data from: Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval

    • paperswithcode.com
    Updated Nov 2, 2023
    Cite
    Jinrui Yang; Timothy Baldwin; Trevor Cohn (2023). Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval Dataset [Dataset]. https://paperswithcode.com/dataset/multi-eup
    Dataset updated
    Nov 2, 2023
    Authors
    Jinrui Yang; Timothy Baldwin; Trevor Cohn
    Description

    Multi-EuP is a new multilingual benchmark dataset comprising 22K multilingual documents collected from the European Parliament and spanning 24 languages. It is designed to investigate fairness in a multilingual information retrieval (IR) context, supporting analysis of both language and demographic bias in a ranking setting. It offers an authentic multilingual corpus, featuring topics translated into all 24 languages, as well as cross-lingual relevance judgments. Furthermore, it provides rich demographic information associated with its documents, facilitating the study of demographic bias.

  17. arguana-qrels

    • huggingface.co
    • opendatalab.com
    Updated Aug 15, 2022
    Cite
    BEIR (2022). arguana-qrels [Dataset]. https://huggingface.co/datasets/BeIR/arguana-qrels
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 15, 2022
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    • Fact-checking: FEVER, Climate-FEVER, SciFact
    • Question-Answering: NQ, HotpotQA, FiQA-2018
    • Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    • News Retrieval: TREC-NEWS, Robust04
    • Argument Retrieval: Touche-2020, ArguAna
    • Duplicate Question Retrieval: Quora, CqaDupstack
    • Citation-Prediction: SCIDOCS
    • Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/arguana-qrels.

  18. SE-PQA: a Resource for Personalized Community Question Answering

    • zenodo.org
    csv, zip
    Updated Feb 5, 2024
    Cite
    Kasela Pranav; Pasi Gabriella; Perego Raffaele; Marco Braga (2024). SE-PQA: a Resource for Personalized Community Question Answering [Dataset]. http://doi.org/10.5281/zenodo.7940964
    Available download formats: csv, zip
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kasela Pranav; Pasi Gabriella; Perego Raffaele; Marco Braga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Personalization in Information Retrieval has been studied for a long time. Nevertheless, there is still a lack of high-quality, real-world datasets for conducting large-scale experiments and evaluating models for personalized search. This paper contributes to filling this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new resource for designing and evaluating personalized models for the two tasks of community Question Answering (cQA). The contributed dataset includes more than 1 million queries and 2 million answers, annotated with a rich set of features modeling the social interactions among the users of a popular cQA platform. We describe the characteristics of SE-PQA and detail the features associated with both questions and answers. We also provide reproducible baseline methods for the cQA task based on the resource, including deep learning models and personalization approaches. The results of the preliminary experiments conducted show the appropriateness of SE-PQA for training effective cQA models; they also show that personalization remarkably improves the effectiveness of all the methods tested. Furthermore, we show the benefits, in terms of robustness and generalization, of combining data from multiple communities for personalization purposes.

    Performance on all communities separately:

    Community | Model (BM25 +) | P@1 | NDCG@3 | NDCG@10 | R@100 | MAP@100 | λ
    --- | --- | --- | --- | --- | --- | --- | ---
    Academia | MiniLM | 0.438 | 0.382 | 0.395 | 0.489 | 0.344 | (.1,.9)
    Academia | MiniLM + TAG | 0.453 | 0.392 | 0.403 | 0.489 | 0.352 | (.1,.8,.1)
    Anime | MiniLM + TAG | 0.650 | 0.682 | 0.714 | 0.856 | 0.683 | (.1,.9,.0)
    Apple | MiniLM | 0.327 | 0.351 | 0.381 | 0.514 | 0.349 | (.1,.9)
    Apple | MiniLM + TAG | 0.335 | 0.361 | 0.389 | 0.514 | 0.357 | (.1,.8,.1)
    Bicycles | MiniLM | 0.405 | 0.380 | 0.421 | 0.600 | 0.365 | (.1,.9)
    Bicycles | MiniLM + TAG | 0.436 | 0.405 | 0.441 | 0.600 | 0.386 | (.1,.8,.1)
    Boardgames | MiniLM | 0.681 | 0.694 | 0.728 | 0.866 | 0.692 | (.1,.9)
    Boardgames | MiniLM + TAG | 0.696 | 0.702 | 0.736 | 0.866 | 0.699 | (.1,.8,.1)
    Buddhism | MiniLM + TAG | 0.490 | 0.387 | 0.397 | 0.544 | 0.334 | (.3,.7,.0)
    Christianity | MiniLM | 0.534 | 0.505 | 0.555 | 0.783 | 0.497 | (.2,.8)
    Christianity | MiniLM + TAG | 0.549 | 0.521 | 0.564 | 0.783 | 0.507 | (.1,.8,.1)
    Cooking | MiniLM | 0.600 | 0.567 | 0.600 | 0.719 | 0.553 | (.1,.9)
    Cooking | MiniLM + TAG | 0.619 | 0.583 | 0.614 | 0.719 | 0.568 | (.1,.8,.1)
    DIY | MiniLM | 0.323 | 0.313 | 0.346 | 0.501 | 0.302 | (.1,.9)
    DIY | MiniLM + TAG | 0.335 | 0.324 | 0.356 | 0.501 | 0.312 | (.1,.8,.1)
    Expatriates | MiniLM + TAG | 0.596 | 0.653 | 0.682 | 0.832 | 0.645 | (.1,.9,.0)
    Fitness | MiniLM + TAG | 0.568 | 0.575 | 0.613 | 0.760 | 0.567 | (.2,.8,.0)
    Freelancing | MiniLM + TAG | 0.513 | 0.472 | 0.506 | 0.654 | 0.457 | (.1,.9,.0)
    Gaming | MiniLM | 0.510 | 0.534 | 0.562 | 0.686 | 0.532 | (.1,.9)
    Gaming | MiniLM + TAG | 0.519 | 0.547 | 0.571 | 0.686 | 0.541 | (.1,.8,.1)
    Gardening | MiniLM | 0.344 | 0.362 | 0.396 | 0.520 | 0.359 | (.1,.9)
    Gardening | MiniLM + TAG | 0.345 | 0.369 | 0.399 | 0.520 | 0.363 | (.1,.8,.1)
    Genealogy | MiniLM + TAG | 0.592 | 0.605 | 0.631 | 0.779 | 0.594 | (.3,.7,.0)
    Health | MiniLM + TAG | 0.718 | 0.765 | 0.797 | 0.934 | 0.765 | (.2,.8,.0)
    Hermeneutics | MiniLM | 0.589 | 0.538 | 0.593 | 0.828 | 0.526 | (.2,.8)
    Hermeneutics | MiniLM + TAG | 0.632 | 0.570 | 0.617 | 0.828 | 0.552 | (.1,.8,.1)
    Hinduism | MiniLM | 0.388 | 0.415 | 0.459 | 0.686 | 0.416 | (.2,.8)
    Hinduism | MiniLM + TAG | 0.382 | 0.410 | 0.457 | 0.686 | 0.412 | (.1,.8,.1)
    History | MiniLM + TAG | 0.740 | 0.735 | 0.764 | 0.862 | 0.730 | (.2,.8,.0)
    Hsm | MiniLM + TAG | 0.666 | 0.707 | 0.737 | 0.870 | 0.690 | (.2,.8,.0)
    Interpersonal | MiniLM + TAG | 0.663 | 0.617 | 0.653 | 0.739 | 0.604 | (.2,.8,.0)
    Islam | MiniLM | 0.382 | 0.412 | 0.453 | 0.642 | 0.410 | (.1,.9)
    Islam | MiniLM + TAG | 0.395 | 0.427 | 0.464 | 0.642 | 0.421 | (.1,.8,.1)
    Judaism | MiniLM + TAG | 0.363 | 0.387 | 0.432 | 0.649 | 0.388 | (.2,.8,.0)
    Law | MiniLM | 0.663 | 0.647 | 0.678 | 0.803 | 0.639 | (.2,.8)
    Law | MiniLM + TAG | 0.677 | 0.657 | 0.687 | 0.803 | 0.649 | (.1,.8,.1)
    Lifehacks | MiniLM | 0.714 | 0.601 | 0.617 | 0.703 | 0.553 | (.1,.9)
    Lifehacks | MiniLM + TAG | 0.714 | 0.621 | 0.631 | 0.703 | 0.568 | (.1,.8,.1)
    Linguistics | MiniLM + TAG | 0.584 | 0.588 | 0.630 | 0.794 | 0.587 | (.2,.8,.0)
    Literature | MiniLM + TAG | 0.871 | 0.878 | 0.889 | 0.934 | 0.876 | (.3,.7,.0)
    Martialarts | MiniLM | 0.630 | 0.599 | 0.645 | 0.796 | 0.596 | (.1,.9)
    Martialarts | MiniLM + TAG | 0.640 | 0.628 | 0.660 | 0.796 | 0.612 | (.1,.8,.1)
    Money | MiniLM | 0.545 | 0.535 | 0.563 | 0.706 | 0.515 | (.2,.8)
    Money | MiniLM + TAG | 0.559 | 0.542 | 0.571 | 0.706 | 0.523 | (.1,.8,.1)
    Movies | MiniLM | 0.713 | 0.722 | 0.753 | 0.865 | 0.724 | (.1,.9)
    Movies | MiniLM + TAG | 0.728 | 0.735 | 0.762 | 0.865 | 0.735 | (.1,.8,.1)
    Music | MiniLM | 0.508 | 0.447 | 0.476 | 0.602 | 0.418 | (.2,.8)
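
    The λ column suggests each row is a convex combination of per-system scores (BM25 with MiniLM, and optionally a TAG-based personalization signal). A sketch of such score fusion, assuming min-max-normalized runs; the normalization actually used for these results is not specified here:

      def min_max(run):
          # Normalize a {doc_id: score} run to [0, 1].
          lo, hi = min(run.values()), max(run.values())
          return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in run.items()}

      def fuse(lambdas, *runs):
          # Weighted sum of normalized runs, e.g. lambdas=(.1, .8, .1)
          # for BM25 + MiniLM + TAG as in the table above.
          runs = [min_max(r) for r in runs]
          docs = set().union(*runs)
          return {d: sum(w * r.get(d, 0.0) for w, r in zip(lambdas, runs)) for d in docs}

      # Toy runs for one query (invented scores):
      bm25 = {"a1": 12.3, "a2": 9.8, "a3": 7.1}
      minilm = {"a1": 0.52, "a2": 0.61, "a4": 0.47}
      tag = {"a2": 0.30, "a3": 0.10}
      print(sorted(fuse((0.1, 0.8, 0.1), bm25, minilm, tag).items(), key=lambda kv: -kv[1]))
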
  19. List of the image queries

    • figshare.com
    zip
    Updated Mar 17, 2022
    Cite
    Yiltan Bitirim (2022). List of the image queries [Dataset]. http://doi.org/10.6084/m9.figshare.12336275.v1
    Available download formats: zip
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    figshare
    Authors
    Yiltan Bitirim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The image queries were used in the following studies:

    • Y. Bitirim, S. Bitirim, D. Ç. Ertuğrul and Ö. Toygar, “An Evaluation of Reverse Image Search Performance of Google”, 2020 IEEE 44th Annual Computer Software and Applications Conference (COMPSAC), pp. 1368-1372, IEEE, Madrid, Spain, July 2020. (DOI: 10.1109/COMPSAC48688.2020.00-65)
    • Y. Bitirim, “Retrieval Effectiveness of Google on Reverse Image Search”, Journal of Imaging Science and Technology, Vol. 66, No. 1, pp. 010505-1-010505-6, January 2022. (DOI: 10.2352/J.ImagingSci.Technol.2022.66.1.010505)

  20. scifact

    • huggingface.co
    Updated Aug 16, 2023
    Cite
    BEIR (2023). scifact [Dataset]. https://huggingface.co/datasets/BeIR/scifact
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 16, 2023
    Dataset authored and provided by
    BEIR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for BEIR Benchmark

      Dataset Summary
    

    BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

    • Fact-checking: FEVER, Climate-FEVER, SciFact
    • Question-Answering: NQ, HotpotQA, FiQA-2018
    • Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus
    • News Retrieval: TREC-NEWS, Robust04
    • Argument Retrieval: Touche-2020, ArguAna
    • Duplicate Question Retrieval: Quora, CqaDupstack
    • Citation-Prediction: SCIDOCS
    • Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/scifact.
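
    A short sketch of loading this dataset from the Hugging Face Hub; the "corpus" and "queries" configuration names follow the layout used across the BeIR organization's repositories, and the qrels live in a separate BeIR/scifact-qrels repository (an assumption based on that layout):

      # pip install datasets
      from datasets import load_dataset

      corpus = load_dataset("BeIR/scifact", "corpus")
      queries = load_dataset("BeIR/scifact", "queries")
      print(corpus)
      print(queries)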
