4 datasets found
  1. MTEB Dataset

    • paperswithcode.com
    Cite
    Niklas Muennighoff; Nouamane Tazi; Loïc Magne; Nils Reimers, MTEB Dataset [Dataset]. https://paperswithcode.com/dataset/mteb
    Authors
    Niklas Muennighoff; Nouamane Tazi; Loïc Magne; Nils Reimers
    Description

    MTEB is a benchmark that spans 8 embedding tasks covering a total of 56 datasets and 112 languages. The 8 task types are Bitext Mining, Classification, Clustering, Pair Classification, Reranking, Retrieval, Semantic Textual Similarity, and Summarisation. The 56 datasets vary in text length and are grouped into three categories: sentence-to-sentence, paragraph-to-paragraph, and sentence-to-paragraph.

    Check the latest leaderboards at HuggingFace.
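    Most of these task types reduce to comparing dense sentence embeddings; Semantic Textual Similarity, for example, is typically scored by correlating the cosine similarity of embedding pairs with human judgments. A minimal sketch of that similarity computation, using toy vectors rather than real model outputs:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings", for illustration only
emb_query = [0.1, 0.9, 0.2]
emb_close = [0.2, 0.8, 0.1]   # semantically similar sentence
emb_far   = [0.9, 0.1, 0.0]   # unrelated sentence

print(cosine_similarity(emb_query, emb_close))  # high
print(cosine_similarity(emb_query, emb_far))    # low
```

    In practice the vectors come from an embedding model and have hundreds of dimensions, but the scoring idea is the same.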

  2. AlphaNLI

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    Massive Text Embedding Benchmark (2025). AlphaNLI [Dataset]. https://huggingface.co/datasets/mteb/AlphaNLI
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    AlphaNLI, an MTEB dataset (Massive Text Embedding Benchmark)

    Measures the ability to retrieve the ground-truth answers to reasoning-task queries on AlphaNLI.

    Task category: t2t (text-to-text)

    Domains: Encyclopaedic, Written

    Reference: https://leaderboard.allenai.org/anli/submissions/get-started

      How to evaluate on this task

    You can evaluate an embedding model on this dataset using the following code:

        import mteb

        task = mteb.get_task("AlphaNLI")
        evaluator = …

    See the full description on the dataset page: https://huggingface.co/datasets/mteb/AlphaNLI.
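    For retrieval-style tasks such as this one, MTEB's headline retrieval metric is nDCG@10, which rewards placing the ground-truth answer near the top of the ranking. The sketch below is a minimal pure-Python illustration of that metric under binary relevance labels; it is not the `mteb` package's own implementation.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Normalized Discounted Cumulative Gain at rank k, binary relevance.

    ranked_ids:   document ids in the order the model retrieved them
    relevant_ids: set of ground-truth relevant document ids
    """
    dcg = 0.0
    for i, doc_id in enumerate(ranked_ids[:k]):
        if doc_id in relevant_ids:
            dcg += 1.0 / math.log2(i + 2)  # ranks are 1-based: log2(rank + 1)
    # Ideal DCG: all relevant documents ranked at the very top
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Perfect ranking: the single gold answer comes first
print(ndcg_at_k(["gold", "d1", "d2"], {"gold"}))  # 1.0
# Gold answer only at rank 2 scores 1/log2(3), about 0.63
print(ndcg_at_k(["d1", "gold", "d2"], {"gold"}))
```

    Running the task through `mteb` computes this (and related metrics) over the whole query set for you; the point here is only what the reported number means.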

  3. SIQA

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    Massive Text Embedding Benchmark (2025). SIQA [Dataset]. https://huggingface.co/datasets/mteb/SIQA
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Unknown: https://choosealicense.com/licenses/unknown/

    Description

    SIQA, an MTEB dataset (Massive Text Embedding Benchmark)

    Measures the ability to retrieve the ground-truth answers to reasoning-task queries on SIQA.

    Task category: t2t (text-to-text)

    Domains: Encyclopaedic, Written

    Reference: https://leaderboard.allenai.org/socialiqa/submissions/get-started

      How to evaluate on this task

    You can evaluate an embedding model on this dataset using the following code:

        import mteb

        task = mteb.get_task("SIQA")
        evaluator = mteb.MTEB([task])
        …

    See the full description on the dataset page: https://huggingface.co/datasets/mteb/SIQA.

  4. CrosslingualMultiDomainsDataset

    • huggingface.co
    Updated Jan 3, 2024
    Cite
    Lei Shen (2024). CrosslingualMultiDomainsDataset [Dataset]. https://huggingface.co/datasets/maidalun1020/CrosslingualMultiDomainsDataset
    Dataset updated
    Jan 3, 2024
    Authors
    Lei Shen
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Leaderboard

    BCEmbedding: Bilingual and Crosslingual Embedding for RAG

    GitHub


    🌐 Bilingual and Crosslingual Superiority
    💡 Key Features
    🚀 Latest Updates
    🍎 Model List
    📖 Manual: Installation, Quick Start
    ⚙️ Evaluation: Evaluate Semantic Representation by MTEB; Evaluate RAG by LlamaIndex
    📈 Leaderboard: Semantic Representation Evaluations in MTEB; RAG Evaluations in LlamaIndex
    🛠 Youdao's BCEmbedding API
    🧲 WeChat Group
    ✏️ Citation
    🔐 …

    See the full description on the dataset page: https://huggingface.co/datasets/maidalun1020/CrosslingualMultiDomainsDataset.

