https://choosealicense.com/licenses/other/
Dataset Card for STSb Multi MT
Dataset Summary
STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selection of datasets includes text from image captions, news headlines and user forums. (source)
These are different multilingual translations and the English original of the STSbenchmark dataset. The translations were produced with deepl.com. It can be used to train sentence embeddings… See the full description on the dataset page: https://huggingface.co/datasets/PhilipMay/stsb_multi_mt.
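A minimal loading sketch using the Hugging Face datasets library; the "en" config name and the row layout noted in the comments are assumptions taken from the dataset card, not guaranteed by this summary:

from datasets import load_dataset

# The config name selects the translation; "en" (the English original) is assumed here.
ds = load_dataset("PhilipMay/stsb_multi_mt", name="en", split="train")
print(ds[0])  # a sentence pair plus its similarity score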
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0) https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
We introduce GiCCS, the first conversational STS evaluation benchmark for German. We collected the similarity annotations for GiCCS using best-worst scaling, presenting the target items in context in order to obtain highly reliable, context-dependent similarity scores. In our paper, we present benchmarking experiments for evaluating LMs on capturing the similarity of utterances. Results suggest that pretraining LMs on conversational data and providing conversational context can be useful for capturing the similarity of utterances in dialogues. GiCCS will be publicly available to encourage benchmarking of conversational LMs.
STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selection of datasets includes text from image captions, news headlines and user forums.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The RO-STS (Romanian Semantic Textual Similarity) dataset contains 8628 sentence pairs with their similarity scores. It is a high-quality translation of the STS benchmark dataset.
Dataset Card for STSB
The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 1 to 5. However, for this variant the similarity scores are normalized to the range 0 to 1.
Dataset Details
Columns: "sentence1", "sentence2", "score"
Column types: str, str, float
Examples: { 'sentence1': 'A… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/stsb.
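A minimal loading sketch, assuming the column layout described above; load_dataset and the split name are standard Hugging Face datasets usage:

from datasets import load_dataset

# Each row has "sentence1", "sentence2", and a "score" already normalized to 0-1.
ds = load_dataset("sentence-transformers/stsb", split="train")
print(ds[0]["score"])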
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for STS-ca
Dataset Summary
The STS-ca corpus is a benchmark for evaluating Semantic Text Similarity in Catalan. This dataset was developed by BSC TeMU as part of Projecte AINA, to enrich the Catalan Language Understanding Benchmark (CLUB). This work is licensed under an Attribution-ShareAlike 4.0 International License.
Supported Tasks and Leaderboards
This dataset can be used to build and score semantic similarity models in Catalan.… See the full description on the dataset page: https://huggingface.co/datasets/projecte-aina/sts-ca.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The STS corpus is a benchmark for evaluating Semantic Text Similarity in Catalan. It consists of 3079 sentence pairs annotated with the semantic similarity between them, on a scale from 0 (no similarity at all) to 5 (semantic equivalence). Annotation was done manually by four people following our guidelines, which are based on previous work from the SemEval challenges (https://www.aclweb.org/anthology/S13-1004.pdf). This dataset was developed by BSC TeMU as part of the AINA project.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
SICK-R, an MTEB (Massive Text Embedding Benchmark) dataset
Semantic Textual Similarity SICK-R dataset
Task category t2t
Domains Web, Written
Reference https://aclanthology.org/L14-1314/
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

import mteb

task = mteb.get_tasks(["SICK-R"])
evaluator = mteb.MTEB(task)
model = mteb.get_model(YOUR_MODEL)
evaluator.run(model)
To learn more about how to run models… See the full description on the dataset page: https://huggingface.co/datasets/mteb/sickr-sts.
https://choosealicense.com/licenses/unknown/
STS17, an MTEB (Massive Text Embedding Benchmark) dataset
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
Task category t2t
Domains News, Web, Written
Reference https://alt.qcri.org/semeval2017/task1/
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

import mteb

task = mteb.get_tasks(["STS17"])
evaluator = mteb.MTEB(task)
model =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/sts17-crosslingual-sts.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is a Nepali version of the Semantic Textual Similarity Benchmark (STS-B) derived from the STS-B Multi-MT corpus. It consists of sentence pairs annotated with similarity scores, indicating how semantically similar the two sentences are. The dataset serves as a valuable resource for developing and evaluating natural language processing (NLP) models focused on understanding and measuring sentence similarity in Nepali. Each sentence pair is assigned a similarity score ranging from 0 to 5, where 0 indicates no similarity and 5 indicates complete semantic equivalence. This dataset is useful for various NLP applications, including machine translation, paraphrase detection, and semantic search, enabling the advancement of language technologies in the Nepali language.
GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
To use this dataset:
import tensorflow_datasets as tfds

# GLUE is a multi-config dataset in TFDS, so a config such as 'glue/stsb'
# (the STS Benchmark task) must be selected explicitly.
ds = tfds.load('glue/stsb', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
The appletea2333/ST-Align-Benchmark dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Model Weights
This repository provides weights of the models from the benchmarking study conducted in "Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework" (https://arxiv.org/abs/2006.13365), which have been upgraded to be compatible with PyKEEN 1.9.
The weights are organized as zipfiles, named by the dataset and interaction-function configuration. For each of these combinations, we chose the best model according to validation Hits@10 to include in this repository. For each model, we have three files:
configuration.json
contains the (pipeline) configuration used to train the model. It can be loaded as
import pathlib
import json
configuration = json.loads(pathlib.Path("configuration.json").read_text())
Since the configuration is intended for the pipeline, we need some custom code to re-create the model without re-training it.
from pykeen.datasets import get_dataset
from pykeen.models import ERModel, model_resolver

configuration = configuration["pipeline"]

# load the triples factory
dataset = get_dataset(
    dataset=configuration["dataset"],
    dataset_kwargs=configuration.get("dataset_kwargs", None),
)
model: ERModel = model_resolver.make(
    configuration["model"],
    configuration["model_kwargs"],
    triples_factory=dataset.training,
)
Note that this only creates the model instance; it does not load the weights yet.
state_dict.pt
contains the weights, stored via torch.save. We can load these weights into the model by using Module.load_state_dict. Note that we set strict=False, since the exported weights do not contain the regularizers' state, while the re-instantiated models may have regularizers.
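A minimal sketch of that loading step, assuming the model instance created above and a local state_dict.pt; map_location="cpu" is added here only so the weights load without a GPU:

import torch

# strict=False: the exported weights lack the regularizers' state
# that the re-instantiated model may expect.
state_dict = torch.load("state_dict.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)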
results.json
contains the results obtained by the original runs. It can be read in the same way as configuration.json. Note that some of the recently added metrics are not available in those results.
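For completeness, a sketch mirroring the configuration.json snippet above; the file name is the one given in this section:

import json
import pathlib

results = json.loads(pathlib.Path("results.json").read_text())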
A subset of the Semantic Textual Similarity reference data (STS Benchmark).
This is a dataset of vertical benchmarks collected by the St. Johns River Water Management District (SJRWMD). Maintained by Survey staff; this layer should only be updated with their direct permission.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The RO-STS-Parallel dataset (a parallel Romanian-English dataset, a translation of the Semantic Textual Similarity benchmark) contains 17256 sentences in Romanian and English. It is a high-quality translation of the English STS benchmark dataset into Romanian.
Semantic Textual Similarity reference data (STS Benchmark).
https://choosealicense.com/licenses/unknown/
BIOSSES, an MTEB (Massive Text Embedding Benchmark) dataset
Biomedical Semantic Similarity Estimation.
Task category t2t
Domains Medical
Reference https://tabilab.cmpe.boun.edu.tr/BIOSSES/DataSet.html
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

import mteb

task = mteb.get_tasks(["BIOSSES"])
evaluator = mteb.MTEB(task)
model = mteb.get_model(YOUR_MODEL)
evaluator.run(model)
To learn more… See the full description on the dataset page: https://huggingface.co/datasets/mteb/biosses-sts.
Dataset Summary
FarSick STS is a Persian (Farsi) dataset designed for the Semantic Textual Similarity (STS) task. It is a part of the FaMTEB (Farsi Massive Text Embedding Benchmark). The dataset was developed by translating and adapting the English SICK (Sentences Involving Compositional Knowledge) dataset, and it features Persian sentence pairs annotated for their degree of semantic relatedness.
Language(s): Persian (Farsi)
Task(s): Semantic Textual Similarity (STS)
Source:… See the full description on the dataset page: https://huggingface.co/datasets/MCINext/farsick-sts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Control results for the four benchmark systems.