COVq dataset
This dataset was used in the paper GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning. Refer to https://arxiv.org/abs/2402.16829 for details. The code for generating the data is available at https://github.com/avsolatorio/GISTEmbed.
Citation
@article{solatorio2024gistembed, title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, author={Aivin V. Solatorio}…
See the full description on the dataset page: https://huggingface.co/datasets/avsolatorio/covid-bing-query-gpt4-avs_triplets
MIT License: https://opensource.org/licenses/MIT (license information was derived automatically)
Bing x GPT-4 Synthetic Query Dataset
This dataset was used in the paper GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning. Refer to https://arxiv.org/abs/2402.16829 for details. The code for generating the data is available at https://github.com/avsolatorio/GISTEmbed.
Citation
@article{solatorio2024gistembed, title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}…
See the full description on the dataset page: https://huggingface.co/datasets/avsolatorio/covid-bing-query-gpt4
MEDI+MTEBcls+COVq dataset
This dataset was used in the paper GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning. Refer to https://arxiv.org/abs/2402.16829 for details. The code for generating the data is available at https://github.com/avsolatorio/GISTEmbed.
Citation
@article{solatorio2024gistembed, title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, author={Aivin V. Solatorio}…
See the full description on the dataset page: https://huggingface.co/datasets/avsolatorio/medi-data-mteb-covid-bing-query-gpt4-avs_triplets