Dataset Overview
dataset: gist-960-euclidean
Metadata
Creation Time: 2025-01-07 11:03:48+0000
Update Time: 2025-01-07 11:04:44+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/gist-960-euclidean.
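The identifiers in these listings (for example gist-960-euclidean or fashion-mnist-784-euclidean) appear to follow a `<corpus>-<dimensions>-<metric>` pattern. That reading is an assumption drawn from the names alone, not something the dataset pages state. Under that assumption, a minimal Python sketch for splitting an identifier into its parts could look like this; `AnnDatasetName` and `parse_dataset_name` are hypothetical helper names introduced only for illustration.

```python
from typing import NamedTuple


class AnnDatasetName(NamedTuple):
    """Parts of an identifier, assuming the <corpus>-<dimensions>-<metric> pattern."""
    corpus: str
    dimensions: int
    metric: str


def parse_dataset_name(name: str) -> AnnDatasetName:
    # Split from the right so corpus names that contain hyphens
    # (e.g. "fashion-mnist") stay in one piece.
    corpus, dims, metric = name.rsplit("-", 2)
    return AnnDatasetName(corpus, int(dims), metric)


if __name__ == "__main__":
    for name in ("gist-960-euclidean", "fashion-mnist-784-euclidean", "lastfm-64-dot"):
        print(parse_dataset_name(name))
```

Splitting from the right keeps the sketch short; an identifier that does not end in a number followed by a metric name would raise a `ValueError`, which seems preferable to guessing silently.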
Dataset Overview
dataset: glove-100-angular
Metadata
Creation Time: 2025-01-07 11:21:16+0000
Update Time: 2025-01-07 11:21:31+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-100-angular.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PubMed & arXiv Abstract Embeddings for Vector Database Benchmarking
Dataset Description
This dataset contains pre-computed embeddings of scientific paper abstracts from PubMed and arXiv, designed for evaluating vector database performance. The embeddings are generated using Google's EmbeddingGemma-300M model.
Purpose
Benchmark dataset for evaluating vector database performance, specifically designed for use with VectorDBBench.
Dataset Summary
Total… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/pubmed-arxiv-abstract-embedding-gemma-300m.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vector database of the Cadastre and Topography section of the Directorate of Land Affairs (DAF) of French Polynesia. The vector database covers the following topics:
- Benchmarks (REF): geodetic, levelling, stereo-preparation, geodetic and projection systems.
- Location (LOC): islands, administrative boundaries.
- Built structures (EDI): buildings, surface constructions, linear constructions.
- Hydrography (HYD): rivers, hydrographic surfaces, lagoons.
- Land use (SOL): natural or exploited vegetated areas.
- Orography and relief (REL): contour lines, slopes, spot heights.
- Road network (VOI): roads and road edges, road furniture.
- Toponymy (NOM): points of interest, oronyms, hydronyms, named localities.
Dataset Overview
dataset: nytimes-256-angular
Metadata
Creation Time: 2025-01-07 11:44:42+0000
Update Time: 2025-01-07 11:44:49+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/nytimes-256-angular.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Bloomberg Financial News Embeddings for Vector Database Benchmarking
Dataset Description
This dataset contains pre-computed embeddings of Bloomberg financial news articles, designed for evaluating vector database performance. The embeddings are generated using Google's EmbeddingGemma-300M model.
Purpose
Benchmark dataset for evaluating vector database performance in the financial news domain, specifically designed for use with VectorDBBench.
Dataset Summary… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/Bloomberg-Financial-News-embedding-gemma-300m.
Dataset Overview
dataset: random-xs-20-angular
Metadata
Creation Time: 2025-01-07 11:37:47+0000
Update Time: 2025-01-07 11:37:48+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/random-xs-20-angular.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the clinical diagnostic efficacy of the proposed hybrid model and benchmark machine learning models. Metrics are reported as point estimates with 95% CIs. The table focuses on the measures most relevant to clinical implementation. Sensitivity (true positive rate) reflects the ability to correctly identify patients with CVD, while Specificity (true negative rate) indicates the ability to correctly rule out patients without CVD. The Negative Likelihood Ratio quantifies how much the odds of having the disease decrease after a negative test result; smaller values indicate stronger diagnostic power. The proposed model achieves the best balance of sensitivity and specificity, together with the lowest Negative Likelihood Ratio, which highlights its strength as a diagnostic tool in clinical practice.
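The three measures named in this caption can be written out from a confusion matrix. The sketch below uses the standard textbook definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), negative likelihood ratio = (1 - sensitivity) / specificity). The counts in the example are hypothetical and are not taken from the table, and `diagnostic_metrics` is an illustrative helper name, not the authors' code.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute sensitivity, specificity and the negative likelihood ratio."""
    sensitivity = tp / (tp + fn)   # share of diseased patients correctly flagged
    specificity = tn / (tn + fp)   # share of healthy patients correctly cleared
    # LR-: how much a negative result lowers the odds of disease (smaller is better)
    negative_lr = (1.0 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "negative_lr": negative_lr,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    for key, value in diagnostic_metrics(tp=90, fn=10, tn=80, fp=20).items():
        print(f"{key}: {value:.3f}")
```

The 95% confidence intervals mentioned in the caption are not reproduced here, since the caption does not say how they were estimated.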
Dataset Overview
dataset: lastfm-64-dot
Metadata
Creation Time: 2025-01-06 11:09:48+0000
Update Time: 2025-01-07 11:48:10+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/lastfm-64-dot.
Dataset Overview
dataset: fashion-mnist-784-euclidean
Metadata
Creation Time: 2025-01-07 11:02:55+0000
Update Time: 2025-01-07 11:03:01+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/fashion-mnist-784-euclidean.
Dataset Overview
dataset: nytimes-16-angular
Metadata
Creation Time: 2025-01-07 11:46:28+0000
Update Time: 2025-01-07 11:46:31+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/nytimes-16-angular.
Dataset Overview
dataset: glove-25-angular
Metadata
Creation Time: 2025-01-07 11:05:52+0000
Update Time: 2025-01-07 11:06:02+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-25-angular.
GAS Indexing Artifacts
Dataset Description
This dataset contains pre-computed deterministic centroids and associated geometric metadata generated using our GAS (Geometry-Aware Selection) algorithm. These artifacts are designed to benchmark Approximate Nearest Neighbor (ANN) search performance in privacy-preserving or dynamic vector database environments.
Purpose
To serve as a standardized benchmark resource for evaluating the efficiency and recall of vector… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/gas-centroids.
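Recall, as mentioned in this entry, is commonly measured in ANN benchmarks as recall@k: the fraction of the true k nearest neighbours that the approximate search returns. The sketch below shows that common definition only; the GAS dataset page may define or compute recall differently, and `recall_at_k` and the sample ID lists are hypothetical.

```python
from typing import Sequence


def recall_at_k(
    approx_ids: Sequence[Sequence[int]],
    exact_ids: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Average overlap between approximate and exact top-k results per query."""
    if len(approx_ids) != len(exact_ids):
        raise ValueError("both result sets must cover the same queries")
    hits = 0
    for approx, exact in zip(approx_ids, exact_ids):
        # Count how many of the true top-k neighbours were retrieved.
        hits += len(set(approx[:k]) & set(exact[:k]))
    return hits / (k * len(exact_ids))


if __name__ == "__main__":
    # Two hypothetical queries: the first is fully correct, the second misses one neighbour.
    approx = [[1, 2, 3], [4, 5, 9]]
    exact = [[1, 2, 3], [4, 5, 6]]
    print(f"recall@3 = {recall_at_k(approx, exact, k=3):.3f}")
```

The exact neighbour lists would normally come from a brute-force search over the same vectors, which is why benchmark datasets of this kind often ship precomputed ground truth.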