13 datasets found

gist-960-euclidean
huggingface.co
Updated Jan 7, 2025
Cite
open vector database (2025). gist-960-euclidean [Dataset]. https://huggingface.co/datasets/open-vdb/gist-960-euclidean
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: gist-960-euclidean

Metadata

Creation Time: 2025-01-07 11:03:48+0000
Update Time: 2025-01-07 11:04:44+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/gist-960-euclidean.
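In the ann-benchmarks naming scheme that these open-vdb datasets follow, a name such as `gist-960-euclidean` encodes the source (GIST image descriptors), the vector dimension (960) and the distance used for ground truth (Euclidean, i.e. L2). As a minimal sketch, assuming plain Python lists rather than the dataset's actual file layout (which is not described here), exact Euclidean k-nearest-neighbour ground truth can be computed by brute force. The names `euclidean` and `exact_knn_l2` are hypothetical helpers, not part of any dataset API.

```python
import heapq
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn_l2(base: List[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Indices of the k base vectors closest to `query` under L2.

    Brute force, O(n * d) per query; ties are broken by the lower index.
    """
    return heapq.nsmallest(k, range(len(base)), key=lambda i: (euclidean(base[i], query), i))


if __name__ == "__main__":
    base = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    print(exact_knn_l2(base, [0.9, 0.1], k=2))  # [1, 0]
```

Brute force is far too slow for production search, but it is the usual way to produce the exact neighbour lists that approximate indexes are scored against.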
glove-100-angular
huggingface.co
Updated Jan 7, 2025
+ more versions
Cite
open vector database (2025). glove-100-angular [Dataset]. https://huggingface.co/datasets/open-vdb/glove-100-angular
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: glove-100-angular

Metadata

Creation Time: 2025-01-07 11:21:16+0000
Update Time: 2025-01-07 11:21:31+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-100-angular.
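The `-angular` suffix means neighbours are ranked by the angle between vectors, not by their length. A minimal sketch, assuming cosine distance `1 - cos(a, b)` as the scoring function: it orders neighbours exactly as the angle does, because the angle decreases monotonically as the cosine increases. `cosine_distance` and `exact_knn_angular` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(angle between a and b); ranges from 0 (same direction) to 2 (opposite)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn_angular(base: List[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Indices of the k base vectors with the smallest angle to `query` (ties: lower index)."""
    order = sorted(range(len(base)), key=lambda i: (cosine_distance(base[i], query), i))
    return order[:k]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0], [10.0, 1.0], [-1.0, 0.0]]
    print(exact_knn_angular(base, [1.0, 0.0], k=3))  # [0, 2, 1]
```

The long vector `[10.0, 1.0]` ranks second despite its magnitude, since only direction matters under this metric.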
pubmed-arxiv-abstract-embedding-gemma-300m
huggingface.co
Updated Nov 29, 2025
Cite
CryptoLab Inc. (2025). pubmed-arxiv-abstract-embedding-gemma-300m [Dataset]. https://huggingface.co/datasets/cryptolab-playground/pubmed-arxiv-abstract-embedding-gemma-300m
Dataset updated
Nov 29, 2025
Dataset provided by
CRYPTOLAB INC.
Authors
CryptoLab Inc.
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
PubMed & arXiv Abstract Embeddings for Vector Database Benchmarking

Dataset Description

This dataset contains pre-computed embeddings of scientific paper abstracts from PubMed and arXiv, designed for evaluating vector database performance. The embeddings are generated using Google's EmbeddingGemma-300M model.

Purpose

Benchmark dataset for evaluating vector database performance, specifically designed for use with VectorDBBench.

Dataset Summary

Total… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/pubmed-arxiv-abstract-embedding-gemma-300m.
Vector cartographic database | gimi9.com
gimi9.com
Cite
Vector cartographic database | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_5ebddd1b7894ec1982180840/
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Vector database of the Cadastre-Topography section of the Directorate of Land Affairs (DAF) of French Polynesia. The vector database covers the following topics:
- Benchmarks (REF): geodetic, levelling and stereo-preparation benchmarks; geodetic and projection systems.
- Location (LOC): islands, administrative boundaries.
- Construction (EDI): buildings, surface constructions, linear constructions.
- Hydrography (HYD): rivers, hydrographic surfaces, lagoons.
- Land use (SOL): natural or cultivated vegetated areas.
- Orography/Relief (REL): contour lines, slopes, spot heights.
- Road network (VOI): roads and edges, road furniture.
- Toponymy (NOM): points of interest, oronyms, hydronyms, named localities.
nytimes-256-angular
huggingface.co
Updated Jan 7, 2025
Cite
open vector database (2025). nytimes-256-angular [Dataset]. https://huggingface.co/datasets/open-vdb/nytimes-256-angular
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: nytimes-256-angular

Metadata

Creation Time: 2025-01-07 11:44:42+0000
Update Time: 2025-01-07 11:44:49+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/nytimes-256-angular.
Bloomberg-Financial-News-embedding-gemma-300m
huggingface.co
Updated Nov 29, 2025
Cite
CryptoLab Inc. (2025). Bloomberg-Financial-News-embedding-gemma-300m [Dataset]. https://huggingface.co/datasets/cryptolab-playground/Bloomberg-Financial-News-embedding-gemma-300m
Dataset updated
Nov 29, 2025
Dataset provided by
CRYPTOLAB INC.
Authors
CryptoLab Inc.
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Bloomberg Financial News Embeddings for Vector Database Benchmarking

Dataset Description

This dataset contains pre-computed embeddings of Bloomberg financial news articles, designed for evaluating vector database performance. The embeddings are generated using Google's EmbeddingGemma-300M model.

Purpose

Benchmark dataset for evaluating vector database performance in the financial news domain, specifically designed for use with VectorDBBench.

Dataset Summary… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/Bloomberg-Financial-News-embedding-gemma-300m.
random-xs-20-angular
huggingface.co
Updated Jan 7, 2025
Cite
open vector database (2025). random-xs-20-angular [Dataset]. https://huggingface.co/datasets/open-vdb/random-xs-20-angular
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: random-xs-20-angular

Metadata

Creation Time: 2025-01-07 11:37:47+0000
Update Time: 2025-01-07 11:37:48+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/random-xs-20-angular.
Clinical diagnostic efficacy comparison of the proposed hybrid model and...
figshare.com
xls
Updated Oct 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Medhat A. Tawfeek; Ibrahim Alrashdi; Madallah Alruwaili; Hisham Allahem (2025). Clinical diagnostic efficacy comparison of the proposed hybrid model and benchmark machine learning models. Metrics are reported as point estimates with 95% CIs. This table offers a discussion of the measures of high interest in clinical implementation. Sensitivity (true positive rate) reflects the ability to correctly identify patients with CVD, while Specificity (true negative rate) indicates the ability to correctly rule out patients without CVD. The Negative Likelihood Ratio quantifies how much the odds of having the disease decrease with a negative test result, where smaller values indicate stronger diagnostic power. The suggested model has the best balance of the greatest sensitivity and specificity, as well as the least Negative Likelihood Ratio, which highlights its strength and better performance as a diagnostic tool in the clinical practice. [Dataset]. http://doi.org/10.1371/journal.pone.0335421.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0335421.t005
Dataset updated
Oct 29, 2025
Dataset provided by
PLOS ONE
Authors
Medhat A. Tawfeek; Ibrahim Alrashdi; Madallah Alruwaili; Hisham Allahem
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Clinical diagnostic efficacy comparison of the proposed hybrid model and benchmark machine learning models. Metrics are reported as point estimates with 95% CIs. The table focuses on the measures of greatest interest for clinical implementation. Sensitivity (true positive rate) reflects the ability to correctly identify patients with CVD, while specificity (true negative rate) reflects the ability to correctly rule out patients without CVD. The Negative Likelihood Ratio quantifies how much the odds of having the disease decrease after a negative test result; smaller values indicate stronger diagnostic power. The proposed model achieves the best balance, with the highest sensitivity and specificity and the lowest Negative Likelihood Ratio, which highlights its strength as a diagnostic tool in clinical practice.
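All three measures follow from a 2x2 confusion matrix: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and NLR = (1 - sensitivity) / specificity. The sketch below computes them. The counts in the example are hypothetical and are not taken from the table, and the sketch does not reproduce the 95% CIs, whose method is not stated here; `ConfusionCounts` and the helper functions are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # patients with the disease whom the model flags as positive
    fn: int  # patients with the disease whom the model misses
    tn: int  # patients without the disease correctly ruled out
    fp: int  # patients without the disease wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def negative_likelihood_ratio(c: ConfusionCounts) -> float:
    """NLR = (1 - sensitivity) / specificity; smaller values rule disease out more strongly."""
    spec = specificity(c)
    if spec == 0:
        raise ZeroDivisionError("NLR is undefined when specificity is 0")
    return (1 - sensitivity(c)) / spec


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)  # hypothetical counts
    print(sensitivity(counts), specificity(counts), negative_likelihood_ratio(counts))
```

With these hypothetical counts, sensitivity is 0.9, specificity is 0.8 and NLR ≈ 0.125. Multiplying the pre-test odds by the NLR gives the post-test odds after a negative result, so here a negative result cuts the odds of disease roughly eightfold.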
lastfm-64-dot
huggingface.co
Updated Jan 7, 2025
Cite
open vector database (2025). lastfm-64-dot [Dataset]. https://huggingface.co/datasets/open-vdb/lastfm-64-dot
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: lastfm-64-dot

Metadata

Creation Time: 2025-01-06 11:09:48+0000
Update Time: 2025-01-07 11:48:10+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/lastfm-64-dot.
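The `-dot` suffix indicates that neighbours are ranked by inner product (maximum inner product search, MIPS): larger scores are better and, unlike the angular case, vector length matters. A minimal sketch, assuming plain Python lists; `dot` and `exact_knn_dot` are hypothetical helper names.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def exact_knn_dot(base: List[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Indices of the k base vectors with the largest inner product with `query`.

    Sorting on the negated score turns "largest first" into ascending order;
    ties are broken by the lower index.
    """
    return sorted(range(len(base)), key=lambda i: (-dot(base[i], query), i))[:k]


if __name__ == "__main__":
    base = [[1.0, 0.0], [3.0, 3.0], [0.0, 1.0]]
    print(exact_knn_dot(base, [1.0, 0.0], k=2))  # [1, 0]
```

Here `[3.0, 3.0]` outranks `[1.0, 0.0]` for the query `[1.0, 0.0]` even though it points 45° away, because its larger magnitude raises the inner product.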
fashion-mnist-784-euclidean
huggingface.co
Updated Jan 7, 2025
Cite
open vector database (2025). fashion-mnist-784-euclidean [Dataset]. https://huggingface.co/datasets/open-vdb/fashion-mnist-784-euclidean
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: fashion-mnist-784-euclidean

Metadata

Creation Time: 2025-01-07 11:02:55+0000
Update Time: 2025-01-07 11:03:01+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/fashion-mnist-784-euclidean.
nytimes-16-angular
huggingface.co
Updated Jan 7, 2025
Cite
open vector database (2025). nytimes-16-angular [Dataset]. https://huggingface.co/datasets/open-vdb/nytimes-16-angular
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: nytimes-16-angular

Metadata

Creation Time: 2025-01-07 11:46:28+0000
Update Time: 2025-01-07 11:46:31+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/nytimes-16-angular.
glove-25-angular
huggingface.co
Updated Jan 7, 2025
+ more versions
Cite
open vector database (2025). glove-25-angular [Dataset]. https://huggingface.co/datasets/open-vdb/glove-25-angular
Available in Croissant format (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description
Dataset Overview

dataset: glove-25-angular

Metadata

Creation Time: 2025-01-07 11:05:52+0000
Update Time: 2025-01-07 11:06:02+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-25-angular.
gas-centroids
huggingface.co
Updated Nov 21, 2025
Cite
CryptoLab Inc. (2025). gas-centroids [Dataset]. https://huggingface.co/datasets/cryptolab-playground/gas-centroids
Dataset updated
Nov 21, 2025
Dataset provided by
CRYPTOLAB INC.
Authors
CryptoLab Inc.
Description
GAS Indexing Artifacts

Dataset Description

This dataset contains pre-computed deterministic centroids and associated geometric metadata generated using our GAS (Geometry-Aware Selection) algorithm. These artifacts are designed to benchmark Approximate Nearest Neighbor (ANN) search performance in privacy-preserving or dynamic vector database environments.

Purpose

To serve as a standardized benchmark resource for evaluating the efficiency and recall of vector… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/gas-centroids.
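The GAS algorithm itself is not specified in this description, so the sketch below does not implement it. It only shows recall@k, the usual way ANN benchmarks score an index: the fraction of the true k nearest neighbours that appear among the index's top-k results, averaged over queries. `recall_at_k` and `mean_recall_at_k` are hypothetical helper names.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbour ids found in the first k retrieved ids."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if len(truth) < k:
        raise ValueError("ground truth must contain at least k distinct ids")
    # A set ignores duplicate ids, so an index cannot score twice for one neighbour.
    found = set(retrieved[:k]) & truth
    return len(found) / k


def mean_recall_at_k(all_retrieved, all_ground_truth, k: int) -> float:
    """Average recall@k over a batch of queries (one ground-truth list per query)."""
    if len(all_retrieved) != len(all_ground_truth):
        raise ValueError("need exactly one ground-truth list per query")
    if not all_retrieved:
        raise ValueError("no queries given")
    scores = [recall_at_k(r, g, k) for r, g in zip(all_retrieved, all_ground_truth)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(recall_at_k([3, 1, 7, 9], [1, 3, 4, 5], k=4))  # 0.5: 2 of 4 true neighbours found
```

Benchmarks typically plot this value against queries per second, since an index can usually trade recall for speed through its search parameters.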
gist-960-euclidean (open-vdb/gist-960-euclidean): 3 scholarly articles cite this dataset, according to Google Scholar.