13 datasets found
  1. gist-960-euclidean

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). gist-960-euclidean [Dataset]. https://huggingface.co/datasets/open-vdb/gist-960-euclidean
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: gist-960-euclidean

      Metadata
    

    Creation Time: 2025-01-07 11:03:48+0000
    Update Time: 2025-01-07 11:04:44+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/gist-960-euclidean.

  2. glove-100-angular

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). glove-100-angular [Dataset]. https://huggingface.co/datasets/open-vdb/glove-100-angular
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: glove-100-angular

      Metadata
    

    Creation Time: 2025-01-07 11:21:16+0000
    Update Time: 2025-01-07 11:21:31+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-100-angular.

  3. pubmed-arxiv-abstract-embedding-gemma-300m

    • huggingface.co
    Updated Nov 29, 2025
    Cite
    CryptoLab Inc. (2025). pubmed-arxiv-abstract-embedding-gemma-300m [Dataset]. https://huggingface.co/datasets/cryptolab-playground/pubmed-arxiv-abstract-embedding-gemma-300m
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    CRYPTOLAB INC.
    Authors
    CryptoLab Inc.
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    PubMed & arXiv Abstract Embeddings for Vector Database Benchmarking

      Dataset Description
    

    This dataset contains pre-computed embeddings of scientific paper abstracts from PubMed and arXiv, designed for evaluating vector database performance. The embeddings are generated using Google's EmbeddingGemma-300M model.

      Purpose
    

    Benchmark dataset for evaluating vector database performance, specifically designed for use with VectorDBBench.

      Dataset Summary
    

    Total… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/pubmed-arxiv-abstract-embedding-gemma-300m.

  4. Vector cartographic database | gimi9.com

    • gimi9.com
    Cite
    Vector cartographic database | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_5ebddd1b7894ec1982180840/
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Vector database of the Cadastre and Topography section of the Directorate of Land Affairs (DAF) of French Polynesia. The vector database covers the following topics:
    - Benchmarks (REF): geodetic, levelling, stereo-preparation, geodetic and projection systems.
    - Location (LOC): islands, administrative boundaries.
    - Edification (EDI): buildings, surface constructions, linear constructions.
    - Hydrography (HYD): rivers, hydrographic surfaces, lagoons.
    - Land use (SOL): natural or exploited vegetated areas.
    - Orography-Relief (REL): contour lines, slopes, spot heights.
    - Road network (VOI): roads and edges, road furniture.
    - Toponymy (NOM): points of interest, oronyms, hydronyms, places mentioned.

  5. nytimes-256-angular

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). nytimes-256-angular [Dataset]. https://huggingface.co/datasets/open-vdb/nytimes-256-angular
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: nytimes-256-angular

      Metadata
    

    Creation Time: 2025-01-07 11:44:42+0000
    Update Time: 2025-01-07 11:44:49+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/nytimes-256-angular.

  6. Bloomberg-Financial-News-embedding-gemma-300m

    • huggingface.co
    Updated Nov 29, 2025
    Cite
    CryptoLab Inc. (2025). Bloomberg-Financial-News-embedding-gemma-300m [Dataset]. https://huggingface.co/datasets/cryptolab-playground/Bloomberg-Financial-News-embedding-gemma-300m
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    CRYPTOLAB INC.
    Authors
    CryptoLab Inc.
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Bloomberg Financial News Embeddings for Vector Database Benchmarking

      Dataset Description
    

    This dataset contains pre-computed embeddings of Bloomberg financial news articles, designed for evaluating vector database performance. The embeddings are generated using Google's EmbeddingGemma-300M model.

      Purpose
    

    Benchmark dataset for evaluating vector database performance on financial news domain, specifically designed for use with VectorDBBench.

      Dataset Summary… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/Bloomberg-Financial-News-embedding-gemma-300m.
    
  7. random-xs-20-angular

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). random-xs-20-angular [Dataset]. https://huggingface.co/datasets/open-vdb/random-xs-20-angular
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: random-xs-20-angular

      Metadata
    

    Creation Time: 2025-01-07 11:37:47+0000
    Update Time: 2025-01-07 11:37:48+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/random-xs-20-angular.

  8. Clinical diagnostic efficacy comparison of the proposed hybrid model and...

    • figshare.com
    xls
    Updated Oct 29, 2025
    Cite
    Medhat A. Tawfeek; Ibrahim Alrashdi; Madallah Alruwaili; Hisham Allahem (2025). Clinical diagnostic efficacy comparison of the proposed hybrid model and benchmark machine learning models. Metrics are reported as point estimates with 95% CIs. This table offers a discussion of the measures of high interest in clinical implementation. Sensitivity (true positive rate) reflects the ability to correctly identify patients with CVD, while Specificity (true negative rate) indicates the ability to correctly rule out patients without CVD. The Negative Likelihood Ratio quantifies how much the odds of having the disease decrease with a negative test result, where smaller values indicate stronger diagnostic power. The suggested model has the best balance of the greatest sensitivity and specificity, as well as the least Negative Likelihood Ratio, which highlights its strength and better performance as a diagnostic tool in the clinical practice. [Dataset]. http://doi.org/10.1371/journal.pone.0335421.t005
    Available download formats: xls
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Medhat A. Tawfeek; Ibrahim Alrashdi; Madallah Alruwaili; Hisham Allahem
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Clinical diagnostic efficacy comparison of the proposed hybrid model and benchmark machine learning models. Metrics are reported as point estimates with 95% CIs. The table covers the measures of greatest interest for clinical implementation. Sensitivity (true positive rate) reflects the ability to correctly identify patients with CVD, while specificity (true negative rate) indicates the ability to correctly rule out patients without CVD. The negative likelihood ratio quantifies how much the odds of having the disease decrease with a negative test result; smaller values indicate stronger diagnostic power. The proposed model offers the best balance, with the highest sensitivity and specificity and the lowest negative likelihood ratio, which highlights its strength and performance as a diagnostic tool in clinical practice.

  9. lastfm-64-dot

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). lastfm-64-dot [Dataset]. https://huggingface.co/datasets/open-vdb/lastfm-64-dot
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: lastfm-64-dot

      Metadata
    

    Creation Time: 2025-01-06 11:09:48+0000
    Update Time: 2025-01-07 11:48:10+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/lastfm-64-dot.

  10. fashion-mnist-784-euclidean

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). fashion-mnist-784-euclidean [Dataset]. https://huggingface.co/datasets/open-vdb/fashion-mnist-784-euclidean
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: fashion-mnist-784-euclidean

      Metadata
    

    Creation Time: 2025-01-07 11:02:55+0000
    Update Time: 2025-01-07 11:03:01+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/fashion-mnist-784-euclidean.

  11. nytimes-16-angular

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). nytimes-16-angular [Dataset]. https://huggingface.co/datasets/open-vdb/nytimes-16-angular
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: nytimes-16-angular

      Metadata
    

    Creation Time: 2025-01-07 11:46:28+0000
    Update Time: 2025-01-07 11:46:31+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/nytimes-16-angular.

  12. glove-25-angular

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    open vector database (2025). glove-25-angular [Dataset]. https://huggingface.co/datasets/open-vdb/glove-25-angular
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    open vector database
    Description

    Dataset Overview

    dataset: glove-25-angular

      Metadata
    

    Creation Time: 2025-01-07 11:05:52+0000
    Update Time: 2025-01-07 11:06:02+0000
    Source: https://github.com/erikbern/ann-benchmarks
    Task: N/A
    Train Samples: N/A
    Test Samples: N/A
    License: DISCLAIMER AND LICENSE NOTICE:

    This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of the… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-25-angular.

  13. gas-centroids

    • huggingface.co
    Updated Nov 21, 2025
    Cite
    CryptoLab Inc. (2025). gas-centroids [Dataset]. https://huggingface.co/datasets/cryptolab-playground/gas-centroids
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    CRYPTOLAB INC.
    Authors
    CryptoLab Inc.
    Description

    GAS Indexing Artifacts

      Dataset Description
    

    This dataset contains pre-computed deterministic centroids and associated geometric metadata generated using our GAS (Geometry-Aware Selection) algorithm. These artifacts are designed to benchmark Approximate Nearest Neighbor (ANN) search performance in privacy-preserving or dynamic vector database environments.

      Purpose
    

    To serve as a standardized benchmark resource for evaluating the efficiency and recall of vector… See the full description on the dataset page: https://huggingface.co/datasets/cryptolab-playground/gas-centroids.
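The clinical-metrics result in the list above (result 8) describes sensitivity, specificity, and the negative likelihood ratio in words. As a rough illustration only, the sketch below computes the three quantities from a confusion matrix using their standard textbook definitions. The function name and the example counts are hypothetical and are not taken from the PLOS ONE table.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, negative likelihood ratio).

    Standard definitions, not code from the cited paper:
      sensitivity = TP / (TP + FN)   # true positive rate
      specificity = TN / (TN + FP)   # true negative rate
      LR-         = (1 - sensitivity) / specificity
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity == 0:
        raise ValueError("negative likelihood ratio is undefined when specificity is 0")
    negative_lr = (1 - sensitivity) / specificity
    return sensitivity, specificity, negative_lr


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, nlr = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} LR-={nlr:.3f}")
```

A smaller negative likelihood ratio means a negative result lowers the odds of disease more strongly, which matches the description's statement that smaller values indicate stronger diagnostic power.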


open vector database (2025). gist-960-euclidean [Dataset]. https://huggingface.co/datasets/open-vdb/gist-960-euclidean

gist-960-euclidean

open-vdb/gist-960-euclidean

3 scholarly articles cite this dataset.
Dataset updated
Jan 7, 2025
Dataset authored and provided by
open vector database
Description

Dataset Overview

dataset: gist-960-euclidean

  Metadata

Creation Time: 2025-01-07 11:03:48+0000
Update Time: 2025-01-07 11:04:44+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:

This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses of… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/gist-960-euclidean.
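The open-vdb entries in this listing have names such as gist-960-euclidean and lastfm-64-dot. The sketch below assumes these follow the `<corpus>-<dimension>-<metric>` pattern used by the ann-benchmarks project they cite as their source. The listing itself does not state this convention, and the helper `parse_dataset_name` is a hypothetical name introduced here for illustration.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    corpus: str      # e.g. "gist" or "fashion-mnist"
    dimension: int   # assumed to be the vector dimensionality
    metric: str      # assumed to be the distance metric, e.g. "euclidean"


def parse_dataset_name(name: str) -> DatasetName:
    """Split an ann-benchmarks-style name from the right, so that
    hyphens inside the corpus part (e.g. "random-xs") are preserved."""
    corpus, dimension, metric = name.rsplit("-", 2)
    return DatasetName(corpus, int(dimension), metric)


# Names copied from the listing above.
for name in [
    "gist-960-euclidean",
    "glove-100-angular",
    "random-xs-20-angular",
    "lastfm-64-dot",
    "fashion-mnist-784-euclidean",
]:
    print(parse_dataset_name(name))
```

Names that do not fit the assumed pattern, such as gas-centroids, make `rsplit` return fewer than three parts or fail the integer conversion, so the function raises a `ValueError` for them instead of guessing.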
