100+ datasets found
  1. finesse-benchmark-database

    • huggingface.co
    Updated Oct 25, 2025
    Cite
    winter.sci.dev (2025). finesse-benchmark-database [Dataset]. https://huggingface.co/datasets/enzoescipy/finesse-benchmark-database
    Dataset updated
    Oct 25, 2025
    Authors
    winter.sci.dev
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Finesse Benchmark Database

      Overview
    

    finesse-benchmark-database is a data generation factory for atomic probes in the Finesse benchmark. It generates probes_atomic.jsonl files from Wikimedia Wikipedia datasets, leveraging Hugging Face's datasets library, tokenizers from transformers, and optional PyTorch support. This tool is designed to create high-quality, language-specific probe datasets for benchmarking fine-grained understanding in NLP tasks.… See the full description on the dataset page: https://huggingface.co/datasets/enzoescipy/finesse-benchmark-database.

  2. Elevation Benchmarks

    • catalog.data.gov
    • data.cityofchicago.org
    • +3 more
    Updated Dec 2, 2023
    Cite
    data.cityofchicago.org (2023). Elevation Benchmarks [Dataset]. https://catalog.data.gov/dataset/elevation-benchmarks
    Dataset updated
    Dec 2, 2023
    Dataset provided by
    data.cityofchicago.org
    Description

    The following dataset includes "Active Benchmarks," which are provided to facilitate the identification of City-managed standard benchmarks. Standard benchmarks are for public and private use in establishing a point in space. Note: The benchmarks are referenced to Chicago City Datum (CCD) = 0.00, where CCD is 579.88 feet above mean tide at New York. The City of Chicago Department of Water Management’s (DWM) Topographic Benchmark is the source of the benchmark information contained in this online database. The information in the index card system was compiled by scanning the original cards and then transcribing some of this information to prepare a table and map. Over time, the DWM will contract services to field-verify the data and update both the index card system and this online database. This dataset was last updated in September 2011. Coordinates are estimated. To view the map, go to https://data.cityofchicago.org/Buildings/Elevation-Benchmarks-Map/kmt9-pg57; for a PDF map, go to http://cityofchicago.org/content/dam/city/depts/water/supp_info/Benchmarks/BMMap.pdf. Please read the Terms of Use: http://www.cityofchicago.org/city/en/narr/foia/data_disclaimer.html.
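    Because the note above defines CCD 0.00 as 579.88 feet above mean tide at New York, moving between the two references is a fixed offset. The sketch below assumes that the benchmark elevations are stated in feet relative to CCD; the helper names are hypothetical, and the result is not a conversion to any modern vertical datum.

```python
# Offset stated in the dataset notes: CCD 0.00 is 579.88 ft above mean tide, New York.
CCD_OFFSET_FT = 579.88


def ccd_to_mean_tide_ny(elev_ccd_ft: float) -> float:
    """Convert feet above CCD to feet above mean tide (New York).

    Hypothetical helper; assumes the input is already referenced to CCD = 0.00.
    Negative inputs (points below the datum) are handled the same way.
    """
    return elev_ccd_ft + CCD_OFFSET_FT


def mean_tide_ny_to_ccd(elev_tide_ft: float) -> float:
    """Inverse conversion: feet above mean tide (New York) back to feet above CCD."""
    return elev_tide_ft - CCD_OFFSET_FT


if __name__ == "__main__":
    # A point 10 ft above CCD is 589.88 ft above mean tide at New York.
    print(round(ccd_to_mean_tide_ny(10.0), 2))
```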

  3. data-product-benchmark

    • huggingface.co
    Updated Oct 4, 2025
    Cite
    IBM Research (2025). data-product-benchmark [Dataset]. https://huggingface.co/datasets/ibm-research/data-product-benchmark
    Dataset updated
    Oct 4, 2025
    Dataset provided by
    IBM: http://ibm.com/
    IBM Research
    Authors
    IBM Research
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    This dataset provides a benchmark for automatic data product creation. The task is framed as follows: given a natural language data product request and a corpus of text and tables, the objective is to identify the tables and text documents that are relevant to the request and should therefore be included in the resulting data product. The benchmark brings together three variants: HybridQA, TAT-QA, and ConvFinQA, each consisting of:

    A corpus… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/data-product-benchmark.
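    Since the task is framed as selecting a subset of tables and text documents from a corpus, one natural way to score a submission is set-based precision and recall against the gold selection. This is a minimal sketch under that assumption only: the benchmark's official metrics and document identifiers are not given here, so the function name and IDs below are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall of predicted table/document IDs.

    Assumption: each corpus item (table or text document) has a unique string ID.
    Empty predicted or relevant sets yield 0.0 for the affected score.
    """
    pred, gold = set(predicted), set(relevant)
    hits = len(pred & gold)  # items that were both predicted and relevant
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical IDs for one data product request.
    p, r = precision_recall(["table_1", "table_2", "doc_7"], ["table_1", "doc_7", "doc_9", "table_4"])
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```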

  4. Benchmark

    • gis-cupertino.opendata.arcgis.com
    • hub.arcgis.com
    Updated Jan 21, 2016
    Cite
    City of Cupertino (2016). Benchmark [Dataset]. https://gis-cupertino.opendata.arcgis.com/datasets/benchmark/data
    Dataset updated
    Jan 21, 2016
    Dataset authored and provided by
    City of Cupertino
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Benchmark is a Point FeatureClass representing land-surveyed benchmarks in Cupertino. Benchmarks are stable sites used to provide elevation data. It is primarily used as a reference layer. The layer is updated as needed by the GIS department. Benchmark has the following fields:

    OBJECTID: Unique identifier automatically generated by Esri. Type: OID, length: 4, domain: none

    ID: Unique identifier assigned to the benchmark. Type: Integer, length: 4, domain: none

    REF_MARK: The reference mark associated with the benchmark. Type: String, length: 10, domain: none

    ELEV: The elevation of the benchmark. Type: Double, length: 8, domain: none

    Shape: Field that stores the geographic coordinates associated with the feature. Type: Geometry, length: 4, domain: none

    Description: A more detailed description of the benchmark. Type: String, length: 200, domain: none

    Owner: The owner of the benchmark. Type: String, length: 10, domain: none

    GlobalID: Unique identifier automatically generated for features in the enterprise database. Type: GlobalID, length: 38, domain: none

    Operator: The user responsible for updating this database. Type: String, length: 255, domain: OPERATOR

    last_edited_date: The date the database row was last updated. Type: Date, length: 8, domain: none

    created_date: The date the database row was initially created. Type: Date, length: 8, domain: none

    VerticalDatum: The vertical datum associated with the benchmark. Type: String, length: 100, domain: none
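    The field list above maps naturally onto a typed record. The sketch below is a hypothetical Python representation, not an Esri or City of Cupertino API: it assumes the point geometry can be reduced to an (x, y) tuple, and it only enforces the maximum string lengths listed above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

# Maximum string lengths taken from the Cupertino field list.
MAX_STRING_LENGTHS = {
    "ref_mark": 10,
    "description": 200,
    "owner": 10,
    "operator": 255,
    "vertical_datum": 100,
}


@dataclass
class BenchmarkRecord:
    """Hypothetical in-memory record mirroring the Benchmark layer's fields."""

    objectid: int                 # OBJECTID (OID, assigned by Esri)
    id: int                       # ID (Integer)
    ref_mark: str                 # REF_MARK (String, 10)
    elev: float                   # ELEV (Double)
    shape: Tuple[float, float]    # Shape; assumption: a point stored as (x, y)
    description: str = ""         # Description (String, 200)
    owner: str = ""               # Owner (String, 10)
    global_id: str = ""           # GlobalID (length 38)
    operator: str = ""            # Operator (String, 255)
    last_edited_date: Optional[datetime] = None
    created_date: Optional[datetime] = None
    vertical_datum: str = ""      # VerticalDatum (String, 100)

    def validate(self) -> List[str]:
        """Return length violations in field-list order; empty means the record fits."""
        errors = []
        for name, limit in MAX_STRING_LENGTHS.items():
            if len(getattr(self, name)) > limit:
                errors.append(f"{name} exceeds {limit} characters")
        return errors


if __name__ == "__main__":
    ok = BenchmarkRecord(objectid=1, id=101, ref_mark="BM-101", elev=250.5, shape=(0.0, 0.0))
    bad = BenchmarkRecord(objectid=2, id=102, ref_mark="X" * 11, elev=0.0, shape=(0.0, 0.0))
    print(ok.validate())   # []
    print(bad.validate())  # ['ref_mark exceeds 10 characters']
```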

  5. Data from: LakeBench: Benchmarks for Data Discovery over Data Lakes

    • zenodo.org
    bz2
    Updated Oct 25, 2023
    Cite
    Anonymous; Anonymous (2023). LakeBench: Benchmarks for Data Discovery over Data Lakes [Dataset]. http://doi.org/10.5281/zenodo.10042019
    Available download formats: bz2
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LakeBench: Benchmarks for Data Discovery over Data Lakes

    Version 3 adds the wiki-join-search benchmark used in the "join search" experiments in our paper.

    The data in the labels files (i.e., labels.json files or files under the labels folder) are shared under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license: https://creativecommons.org/licenses/by-sa/4.0/

    The data in the tables folder comes from different sources under various open licenses, as detailed in the README.txt file in each folder. All the datasets included in the benchmark have been verified to have a public license that allows distribution, derivatives, and commercial use.

    THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  6. Survey Benchmark

    • data.sanjoseca.gov
    • gisdata-csj.opendata.arcgis.com
    Updated Apr 28, 2025
    Cite
    Enterprise GIS (2025). Survey Benchmark [Dataset]. https://data.sanjoseca.gov/dataset/survey-benchmark
    Available download formats: html, zip, geojson, kml, csv, arcgis geoservices rest api
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    City of San José
    Authors
    Enterprise GIS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are locations that are to be used as an elevation reference and contain the official elevation and last known latitude and longitude.

    App: The data can be viewed in web map format at: Survey Benchmarks

    Data is published weekly, on Mondays.


  7. Benchmark

    • catalog.data.gov
    • data.brla.gov
    • +1 more
    Updated Feb 2, 2024
    Cite
    data.brla.gov (2024). Benchmark [Dataset]. https://catalog.data.gov/dataset/benchmark-3b4b6
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    data.brla.gov
    Description

    Point geometry with attributes displaying geodetic control stations (benchmarks) in East Baton Rouge Parish, Louisiana.

  8. Benchmark Results: DBpedia 10%

    • figshare.com
    zip
    Updated Apr 28, 2017
    Cite
    Felix Conrads; Jens Lehmann; Muhammad Saleem; Mohamed Morsey; Axel-Cyrille Ngonga Ngomo (2017). Benchmark Results: DBpedia 10% [Dataset]. http://doi.org/10.6084/m9.figshare.3205438.v1
    Available download formats: zip
    Dataset updated
    Apr 28, 2017
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Felix Conrads; Jens Lehmann; Muhammad Saleem; Mohamed Morsey; Axel-Cyrille Ngonga Ngomo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of the IGUANA Benchmark in 2015/16 for the truncated DBpedia dataset, which contains 10% of the full (100%) DBpedia dataset.

  9. Blender's Device Benchmarking

    • kaggle.com
    zip
    Updated Apr 21, 2022
    Cite
    Yafeth T.B (2022). Blender's Device Benchmarking [Dataset]. https://www.kaggle.com/datasets/yafethtb/blenders-device-benchmarking
    Available download formats: zip (30925 bytes)
    Dataset updated
    Apr 21, 2022
    Authors
    Yafeth T.B
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    About

    "Blender is the free and open source 3D creation suite. It supports the entirety of the 3D pipeline—modeling, rigging, animation, simulation, rendering, compositing and motion tracking, even video editing and game creation." -- from Blender's about page.

    Blender is community-driven. Its popularity as one of the best free programs in the open source community has drawn more and more people to use it. However, like other 3D creation suites, Blender is quite demanding in its hardware requirements. Fortunately, Blender runs on almost every kind of common device sold on the market, which gives users the freedom to use whichever devices they have. Unfortunately, not all devices are equal: some perform well, while others underperform.

    Blender gives users a way to check how their devices might perform with Blender through benchmarking data. Blender Open Data provides benchmarking data that users can find by querying.

    This dataset was created based on queries in Blender Open Data. It provides concise benchmarking data by aggregating the scores of all benchmarks for each device.
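    The description does not say which statistic is used to combine the per-device scores. Assuming a simple mean over all benchmark runs for each device, the aggregation step could look like the sketch below; the function name, device names, and scores are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(runs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average benchmark scores per device.

    Assumption: each run is a (device_name, score) pair and the mean is the
    aggregate; the dataset itself may use a different statistic.
    """
    by_device: Dict[str, List[float]] = defaultdict(list)
    for device, score in runs:
        by_device[device].append(score)
    return {device: mean(scores) for device, scores in by_device.items()}


if __name__ == "__main__":
    runs = [("GPU A", 100.0), ("GPU A", 200.0), ("CPU B", 50.0)]
    print(aggregate_scores(runs))  # {'GPU A': 150.0, 'CPU B': 50.0}
```

    Grouping first and averaging afterwards keeps every raw score available, so swapping `mean` for `statistics.median` would be a one-line change if a more outlier-resistant summary were preferred.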

    The Goals

    I didn't have many goals when I made this dataset, but I wanted to create an easy-to-use database that helps Blender users find out whether their device is suitable for Blender, especially version 3.1.0, or find other devices that are suitable.

    Acknowledgement

    All data was taken from Blender Open Data. Cover image from Unsplash.com by @bertsz

  10. Data from: Building Performance Database

    • data.openei.org
    • gimi9.com
    • +1 more
    website
    Updated Nov 25, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Josh Kace; Travis Walter; Earth Advantage (2014). Building Performance Database [Dataset]. https://data.openei.org/submissions/145
    Available download formats: website
    Dataset updated
    Nov 25, 2014
    Dataset provided by
    United States Department of Energy: http://energy.gov/
    Office of Energy Efficiency and Renewable Energy: http://energy.gov/eere
    Open Energy Data Initiative (OEDI)
    Authors
    Josh Kace; Travis Walter; Earth Advantage
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Building Performance Database (BPD) is the largest publicly available source of measured energy performance data for buildings in the United States. It contains information about each building's energy use, location, and physical and operational characteristics. Building owners, operators, architects, and engineers can use the BPD to compare a building's energy performance against customized peer groups, identify energy performance opportunities, and set energy performance goals. Energy performance program implementers can also use it to analyze energy performance features and trends in the building stock. The BPD compiles data from various sources, converts it into a standard format, cleanses and quality-checks the data, and gives users access to the data in a way that keeps data providers anonymous.

    The BPD consists of the database itself, a graphical user interface allowing exploration of the data, and an application programming interface allowing the development of third-party applications using the data.

  11. Benchmarking dataset for markerless motion capture analysis

    • entrepot.recherche.data.gouv.fr
    pdf, text/markdown +1
    Updated Oct 15, 2025
    Cite
    Antoine MULLER; Alexandre NAAIM; Raphael DUMAS; Thomas ROBERT (2025). Benchmarking dataset for markerless motion capture analysis [Dataset]. http://doi.org/10.57745/LQI2MJ
    Available download formats: zip (8947731256), zip (4209506126), zip (13927367829), text/markdown (9311), pdf (52635), zip (10181387)
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Antoine MULLER; Alexandre NAAIM; Raphael DUMAS; Thomas ROBERT
    License

    Custom license: https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.57745/LQI2MJ

    Description

    This dataset contains data from a motion analysis experiment using a markerless and a marker-based system. Its purpose is to benchmark whole-body markerless motion analysis methods. A GitHub repository (https://github.com/lbmc-lyon/Benchmarking_markerless) is associated with this dataset, where one may find and upload kinematics obtained from video data. The data consist of recordings, made with both systems, of five different tasks performed by two participants. The tasks cover various activities and challenges for both systems (gait, sit-to-stand-to-sit transfers, manual box handling, challenging motions with high feet, and couple dancing). The dataset includes: "marker" data (3D trajectories of 48 skin reflective markers placed on specific anatomical landmarks) provided as .c3d files; RGB video from 9 calibrated and synchronized cameras provided as .avi files; and utility files such as camera calibration files.

  12. Performance comparison on the benchmark noisy database.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated Oct 29, 2019
    Cite
    Carrault, Guy; Doyen, Matthieu; Hernández, Alfredo I.; Beuchée, Alain; Ge, Di (2019). Performance comparison on the benchmark noisy database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000159454
    Dataset updated
    Oct 29, 2019
    Authors
    Carrault, Guy; Doyen, Matthieu; Hernández, Alfredo I.; Beuchée, Alain; Ge, Di
    Description

    Performance comparison on the benchmark noisy database.

  13. benchmark-dummy-data

    • huggingface.co
    Updated Mar 2, 2023
    Cite
    Evaluation Bot (2023). benchmark-dummy-data [Dataset]. https://huggingface.co/datasets/autoevaluator/benchmark-dummy-data
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 2, 2023
    Authors
    Evaluation Bot
    Description

    Dummy Dataset for AutoTrain Benchmark

    This dataset contains dummy data that's needed to create AutoTrain projects for benchmarks like RAFT. See here for more details.

  14. Benchmark test databases for IQA.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 23, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lu, Yin; Wu, Xiao-Jun; Sang, Qing-Bing; Li, Chao-Feng (2014). Benchmark test databases for IQA. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001257529
    Dataset updated
    Sep 23, 2014
    Authors
    Lu, Yin; Wu, Xiao-Jun; Sang, Qing-Bing; Li, Chao-Feng
    Description

    Benchmark test databases for IQA.

  15. TrafPy: Benchmarking Data Centre Network Systems

    • rdr.ucl.ac.uk
    zip
    Updated Jun 30, 2021
    Cite
    Christopher Parsonson (2021). TrafPy: Benchmarking Data Centre Network Systems [Dataset]. http://doi.org/10.5522/04/14815853
    Available download formats: zip
    Dataset updated
    Jun 30, 2021
    Dataset provided by
    University College London
    Authors
    Christopher Parsonson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains data related to the paper 'TrafPy: Benchmarking Data Centre Network Systems'. The data have been split into 3 files so that users do not need to download all data sets if only some are needed:

    1) plotData: The data plotted in the paper for each of the benchmarks, averaged across 5 runs.
    2) trafficData: The flow-centric traffic requests used in each of the simulations.
    3) simulationData: Each individual benchmark run, with full access to the simulation history, metrics, and so on. When unzipped, this file is ~2.5 TB in size.

  16. Benchmark Database for Phonetic Alignments

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Feb 21, 2022
    Cite
    List, Johann-Mattis; Prokić, Jelena (2022). Benchmark Database for Phonetic Alignments [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11880
    Dataset updated
    Feb 21, 2022
    Dataset provided by
    Philipps-Universität Marburg
    Authors
    List, Johann-Mattis; Prokić, Jelena
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the last two decades, alignment analyses have become an important technique in quantitative historical linguistics and dialectology. Phonetic alignment plays a crucial role in the identification of regular sound correspondences and deeper genealogical relations between and within languages and language families. Surprisingly, to date there are no easily accessible benchmark data sets for phonetic alignment analyses. Here we present a publicly available database of manually edited phonetic alignments which can serve as a platform for testing and improving the performance of automatic alignment algorithms. The database consists of a great variety of alignments drawn from a large number of different sources. The data is arranged in such a way that typical problems encountered in phonetic alignment analyses (metathesis, diversity of phonetic sequences) are represented and can be directly tested.

  17. Big Data Machine Learning Benchmark on Spark

    • ieee-dataport.org
    Updated Jun 6, 2019
    Cite
    Jairson Rodrigues (2019). Big Data Machine Learning Benchmark on Spark [Dataset]. https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark
    Dataset updated
    Jun 6, 2019
    Authors
    Jairson Rodrigues
    Description

    net traffic

  18. Database Performance Monitoring Services Market Report

    • imrmarketreports.com
    Updated Jun 2024
    Cite
    Swati Kalagate; Akshay Patil; Vishal Kumbhar (2024). Database Performance Monitoring Services Market Report [Dataset]. https://www.imrmarketreports.com/reports/database-performance-monitoring-services--market
    Dataset updated
    Jun 2024
    Dataset provided by
    IMR Market Reports
    Authors
    Swati Kalagate; Akshay Patil; Vishal Kumbhar
    License

    https://www.imrmarketreports.com/privacy-policy/

    Description

    The Global Database Performance Monitoring Services report offers an extensive industry analysis of development components, patterns, flows, and sizes. It also calculates present and past market values to forecast the market over the period 2024 to 2032. The report also covers geographic regions, broadening its view of the competitive landscape and industry perspective of the market.

  19. llmsql-benchmark

    • huggingface.co
    • kaggle.com
    Cite
    LLMSQL, llmsql-benchmark [Dataset]. https://huggingface.co/datasets/llmsql-bench/llmsql-benchmark
    Dataset authored and provided by
    LLMSQL
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LLMSQL Benchmark

    This benchmark is designed to evaluate text-to-SQL models. For usage instructions, see https://github.com/LLMSQL/llmsql-benchmark. arXiv article: https://arxiv.org/abs/2510.02350

      Files
    

    • tables.jsonl: database table metadata
    • questions.jsonl: all available questions
    • train_questions.jsonl, val_questions.jsonl, test_questions.jsonl: data splits for fine-tuning; see https://github.com/LLMSQL/llmsql-benchmark
    • sqlite_tables.db: SQLite database with tables from… See the full description on the dataset page: https://huggingface.co/datasets/llmsql-bench/llmsql-benchmark.
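    The .jsonl files listed above follow the JSON Lines convention of one JSON object per line, and sqlite_tables.db is a SQLite file. A minimal standard-library sketch for reading both might look like this; the helper names are hypothetical, and because the record fields are not documented here, the code makes no assumptions about them.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable, Iterator, List


def parse_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Load a whole .jsonl file (for example questions.jsonl) into memory."""
    with open(path, encoding="utf-8") as f:
        return list(parse_jsonl(f))


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

    After downloading the files, `read_jsonl("questions.jsonl")` and `list_tables("sqlite_tables.db")` would be typical calls. Note that `sqlite3.connect` silently creates an empty database if the path does not exist, so a wrong path shows up as an empty table list rather than an error.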

  20. Long-Term Time-Series Forecasting Benchmark

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Konrad Banachewicz (2023). Long-Term Time-Series Forecasting Benchmark [Dataset]. http://identifiers.org/arxiv:2309.1
    Available download formats: zip (21179653718 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Konrad Banachewicz
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Konrad Banachewicz

    Released under Database: Open Database, Contents: © Original Authors

    Contents
