100+ datasets found
  1. NIST Computational Chemistry Comparison and Benchmark Database - SRD 101

    • catalog.data.gov
    • data.amerigeoss.org
    • +1 more
    Updated Jul 9, 2025
    Cite
    National Institute of Standards and Technology (2025). NIST Computational Chemistry Comparison and Benchmark Database - SRD 101 [Dataset]. https://catalog.data.gov/dataset/nist-computational-chemistry-comparison-and-benchmark-database-srd-101-e19c1
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The NIST Computational Chemistry Comparison and Benchmark Database is a collection of experimental and ab initio thermochemical properties for a selected set of gas-phase molecules. The goals are to provide a benchmark set of experimental data for the evaluation of ab initio computational methods and allow the comparison between different ab initio computational methods for the prediction of gas-phase thermochemical properties. The data files linked to this record are a subset of the experimental data present in the CCCBDB.

  2. Performance comparison on the benchmark noisy database.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Matthieu Doyen; Di Ge; Alain Beuchée; Guy Carrault; Alfredo I. Hernández (2023). Performance comparison on the benchmark noisy database. [Dataset]. http://doi.org/10.1371/journal.pone.0223785.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Matthieu Doyen; Di Ge; Alain Beuchée; Guy Carrault; Alfredo I. Hernández
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison on the benchmark noisy database.

  3. IBNET Benchmarking Database

    • wbwaterdata.org
    Updated Mar 18, 2020
    Cite
    (2020). IBNET Benchmarking Database [Dataset]. https://wbwaterdata.org/dataset/ibnet-benchmarking-database
    Explore at:
    Dataset updated
    Mar 18, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data on water utilities for 151 national jurisdictions, for a range of years up to and including 2017 (year range varies greatly by country and utility), on service and utility parameters (Benchmark Database), and tariffs for 211 jurisdictions (Tariffs Database). Information includes cost recovery, connections, population served, financial performance, non-revenue water, residential and total supply, and total production. Data can be called up by utility, by group of utilities, and by comparison between utilities, including the whole (global) utility database, enabling both country- and global-level comparison for individual utilities. Data can be downloaded in xls format.

  4. PatchMAN BSA filtering databases and benchmark input and natives

    • zenodo.org
    application/gzip, zip
    Updated Jul 31, 2024
    Cite
    Ora Schueler-Furman; Ora Schueler-Furman; Alisa Khramushin; Alisa Khramushin; Julia Kornélia Varga; Julia Kornélia Varga (2024). PatchMAN BSA filtering databases and benchmark input and natives [Dataset]. http://doi.org/10.5281/zenodo.13118411
    Explore at:
    Available download formats: application/gzip, zip
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ora Schueler-Furman; Ora Schueler-Furman; Alisa Khramushin; Alisa Khramushin; Julia Kornélia Varga; Julia Kornélia Varga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the list of unbound receptors, peptides and natives that were used for the PatchMAN BSA filtering paper.

    It also contains the databases that are used for 1) searching with MASTER and 2) extracting fragments with MASTER.

  5. LDBC-SNB SF-0001 and SF-0003 Datasets

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 21, 2020
    Cite
    Arnau Prat-Pérez; Arnau Prat-Pérez (2020). LDBC-SNB SF-0001 and SF-0003 Datasets [Dataset]. http://doi.org/10.5281/zenodo.3452106
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Arnau Prat-Pérez; Arnau Prat-Pérez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets were generated with the LDBC SNB Data generator:

    https://github.com/ldbc/ldbc_snb_datagen

    They correspond to Scale Factors 1 and 3 and are used in the following paper:

    An early look at the LDBC social network benchmark's business intelligence workload

    10.1145/3210259.3210268

  6. Comparisons of CPU time on AR face database testing with scarves.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Guangwei Gao; Jian Yang; Xiaoyuan Jing; Pu Huang; Juliang Hua; Dong Yue (2023). Comparisons of CPU time on AR face database testing with scarves. [Dataset]. http://doi.org/10.1371/journal.pone.0159945.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Guangwei Gao; Jian Yang; Xiaoyuan Jing; Pu Huang; Juliang Hua; Dong Yue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparisons of CPU time on AR face database testing with scarves.

  7. Data from: Crowdsourced WiFi database and benchmark software for indoor...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jan 29, 2021
    + more versions
    Cite
    Torres-Sospedra, Joaquin (2021). Crowdsourced WiFi database and benchmark software for indoor positioning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_889797
    Explore at:
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    Huerta, Joaquin
    Cramariuc, Andrei
    Leppäkoski, Helena
    Lohan, Elena Simona
    Richter, Philipp
    Torres-Sospedra, Joaquin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains two Wi-Fi databases (one for training and one for test/estimation purposes in indoor positioning applications), collected in a crowdsourced mode (i.e., via 21 different devices and different users), together with a benchmarking utility software (in Matlab and Python) to illustrate various algorithms of indoor positioning based solely on WiFi information (MAC addresses and RSS values).

    The data was collected in a 4-floor university building in Tampere, Finland, during Jan-Aug 2017 and it comprises 687 training fingerprints and 3951 test or estimation fingerprints.

    13.10.2017: Version 2 uploaded; the revised version contains improved readme files and improved Python SW.

    The dataset and/or the associated software are to be cited as follows:

    E.S. Lohan, J. Torres-Sospedra, P. Richter, H. Leppäkoski, J. Huerta, A. Cramariuc, “Crowdsourced WiFi-fingerprinting database and benchmark software for indoor positioning”, Zenodo repository, DOI 10.5281/zenodo.889798
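    The benchmark utility illustrates positioning algorithms based solely on MAC addresses and RSS values. As a rough illustration of the simplest such approach, here is a minimal k-nearest-neighbour fingerprinting sketch (not the bundled Matlab/Python software; the toy radio map and the -100 dBm floor for unheard access points are assumptions):

```python
import math

MISSING_RSS = -100.0  # assumed RSS floor for access points absent from a scan

def distance(fp_a, fp_b):
    """Euclidean distance between two {mac: rss} fingerprints."""
    macs = set(fp_a) | set(fp_b)
    return math.sqrt(sum(
        (fp_a.get(m, MISSING_RSS) - fp_b.get(m, MISSING_RSS)) ** 2
        for m in macs))

def knn_position(query, training, k=3):
    """Estimate (x, y) as the mean position of the k nearest training fingerprints.

    training: list of ({mac: rss}, (x, y)) pairs.
    """
    nearest = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    return (sum(p[0] for _, p in nearest) / len(nearest),
            sum(p[1] for _, p in nearest) / len(nearest))

# Toy reference radio map: three surveyed points, two access points.
radio_map = [
    ({"aa": -40, "bb": -70}, (0.0, 0.0)),
    ({"aa": -70, "bb": -40}, (10.0, 0.0)),
    ({"aa": -90, "bb": -90}, (0.0, 10.0)),
]
print(knn_position({"aa": -42, "bb": -68}, radio_map, k=1))  # → (0.0, 0.0)
```

    With k=1 the query snaps to the closest surveyed point; larger k averages neighbouring reference positions, trading precision for robustness to RSS noise.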

  8. Component database for topology benchmarking with guidelines of how to use...

    • ieee-dataport.org
    Updated Oct 1, 2020
    Cite
    Zhengge Chen (2020). Component database for topology benchmarking with guidelines of how to use it [Dataset]. https://ieee-dataport.org/documents/component-database-topology-benchmarking-guidelines-how-use-it
    Explore at:
    Dataset updated
    Oct 1, 2020
    Authors
    Zhengge Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    You can see our case study results by using this built database to select components and benchmark the bridgeless buck-boost PFC converters.

  9. agentic-data-access-benchmark

    • huggingface.co
    Updated Nov 1, 2024
    Cite
    Hasura (2024). agentic-data-access-benchmark [Dataset]. https://huggingface.co/datasets/hasura/agentic-data-access-benchmark
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    Hasura, Inc.
    Authors
    Hasura
    Description

    Agentic Data Access Benchmark (ADAB)

    Agentic Data Access Benchmark is a set of real-world questions over a few "closed domains" to illustrate the evaluation of closed-domain AI assistants/agents. Closed domains are domains where data is not implicitly available to the LLM because it resides in secure or private systems (e.g., enterprise databases, SaaS applications), so AI solutions require mechanisms to connect an LLM to such data. If you are evaluating an AI product or building your… See the full description on the dataset page: https://huggingface.co/datasets/hasura/agentic-data-access-benchmark.

  10. Benchmark Energy & Geometry Database

    • bioregistry.io
    • registry.identifiers.org
    Updated Apr 25, 2021
    Cite
    (2021). Benchmark Energy & Geometry Database [Dataset]. http://identifiers.org/re3data:r3d100011166
    Explore at:
    Dataset updated
    Apr 25, 2021
    Description

    The Benchmark Energy & Geometry Database (BEGDB) collects results of highly accurate quantum mechanics (QM) calculations of molecular structures, energies and properties. These data can serve as benchmarks for testing and parameterization of other computational methods.

  11. Additive Manufacturing Benchmark 2022 Schema

    • catalog.data.gov
    • datasets.ai
    Updated Mar 14, 2025
    Cite
    National Institute of Standards and Technology (2025). Additive Manufacturing Benchmark 2022 Schema [Dataset]. https://catalog.data.gov/dataset/additive-manufacturing-benchmark-2022-schema-41490
    Explore at:
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This resource is the implementation in XML Schema [1] of a data model that describes the Additive Manufacturing Benchmark 2022 series data. It provides a robust set of metadata for the build processes and their resulting specimens, and for measurements made on these in the context of the AM Bench 2022 project. The schema was designed to support typical science questions that users of a database with metadata about the AM Bench results might wish to pose. The metadata include identifiers assigned to build products, derived specimens, and measurements; links to relevant journal publications, documents, and illustrations; provenance of specimens, such as source materials and details of the build process; measurement geometry, instruments, and other configurations used in measurements; and access information for raw and processed data as well as analysis descriptions of these datasets.

    This data model is an abstraction of these metadata, designed using the concepts of inheritance, normalization, and reusability of an object-oriented language for ease of extensibility and maintenance. It is simple to incorporate new metadata as needed.

    A CDCS [2] database at NIST was filled with metadata provided by the contributors to the AM Bench project. They entered values for the metadata fields for an AM Bench measurement, specimen, or build process in tabular spreadsheets. These entries were translated to XML documents compliant with the schema using a set of Python scripts. The generated XML documents were loaded into the database with a persistent identifier (PID) assigned by the database.

    [1] https://www.w3.org/XML/Schema
    [2] https://www.nist.gov/itl/ssd/information-systems-group/configurable-data-curation-system-cdcs/about-cdcs

  12. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 1, 2023
    Cite
    Marcel Parciak; Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marcel Parciak; Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma-separated file containing the approximate functional dependencies. The column table names the relation we refer to; lhs and rhs reference two columns of that relation where, semantically, we found that lhs implies rhs.

    The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded from or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value, or if the g3_prime value was too small.
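    The annotated dependencies lend themselves to a quick sanity check. The sketch below is illustrative only: it computes the standard g3 error (which may differ in detail from the g3_prime variant mentioned above), and the toy rows echoing the AirportCode/AirportName example are invented:

```python
from collections import Counter, defaultdict

def g3_error(rows, lhs, rhs):
    """Minimum fraction of rows to drop so that lhs -> rhs holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # Within each lhs group keep the most common rhs value; the rest violate the FD.
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return 1 - kept / len(rows)

# Invented rows: one violating tuple out of four.
rows = [
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy"},
    {"AirportCode": "JFK", "AirportName": "Kennedy Intl"},  # violation
    {"AirportCode": "LAX", "AirportName": "Los Angeles"},
]
print(g3_error(rows, "AirportCode", "AirportName"))  # → 0.25
```

    A g3 error of 0 means the dependency holds exactly; values close to 0 indicate an approximate functional dependency of the kind annotated in ground_truth.csv.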

    Dataset References

  13. NIST NLTE-4 Plasma Population Kinetics Database

    • datasets.ai
    • catalog.data.gov
    • +1 more
    21
    Updated Aug 6, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). NIST NLTE-4 Plasma Population Kinetics Database [Dataset]. https://datasets.ai/datasets/nist-nlte-4-plasma-population-kinetics-database-99df6
    Explore at:
    Available download formats: 21
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This database contains benchmark results for simulation of plasma population kinetics and emission spectra. The data were contributed by the participants of the 4th Non-LTE Code Comparison Workshop who have unrestricted access to the database. The only limitation for other users is in hidden labeling of the output results. Guest users can proceed to the database entry page without entering userid and password.

  14. European Business Performance Database

    • openicpsr.org
    Updated Sep 15, 2018
    Cite
    Youssef Cassis; Harm Schroeter; Andrea Colli (2018). European Business Performance Database [Dataset]. http://doi.org/10.3886/E106060V2
    Explore at:
    Dataset updated
    Sep 15, 2018
    Dataset provided by
    EUI, Florence
    Bocconi University
    Bergen University
    Authors
    Youssef Cassis; Harm Schroeter; Andrea Colli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The European Business Performance database describes the performance of the largest enterprises in the twentieth century. It covers eight countries that together consistently account for above 80 per cent of western European GDP: Great Britain, Germany, France, Belgium, Italy, Spain, Sweden, and Finland. Data have been collected for five benchmark years, namely on the eve of WWI (1913), before the Great Depression (1927), at the extremes of the golden age (1954 and 1972), and in 2000.

    The database comprises two distinct datasets. The Small Sample (625 firms) includes the largest enterprises in each country across all industries (economy-wide). To avoid over-representation of certain countries and sectors, countries contribute a number of firms that is roughly proportionate to the size of the economy: 30 firms from Great Britain, 25 from Germany, 20 from France, 15 from Italy, 10 each from Belgium, Spain, and Sweden, and 5 from Finland. By the same token, a cap has been set on the number of financial firms entering the sample, so that they range from up to 6 for Britain down to 1 for Finland.

    The second dataset, or Large Sample (1,167 firms), is made up of the largest firms per industry. Here industries are selected so as to take into account long-term technological developments and the rise of entirely new products and services. Firms have been individually classified using the two-digit ISIC Rev. 3.1 codes, then grouped under a manageable number of industries. Broadly speaking, the two samples have rather distinct foci: the Small Sample is biased in favour of sheer bigness, whereas the Large Sample emphasizes industries.

    As far as size and performance indicators are concerned, total assets was picked as the main size measure in the first three benchmarks, and turnover in 1972 and 2000 (financial intermediaries, though, are ranked by total assets throughout the database). Performance is gauged by means of two financial ratios, namely return on equity and shareholders' return, i.e. the percentage year-on-year change in share price based on year-end values. In order to smooth out volatility, at each benchmark performance figures have been averaged over three consecutive years (for instance, performance in 1913 reflects average performance in 1911, 1912, and 1913). All figures were collected in national currency and converted to US dollars at current-year average exchange rates.

  15. Elevation Benchmarks

    • s.cnmilf.com
    • data.cityofchicago.org
    • +3 more
    Updated Dec 2, 2023
    + more versions
    Cite
    data.cityofchicago.org (2023). Elevation Benchmarks [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/elevation-benchmarks
    Explore at:
    Dataset updated
    Dec 2, 2023
    Dataset provided by
    data.cityofchicago.org
    Description

    The following dataset includes "Active Benchmarks," which are provided to facilitate the identification of City-managed standard benchmarks. Standard benchmarks are for public and private use in establishing a point in space. Note: the benchmarks are referenced to the Chicago City Datum = 0.00 (CCD = 579.88 feet above mean tide, New York). The City of Chicago Department of Water Management's (DWM) Topographic Benchmark is the source of the benchmark information contained in this online database. The information contained in the index card system was compiled by scanning the original cards, then transcribing some of this information to prepare a table and map. Over time, the DWM will contract services to field-verify the data and update the index card system and this online database. This dataset was last updated September 2011. Coordinates are estimated. To view the map, go to https://data.cityofchicago.org/Buildings/Elevation-Benchmarks-Map/kmt9-pg57 or, for a PDF map, go to http://cityofchicago.org/content/dam/city/depts/water/supp_info/Benchmarks/BMMap.pdf. Please read the Terms of Use: http://www.cityofchicago.org/city/en/narr/foia/data_disclaimer.html.

  16. Benchmarks

    • datasets.ai
    • s.cnmilf.com
    • +2 more
    0, 17, 21, 23, 25, 38 +6
    Updated Sep 11, 2024
    + more versions
    Cite
    Earth Data Analysis Center, University of New Mexico (2024). Benchmarks [Dataset]. https://datasets.ai/datasets/benchmarks
    Explore at:
    Available download formats: 52, 38, 55, 53, 21, 23, 57, 25, 51, 0, 17, 8
    Dataset updated
    Sep 11, 2024
    Dataset authored and provided by
    Earth Data Analysis Center, University of New Mexico
    Description

    The National Flood Hazard Layer (NFHL) data incorporates all Digital Flood Insurance Rate Map (DFIRM) databases published by FEMA, and any Letters Of Map Revision (LOMRs) that have been issued against those databases since their publication date. The DFIRM Database is the digital, geospatial version of the flood hazard information shown on the published paper Flood Insurance Rate Maps (FIRMs). The primary risk classifications used are the 1-percent-annual-chance flood event, the 0.2-percent-annual-chance flood event, and areas of minimal flood risk. The NFHL data are derived from Flood Insurance Studies (FISs), previously published Flood Insurance Rate Maps (FIRMs), flood hazard analyses performed in support of the FISs and FIRMs, and new mapping data where available. The FISs and FIRMs are published by the Federal Emergency Management Agency (FEMA). The specifications for the horizontal control of DFIRM data are consistent with those required for mapping at a scale of 1:12,000. The NFHL data contain layers in the Standard DFIRM datasets except for S_Label_Pt and S_Label_Ld. The NFHL is available as State or US Territory data sets. Each State or Territory data set consists of all DFIRMs and corresponding LOMRs available on the publication date of the data set.

  17. Database for climate time series homogenization with metadata

    • zenodo.org
    zip
    Updated Aug 15, 2022
    Cite
    Peter Domonkos; Peter Domonkos (2022). Database for climate time series homogenization with metadata [Dataset]. http://doi.org/10.5281/zenodo.6990845
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 15, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Domonkos; Peter Domonkos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The usefulness of metadata in the automatic version of ACMANTv5 was tested. A benchmark database has been developed, which consists of 41 datasets of 20,500 networks of 170,000 synthetic monthly temperature time series and the related metadata dates. The research was supported by the Catalan Meteorological Service. The research results will be published in the open-access MDPI journal Atmosphere.

    See more in the "Readme.txt" file of the dataset.

  18. Benchmark

    • gis-cupertino.opendata.arcgis.com
    Updated Jan 21, 2016
    Cite
    City of Cupertino (2016). Benchmark [Dataset]. https://gis-cupertino.opendata.arcgis.com/datasets/benchmark/api
    Explore at:
    Dataset updated
    Jan 21, 2016
    Dataset authored and provided by
    City of Cupertino
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Description

    Benchmark is a Point FeatureClass representing land-surveyed benchmarks in Cupertino. Benchmarks are stable sites used to provide elevation data. It is primarily used as a reference layer. The layer is updated as needed by the GIS department. Benchmark has the following fields:

    OBJECTID: Unique identifier automatically generated by Esri. Type: OID, length: 4, domain: none

    ID: Unique identifier assigned to the Benchmark. Type: Integer, length: 4, domain: none

    REF_MARK: The reference mark associated with the Benchmark. Type: String, length: 10, domain: none

    ELEV: The elevation of the Benchmark. Type: Double, length: 8, domain: none

    Shape: Field that stores geographic coordinates associated with the feature. Type: Geometry, length: 4, domain: none

    Description: A more detailed description of the Benchmark. Type: String, length: 200, domain: none

    Owner: The owner of the Benchmark. Type: String, length: 10, domain: none

    GlobalID: Unique identifier automatically generated for features in an enterprise database. Type: GlobalID, length: 38, domain: none

    Operator: The user responsible for updating this database. Type: String, length: 255, domain: OPERATOR

    last_edited_date: The date the database row was last updated. Type: Date, length: 8, domain: none

    created_date: The date the database row was initially created. Type: Date, length: 8, domain: none

    VerticalDatum: The vertical datum associated with the Benchmark. Type: String, length: 100, domain: none

  19. Benchmark Database For Phonetic Alignments

    • explore.openaire.eu
    Updated Jun 5, 2014
    Cite
    Johann-Mattis List; Jelena Prokić (2014). Benchmark Database For Phonetic Alignments [Dataset]. http://doi.org/10.5281/zenodo.11880
    Explore at:
    Dataset updated
    Jun 5, 2014
    Authors
    Johann-Mattis List; Jelena Prokić
    Description

    In the last two decades, alignment analyses have become an important technique in quantitative historical linguistics and dialectology. Phonetic alignment plays a crucial role in the identification of regular sound correspondences and deeper genealogical relations between and within languages and language families. Surprisingly, to date there are no easily accessible benchmark data sets for phonetic alignment analyses. Here we present a publicly available database of manually edited phonetic alignments which can serve as a platform for testing and improving the performance of automatic alignment algorithms. The database consists of a great variety of alignments drawn from a large number of different sources. The data is arranged in such a way that typical problems encountered in phonetic alignment analyses (metathesis, diversity of phonetic sequences) are represented and can be directly tested.

  20. NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep...

    • zenodo.org
    text/x-python, zip
    Updated Apr 16, 2025
    Cite
    Giulio Del Corso; Giulio Del Corso; Volpini Federico; Volpini Federico; Claudia Caudai; Claudia Caudai; Davide Moroni; Davide Moroni; Sara Colantonio; Sara Colantonio (2025). NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep learning models [Dataset]. http://doi.org/10.5281/zenodo.15194187
    Explore at:
    Available download formats: zip, text/x-python
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulio Del Corso; Giulio Del Corso; Volpini Federico; Volpini Federico; Claudia Caudai; Claudia Caudai; Davide Moroni; Davide Moroni; Sara Colantonio; Sara Colantonio
    License

    Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5): https://creativecommons.org/licenses/by-nc-nd/2.5/
    License information was derived automatically

    Time period covered
    Dec 18, 2024
    Description

    NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See Technical Report for details on how to use the provided package.

    This database includes 3 repositories:

    • NADA_Dis: Is the model able to correctly characterize/Disentangle a complex latent space?
      The repository contains 3x100,000 synthetic black and white images to test the ability of the models to correctly define a proper latent space (e.g., autoencoders) and disentangle it. The first 100,000 images contain 4 shapes and uniform parameter space distributions, while the other images have a more complex underlying distribution (truncated Gaussian and correlated marginal variables).

    • NADA_OOD: Does the model identify Out-Of-Distribution images?
      The repository contains 100,000 training images (4 different shapes with 3 possible colors located in the upper left corner of the canvas) and 6x100,000 increasingly different sets of images (changing the color class balance, reducing the radius of the shape, moving the shape to the lower left corner) providing increasingly challenging out-of-distribution images.
      This can help to test not only the capability of a model, but also methods that produce reliability estimates and should correctly classify OOD elements as "unreliable" as they are far from the original distributions.

    • NADA_AlEp: Does the model distinguish between different types (Aleatoric/Epistemic) of uncertainties?
      The repository contains 5x100,000 images with different types of noise/uncertainty:
      • NADA_AlEp_0_Clean: Dataset clean of noise to use as a possible training set.
      • NADA_AlEp_1_White_Noise: Epistemic white noise dataset. Each image is perturbed with an amount of white noise randomly sampled from 0% to 90%.
      • NADA_AlEp_2_Deformation: Dataset with epistemic deformation noise. Each image is deformed by a random amount uniformly sampled between 0% and 90%. 0% corresponds to the original image, while 100% is a full deformation to the circumscribing circle.
      • NADA_AlEp_3_Label: Dataset with label noise. Formally, 20% of Triangles of a given color are misclassified as a Square with a random color (among Blue, Orange, and Brown) and vice versa (Squares to Triangles). Label noise introduces aleatoric uncertainty because it is inherent in the data and cannot be reduced.
      • NADA_AlEp_4_Combined: Combined dataset with all previous sources of uncertainty.

    Each image can be used for classification (shape/color) or regression (radius/area) tasks.

    All datasets can be modified and adapted to the user's research question using the included open source data generator.
