The NIST Computational Chemistry Comparison and Benchmark Database is a collection of experimental and ab initio thermochemical properties for a selected set of gas-phase molecules. The goals are to provide a benchmark set of experimental data for the evaluation of ab initio computational methods and allow the comparison between different ab initio computational methods for the prediction of gas-phase thermochemical properties. The data files linked to this record are a subset of the experimental data present in the CCCBDB.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data on water utilities in 151 national jurisdictions, covering service and utility parameters (Benchmark Database) for a range of years up to and including 2017 (the year range varies greatly by country and utility), and tariffs for 211 jurisdictions (Tariffs database). Information includes cost recovery, connections, population served, financial performance, non-revenue water, residential and total supply, and total production. Data can be called up by utility, by group of utilities, and by comparison between utilities, including the whole (global) utility database, enabling both country-level and global comparison for individual utilities. Data can be downloaded in XLS format.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Materials informatics is increasingly finding ways to exploit machine learning algorithms. Techniques such as decision trees, ensemble methods, support vector machines, and a variety of neural network architectures are used to predict likely material characteristics and property values. Supplemented with laboratory synthesis, applications of machine learning to compound discovery and characterization represent one of the most promising research directions in materials informatics. A shortcoming of this trend, in its current form, is a lack of standardized materials data sets on which to train, validate, and test model effectiveness. Applied machine learning research depends on benchmark data to make sense of its results. Fixed, predetermined data sets allow for rigorous model assessment and comparison. Machine learning publications that don't refer to benchmarks are often hard to contextualize and reproduce. In this data descriptor article, we present a collection of data sets of different material properties taken from the AFLOW database. We describe them, the procedures that generated them, and their use as potential benchmarks. We provide a compressed ZIP file containing the data sets, and a GitHub repository of associated Python code. Finally, we discuss opportunities for future work incorporating the data sets and creating similar benchmark collections.
Dataset enabling organizations to benchmark their data literacy capability globally.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In the last two centuries, equations of state (EoSs) have become a key tool for the correlation and prediction of thermodynamic properties of fluids. They can be applied to pure substances as well as to mixtures, and they also constitute the heart of commercially available computer-aided process design software. In the last 20 years, thousands of publications have been devoted to the development of sophisticated models or to the improvement of existing EoSs. Chemical engineering thermodynamics is thus a field under steady development, and to assess the accuracy of a thermodynamic model or to cross-compare two models, it is necessary to compare model predictions with experimental data. In this context, a reliable, free-to-access benchmark database is essential. The goal of this paper is thus to present a database specifically designed to assess the accuracy of a thermodynamic model or to cross-compare models, to explain how it was developed, and to show how to use it. A total of 200 nonelectrolytic binary systems have been selected and divided into nine groups according to the associating character of the components, i.e., their ability to be involved in a hydrogen bond (the nature and strength of the association phenomena are indeed considered a measure of the complexity of modeling the thermodynamic properties of mixtures). The methodology for assessing the performance of a given model is then described. As an illustration, the Peng–Robinson EoS with classical van der Waals mixing rules and a temperature-dependent binary interaction parameter (kij) has been used to correlate the numerous data included in the proposed database, and its performance has been assessed following the proposed methodology.
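The model named in the description above can be sketched in a few lines of Python. This is a minimal illustration of the textbook Peng-Robinson pressure equation with classical van der Waals one-fluid mixing rules, not the authors' code. The function names are hypothetical, the pure-component parameters a_i (already evaluated at the temperature of interest) and b_i are assumed to be supplied by the caller, and because the description does not give the functional form of the temperature dependence of kij, the sketch takes already-evaluated kij values as input.

```python
import math
from typing import Sequence, Tuple

R = 8.314462618  # molar gas constant, J/(mol K)


def vdw_mixing(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    k: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Classical van der Waals one-fluid mixing rules.

    x: mole fractions; a, b: pure-component EoS parameters;
    k: symmetric matrix of binary interaction parameters kij,
       assumed to be evaluated beforehand at the temperature of interest.
    """
    n = len(x)
    a_mix = 0.0
    for i in range(n):
        for j in range(n):
            # a_mix = sum_i sum_j x_i x_j sqrt(a_i a_j) (1 - k_ij)
            a_mix += x[i] * x[j] * math.sqrt(a[i] * a[j]) * (1.0 - k[i][j])
    # b_mix = sum_i x_i b_i
    b_mix = sum(xi * bi for xi, bi in zip(x, b))
    return a_mix, b_mix


def pr_pressure(T: float, v: float, a_mix: float, b_mix: float) -> float:
    """Peng-Robinson pressure (Pa) at temperature T (K) and molar volume v (m^3/mol)."""
    repulsive = R * T / (v - b_mix)
    attractive = a_mix / (v * (v + b_mix) + b_mix * (v - b_mix))
    return repulsive - attractive
```

Setting every kij to zero reduces the cross term to the geometric-mean combining rule, which is a convenient baseline when checking how much a fitted kij changes the correlation of a given binary system.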
The following dataset includes "Active Benchmarks," which are provided to facilitate the identification of City-managed standard benchmarks. Standard benchmarks are for public and private use in establishing a point in space. Note: The benchmarks are referenced to the Chicago City Datum = 0.00, (CCD = 579.88 feet above mean tide New York). The City of Chicago Department of Water Management’s (DWM) Topographic Benchmark is the source of the benchmark information contained in this online database. The information contained in the index card system was compiled by scanning the original cards, then transcribing some of this information to prepare a table and map. Over time, the DWM will contract services to field verify the data and update the index card system and this online database. This dataset was last updated September 2011. Coordinates are estimated. To view map, go to https://data.cityofchicago.org/Buildings/Elevation-Benchmarks-Map/kmt9-pg57 or for PDF map, go to http://cityofchicago.org/content/dam/city/depts/water/supp_info/Benchmarks/BMMap.pdf. Please read the Terms of Use: http://www.cityofchicago.org/city/en/narr/foia/data_disclaimer.html.
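The datum note above implies a simple offset between elevations referenced to the Chicago City Datum and elevations above mean tide New York. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the only value taken from the description is the 579.88 ft offset.

```python
# Offset stated in the dataset description:
# CCD = 0.00 corresponds to 579.88 feet above mean tide New York.
CCD_OFFSET_FT = 579.88


def ccd_to_mean_tide_ny(elevation_ccd_ft: float) -> float:
    """Convert an elevation in feet relative to CCD to feet above mean tide New York."""
    return elevation_ccd_ft + CCD_OFFSET_FT


def mean_tide_ny_to_ccd(elevation_mtny_ft: float) -> float:
    """Inverse conversion: feet above mean tide New York to feet relative to CCD."""
    return elevation_mtny_ft - CCD_OFFSET_FT
```

Since the dataset states that coordinates are estimated and field verification is pending, converted elevations are best treated as indicative until checked against the original index cards.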
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the benchmark Bayesian network dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison on the benchmark noisy database.
Agentic Data Access Benchmark (ADAB)
Agentic Data Access Benchmark is a set of real-world questions over a few "closed domains" to illustrate the evaluation of closed-domain AI assistants/agents. Closed domains are domains where data is not available implicitly in the LLM because it resides in secure or private systems (e.g., enterprise databases, SaaS applications), so AI solutions require mechanisms to connect an LLM to such data. If you are evaluating an AI product or building your… See the full description on the dataset page: https://huggingface.co/datasets/hasura/agentic-data-access-benchmark.
Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5) https://creativecommons.org/licenses/by-nc-nd/2.5/
License information was derived automatically
NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See Technical Report for details on how to use the provided package.
This database includes 3 repositories:
Each image can be used for classification (shape/color) or regression (radius/area) tasks.
All datasets can be modified and adapted to the user's research question using the included open source data generator.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A total of 12 software defect data sets from NASA were used in this study. Five data sets (part I), namely CM1, JM1, KC1, KC2, and PC1, were obtained from the PROMISE software engineering repository (http://promise.site.uottawa.ca/SERepository/); the other seven data sets (part II) were obtained from the tera-PROMISE Repository (http://openscience.us/repo/defect/mccabehalsted/).
This repository contains the underlying data from benchmark experiments for Drifting Acoustic Instrumentation SYstems (DAISYs) in waves and currents described in "Performance of a Drifting Acoustic Instrumentation SYstem (DAISY) for Characterizing Radiated Noise from Marine Energy Converters" (https://link.springer.com/article/10.1007/s40722-024-00358-6). DAISYs consist of a surface expression connected to a hydrophone recording package by a tether. Both elements are instrumented to provide metadata (e.g., position, orientation, and depth). Information about how to build DAISYs is available at https://www.pmec.us/research-projects/daisy. The repository's primary content is three compressed archives (.zip format), each containing multiple MATLAB binary data files (.mat format). A table relating individual data files to figures in the paper, as well as the structure of each file, is included in the repository as a Word document (Data Description MHK-DR.docx). Most of the files contain time series information for a single DAISY deployment (file naming convention: [site]DAISY[Drift #].mat) consisting of processed hydrophone data and associated metadata. For a limited number of DAISY deployments, the hydrophone package was replaced with an acoustic Doppler velocimeter (file naming convention: [site]DAISY[Drift #]_ADV.mat). Data were collected over several years at three locations: (1) Sequim Bay at Pacific Northwest National Laboratory's Marine & Coastal Research Laboratory (MCRL) in Sequim, WA, (2) the energetic tidal channel in Admiralty Inlet, WA (Admiralty Inlet), and (3) the U.S. Navy's Wave Energy Test Site (WETS) in Kaneohe, HI. Brief descriptions of data files at each location follow. MCRL - (1) Drift #4 and #16 contrast the performance of a DAISY and a reference hydrophone (icListen HF Reson), respectively, in the quiescent interior of Sequim Bay (September 2020).
(2) Drift #152 and #153 are velocity measurements for a drifting acoustic Doppler velocimeter in the tidally-energetic entrance channel inside a flow shield and exposed to the flow, respectively (January 2018). (3) Two non-standard files are also included: DAISY_data.mat corresponds to a subset of a DAISY drift over an Adaptable Monitoring Package (AMP) and AMP_data.mat corresponds to approximately co-temporal data for a stationary hydrophone on the AMP (February 2019). Admiralty Inlet - (1) Drift #1-12 correspond to tests with flow shielded DAISYs, unshielded DAISYs, a reference hydrophone, and drifting acoustic Doppler velocimeter with 5, 10, and 15 m tether lengths between surface expression and hydrophone recording package (July 2022). (2) Drift #13-20 correspond to tests of flow shielded DAISYs with three different tether materials (rubber cord, nylon line, and faired nylon line) in lengths of 5, 10, and 15 m (July 2022). WETS - (1) Drift #30-32 correspond to tests with a heave plate incorporated into the tether (standard configuration for wave sites), rubber cord only, and rubber cord, but with a flow shielded hydrophone (November 2022). (2) Drift #49-58 and Drift #65-68 correspond to measurements around mooring infrastructure at the 60 m berth where time-delay-of-arrival localization was demonstrated for different DAISY arrangements and hydrophone depths (November 2022).
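The file naming convention described for this repository ([site]DAISY[Drift #].mat, with an _ADV suffix for velocimeter deployments) lends itself to a small filename parser. The sketch below is an assumption-laden illustration: it assumes the site code is written immediately before the literal text DAISY with no separator and that the drift number is a plain integer, neither of which the description confirms. The class and function names are hypothetical, and the site strings used in any call are placeholders rather than names taken from the archives.

```python
import re
from typing import NamedTuple, Optional


class DaisyFile(NamedTuple):
    site: str      # site code as it appears in the filename (format assumed)
    drift: int     # drift number
    is_adv: bool   # True when the file carries the _ADV suffix


# Assumed layout: <site>DAISY<drift number>[_ADV].mat
_PATTERN = re.compile(r"^(?P<site>.+?)DAISY(?P<drift>\d+)(?P<adv>_ADV)?\.mat$")


def parse_daisy_filename(name: str) -> Optional[DaisyFile]:
    """Return the parsed fields, or None for files that do not follow the convention."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return DaisyFile(
        site=match["site"],
        drift=int(match["drift"]),
        is_adv=match["adv"] is not None,
    )
```

Under these assumptions the two non-standard files, DAISY_data.mat and AMP_data.mat, return None, so they can be routed to separate handling. The Word document in the repository remains the authoritative description of file names and structure.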
Point geometry with attributes displaying geodetic control stations (benchmarks) in East Baton Rouge Parish, Louisiana.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the list of unbound receptors, peptides and natives that were used for the PatchMAN BSA filtering paper.
It also contains the databases that are used for 1) searching with MASTER and 2) extracting fragments with MASTER.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of the IGUANA Benchmark in 2015/16 for the truncated DBpedia dataset. The dataset is 50% of the initial 100% dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets were generated with the LDBC SNB Data generator.
https://github.com/ldbc/ldbc_snb_datagen
They correspond to Scale Factors 1 and 3 and are used in the following paper:
An early look at the LDBC social network benchmark's business intelligence workload
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CLASSIC v. 1.0 model inputs and outputs for benchmarking
This dataset is used by scripts in the CLASSIC codebase along with the CLASSIC Singularity software container. Please ensure you obtain them prior to using this dataset. Instructions are provided in the CLASSIC Quick Start Guide.
This dataset contains FLUXNET2015 data that is used to benchmark the Canadian Land Surface Scheme including Biogeochemical Cycles (CLASSIC) v. 1.0. All model inputs required for the FLUXNET sites (31 for version 1.0) are provided along with example outputs that benchmark CLASSIC v. 1.0. The model outputs include raw model outputs, plots of select variables and benchmarking results from the Automated Model Benchmarking (AMBER) package. Following the CLASSIC Quick Start Guide will generate all outputs on the user's own machine.
This work used eddy covariance data acquired and shared by the FLUXNET community, including these networks: AmeriFlux, AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada, GreenGrass, ICOS, KoFlux, LBA, NECC, OzFlux-TERN, TCOS-Siberia, and USCCC. The ERA-Interim reanalysis data are provided by ECMWF and processed by LSCE. The FLUXNET eddy covariance data processing and harmonization was carried out by the European Fluxes Database Cluster, AmeriFlux Management Project, and Fluxdata project of FLUXNET, with the support of CDIAC and ICOS Ecosystem Thematic Center, and the OzFlux, ChinaFlux and AsiaFlux offices.
We thank C. Le Quéré for allowing us to distribute her CO2 record that was originally made for the TRENDY project.
The Chicago Building Energy Use Benchmarking Ordinance calls on existing municipal, commercial, and residential buildings larger than 50,000 square feet to track whole-building energy use, report to the City annually, and verify data accuracy every three years. The law, which was phased in from 2014-2017, covers less than 1% of Chicago’s buildings, which account for approximately 20% of total energy used by all buildings. For more details, including ordinance text, rules and regulations, and timing, please visit www.CityofChicago.org/EnergyBenchmarking. The ordinance authorizes the City to share property-specific information with the public, beginning with the second year in which a building is required to comply. The dataset represents self-reported and publicly available property information by calendar year. Please note that the "Data Year" column refers to the year to which the data apply, not the year in which they were reported. That column and filtered views under "Related Content" can be used to isolate specific years.
net traffic
This resource is the implementation in XML Schema [1] of a data model that describes the Additive Manufacturing Benchmark 2022 series data. It provides a robust set of metadata for the build processes and their resulting specimens and for measurements made on these in the context of the AM Bench 2022 project. The schema was designed to support typical science questions which users of a database with metadata about the AM Bench results might wish to pose. The metadata include identifiers assigned to build products, derived specimens, and measurements; links to relevant journal publications, documents, and illustrations; provenance of specimens such as source materials and details of the build process; measurement geometry, instruments and other configurations used in measurements; and access information to raw and processed data as well as analysis descriptions of these datasets. This data model is an abstraction of these metadata, designed using the concepts of inheritance, normalization, and reusability of an object oriented language for ease of extensibility and maintenance. It is simple to incorporate new metadata as needed. A CDCS [2] database at NIST was filled with metadata provided by the contributors to the AM Bench project. They entered values for the metadata fields for an AM Bench measurement, specimen or build process in tabular spreadsheets. These entries were translated to XML documents compliant with the schema using a set of Python scripts. The generated XML documents were loaded into the database with a persistent identifier (PID) assigned by the database. [1] https://www.w3.org/XML/Schema [2] https://www.nist.gov/itl/ssd/information-systems-group/configurable-data-curation-system-cdcs/about-cdcs