Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository presents the evaluation data used for ASM. Please refer to this document for more details about the repository.
This documentation and dataset can be used to test the performance of automated fault detection and diagnostics algorithms for buildings. The dataset was created by LBNL, PNNL, NREL, ORNL and ASHRAE RP-1312 (Drexel University). It includes data for air-handling units and rooftop units simulated with PNNL's large office building model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NLU Evaluation Data - English and German
A labeled English and German language multi-domain dataset (21 domains) with 25K user utterances for human-robot interaction. This dataset was collected and annotated for evaluating NLU services and platforms. A detailed paper on this dataset is available on arXiv.org: "Benchmarking Natural Language Understanding Services for building Conversational Agents". The dataset builds on the annotated data of the xliuhw/NLU-Evaluation-Data repository.… See the full description on the dataset page: https://huggingface.co/datasets/deutsche-telekom/NLU-Evaluation-Data-en-de.
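If needed, the corpus can be pulled directly from the Hugging Face Hub with the datasets library. A minimal sketch (the available splits and column names should be verified on the dataset card rather than assumed):

# Sketch: load the English/German NLU evaluation corpus from the Hugging Face Hub.
# Split and feature names are not guaranteed here; inspect the dataset card first.
from datasets import load_dataset

ds = load_dataset("deutsche-telekom/NLU-Evaluation-Data-en-de")
print(ds)  # prints the available splits and their features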
Dataset Card for Bengali-Prompt-Evaluation-Data
This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.
Dataset Summary
This dataset contains:
A dataset configuration file conforming to the Argilla dataset format named argilla.yaml. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/DIBT-Bengali/Bengali-Prompt-Evaluation-Data.
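As noted above, the records can also be consumed directly with the datasets library rather than through Argilla. A minimal sketch (the record fields are defined by the argilla.yaml configuration, so inspect them rather than assuming names):

# Sketch: load the Argilla-formatted dataset with the plain datasets library.
from datasets import load_dataset

ds = load_dataset("DIBT-Bengali/Bengali-Prompt-Evaluation-Data")
print(ds)  # shows splits and the fields defined by the Argilla configuration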
The evaluation employs administrative data from 12 states, covering approximately 160,000 WIA Adult, WIA Dislocated Worker, and WIA Youth participants and nearly 3 million comparison group members. Focusing on participants who entered WIA programs between July 2003 and June 2005, the evaluation considers the impact for all those in the program, the impact for those receiving only Core or Intensive Services, and the incremental impact of Training Services. This dataset contains all of the information used to conduct the non-experimental evaluation estimates for (1) the WIA client treatment group and (2) the Unemployment Insurance and Employment Service client comparison group. The administrative data collected by IMPAQ for the "Workforce Investment Act Non-Experimental Net Impact Evaluation" project were received from state agencies in three segments: annual Workforce Investment Act Standardized Record Data (WIASRD) or closely related files, Unemployment Insurance data, and Unemployment Insurance Wage Record data. The analyses were conducted for twelve states; however, based on the data sharing agreements, the Public Use Data (PUD) set includes data for nine states only. Our agreement for use of these data required that the identity of those states not be revealed. As a result, all geographical identifiers were removed to preserve the states' anonymity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Game Acceptance Evaluation Dataset (GAED) contains statistical data as well as the training and validation sets used in our experiments on neural networks to evaluate video game acceptance.
Please consider citing the following references if you found this dataset useful:
[1] Augusto de Castro Vieira, Wladmir Cardoso Brandão. Evaluating Acceptance of Video Games using Convolutional Neural Networks for Sentiment Analysis of User Reviews. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 2019.
[2] Augusto de Castro Vieira, Wladmir Cardoso Brandão. GA-Eval: A Neural Network based approach to evaluate Video Games Acceptance. In Proceedings of the 18th Brazilian Symposium on Computer Games and Digital Entertainment. 2019.
[3] Augusto de Castro Vieira, Wladmir Cardoso Brandão. (2019). GAED: The Game Acceptance Evaluation Dataset (Version 1.0) [Data set]. Zenodo.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
MEASURE Evaluation is the USAID Global Health Bureau's primary vehicle for supporting improvements in monitoring and evaluation in population, health and nutrition worldwide. They help to identify data needs, collect and analyze technically sound data, and use that data for health decision making. Some MEASURE Evaluation activities involve the collection of innovative evaluation data sets in order to increase the evidence-base on program impact and evaluate the strengths and weaknesses of recent evaluation methodological developments. Many of these data sets may be available to other researchers to answer questions of particular importance to global health and evaluation research. Some of these data sets are being added to the Dataverse on a rolling basis, as they become available. This collection on the Dataverse platform contains a growing variety and number of global health evaluation datasets.
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
GouDa is a tool for the generation of universal data sets to evaluate and compare existing data preparation tools and new research approaches. It supports diverse error types and arbitrary error rates. Ground truth is provided as well. It thus permits better analysis and evaluation of data preparation pipelines and simplifies the reproducibility of results.
Publication: V. Restat, G. Boerner, A. Conrad, and U. Störl. GouDa - Generation of universal Data Sets. In Proceedings of Data Management for End-to-End Machine Learning (DEEM’22), Philadelphia, USA, 2022. https://doi.org/10.1145/3533028.3533311
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Evaluation data for ASDF and Corner Clamp Base Training Data
@article{schieber2024asdf,
title={ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation},
author={Schieber, Hannah and Li, Shiyu and Corell, Niklas and Beckerle, Philipp and Kreimeier, Julian and Roth, Daniel},
journal={arXiv preprint arXiv:2403.16400},
year={2024}
}
This is the first version of the performance evaluation tool. Evaluation is based on point estimates of the RUL predictions. More detailed documentation will be made available with the tool soon. In the meantime, please download the attached files/folder into the same root folder to run a demo. To evaluate your own results, create results and application data files in .mat format and save them in the results folder. Please make sure you name your files xxx_results.mat for results and yyy_appData.mat for application data.
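A minimal sketch of producing the two .mat files with SciPy (the variable names stored inside the files are illustrative assumptions; the tool's documentation defines the actual expected schema):

# Sketch: write RUL point estimates and application data in .mat format.
# The keys "rulEstimates" and "appData" are hypothetical placeholders.
import numpy as np
from scipy.io import savemat

rul_predictions = np.array([120.5, 98.2, 87.0])    # point estimates of RUL
application_data = np.array([130.0, 100.0, 90.0])  # e.g., corresponding ground truth

savemat("results/myalgo_results.mat", {"rulEstimates": rul_predictions})
savemat("results/myalgo_appData.mat", {"appData": application_data})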
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large Language Model (LLM) evaluation is currently one of the most important areas of research, with existing benchmarks proving insufficient and not fully representative of LLMs' various capabilities. We present TruthEval, a curated collection of challenging statements on sensitive topics for LLM benchmarking. These statements were curated by hand and have known truth values. The categories were chosen to distinguish LLMs' abilities from their stochastic nature. Details of the collection method and use cases can be found in the paper "TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability".
https://www.nist.gov/open/license
The datasets contain the following parts for Open Media Forensics Challenge (OpenMFC) evaluations:
1. NC16 Kickoff dataset
2. NC17 development and evaluation datasets
3. MFC18 development and evaluation datasets
4. MFC19 development and evaluation datasets
5. MFC20 development and evaluation datasets
6. OpenMFC2022 steg datasets
7. OpenMFC2022 deepfake datasets
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was synthesized with the assistance of Gemini 1.5 Pro for evaluating LLMs on paradoxical logic. It is also intended to serve as the seed for a TBD dataset on the practical use of this material.
https://data.gov.tw/license
Inventory of materials
Data classification - Class A: open data; Class B: data with limited use; Class C: non-open data.
Current situation - 1. Free to use. 2. Free application. 3. Fee. 4. Not open.
Degree of openness - 1. Already open. 2. Scheduled to open (please fill in the scheduled opening date). 3. Cannot be opened (please fill in the reason in the remarks column).
https://spdx.org/licenses/CC0-1.0.html
The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate the USA's National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives. Earlier version presented at ASIS&T and ISSI Pre-Conference: Symposium on Informetrics and Scientometrics 2009
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset provides object detection results using five different LiDAR-based object detection algorithms: PointRCNN, SECOND, Part-A², PointPillars, and PVRCNN. The experiments aim to determine the optimal angular resolution for LiDAR-based object detection. The point cloud data was generated in the CARLA simulator, modeled in a suburban scenario featuring 30 vehicles, 13 bicycles, and 40 pedestrians. The angular resolution in the dataset ranges from 0.1° x 0.1° (H x V) to 1.0° x 1.0°, with increments of 0.1° in each direction.
For each angular resolution, over 2000 frames of point clouds were collected, with 1600 of these frames labeled across three object classes (vehicles, pedestrians, and cyclists) for algorithm training purposes. The dataset includes detection results from evaluating 1000 frames, with results recorded for the respective angular resolutions.
Each file in the dataset contains five sheets, corresponding to the five algorithms evaluated. The data structure includes the following columns (a loading sketch follows the list):
Frame Index: Indicates the frame number, ranging from 1 to 1000.
Object Classification: Labels objects as 1 (Vehicle), 2 (Pedestrian), or 3 (Cyclist).
Confidence Score: Represents the confidence level of the detected object in its bounding box.
Number of LiDAR Points: Indicates the count of LiDAR points within the bounding box.
Bounding Box Distance: Specifies the distance of the bounding box from the LiDAR sensor.
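A minimal sketch of reading one result file with pandas (the filename and exact column spellings are assumptions based on the description above):

# Sketch: load all five algorithm sheets from one angular-resolution file.
# sheet_name=None returns a dict mapping sheet name -> DataFrame.
import pandas as pd

sheets = pd.read_excel("detection_results_0.1x0.1.xlsx", sheet_name=None)

for algorithm, df in sheets.items():
    vehicles = df[df["Object Classification"] == 1]  # class 1 = Vehicle
    print(algorithm, "mean vehicle confidence:", vehicles["Confidence Score"].mean())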
This dataset has been created in the context of the Leibniz Young Investigator Grants programme of Leibniz University Hannover and is funded by the Ministry of Science and Culture of Lower Saxony (MWK), Grant No. 11-76251-114/2022.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of a summary of publicly available student evaluation of teaching (SET) data for the annual period of trimester 2 2009, trimester 3 2009/2010, and trimester 1 2010, from the Student Evaluation of Teaching and Units (SETU) portal. The data was analysed to include mean rating sets for 1432 units of study, representing 74498 sets of SETU ratings, 188391 individual student enrolments, and 58.5 percent of all units listed in the Deakin University handbook for the period under consideration, in order to identify any systematic influences on SET results at Deakin University.
The data reported for a unit included the following (a small computation sketch follows the lists):
• total enrolment;
• total number of responses; and
• computed response rate for the enrolment location(s) selected.
The data reported for each of the ten core SETU items included:
• number of responses;
• mean rating;
• standard deviation of the mean rating;
• percentage agreement;
• percentage disagreement; and
• percentage difference.
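A minimal sketch of the per-unit response-rate computation described above (the column names are hypothetical; only the rate itself, responses divided by enrolment, is stated in the summary):

# Sketch: compute the "computed response rate" reported for each unit.
# Column names are illustrative; the actual SETU export schema may differ.
import pandas as pd

units = pd.DataFrame({
    "unit_code": ["ABC101", "XYZ202"],
    "total_enrolment": [250, 80],
    "total_responses": [140, 52],
})

units["response_rate_pct"] = 100 * units["total_responses"] / units["total_enrolment"]
print(units)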
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset was created to compare and evaluate the Semantic Scholar recommendation service and the Open Research Knowledge Graph (ORKG) similar-papers recommendation service based on Elastic Search. The dataset includes 30 random ORKG comparisons, each of which is provided with 50 similar papers recommended by Semantic Scholar and 50 papers recommended by Elastic Search, including the 10 most relevant papers, which were manually labeled.
Average precision (P@k) and recall (R@k) for Semantic Scholar results:
Average precision (P@k) and recall (R@k) for Elastic Search results:
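For reference, a minimal sketch of how P@k and R@k can be computed for a single comparison (assuming the 10 manually labeled papers form the relevant set; the function names are illustrative):

# Sketch: precision@k and recall@k for one ranked recommendation list.
# ranked: recommended paper IDs, best first; relevant: manually labeled papers.
def precision_at_k(ranked, relevant, k):
    hits = sum(1 for paper in ranked[:k] if paper in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    hits = sum(1 for paper in ranked[:k] if paper in relevant)
    return hits / len(relevant)

ranked = ["p3", "p7", "p1", "p9", "p2"]
relevant = {"p1", "p2", "p3"}
print(precision_at_k(ranked, relevant, 5), recall_at_k(ranked, relevant, 5))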
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We propose a simple-to-implement panel data method to evaluate the impacts of social policy. The basic idea is to exploit the dependence among cross-sectional units to construct the counterfactuals. The cross-sectional correlations are attributed to the presence of some (unobserved) common factors. However, instead of trying to estimate the unobserved factors, we propose to use observed data. We use a panel of 24 countries to evaluate the impact of political and economic integration of Hong Kong with mainland China. We find that the political integration hardly had any impact on the growth of the Hong Kong economy. However, the economic integration has raised Hong Kong's annual real GDP by about 4%.
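A minimal sketch of the core construction (fit the treated unit's pre-treatment outcome on the control units' outcomes, then project the post-treatment counterfactual; all data below are placeholders, not the paper's 24-country panel):

# Sketch: panel-data counterfactual in the spirit of the method described above.
import numpy as np

rng = np.random.default_rng(0)
T0, T1, N = 40, 20, 24                    # pre/post-treatment periods, controls
controls = rng.normal(size=(T0 + T1, N))  # placeholder control-country outcomes
treated = controls[:, :3].mean(axis=1) + rng.normal(scale=0.1, size=T0 + T1)

# Regress the treated unit on the controls using only pre-treatment observations
X_pre = np.column_stack([np.ones(T0), controls[:T0]])
beta, *_ = np.linalg.lstsq(X_pre, treated[:T0], rcond=None)

# Predict the no-treatment counterfactual for the post-treatment period
X_post = np.column_stack([np.ones(T1), controls[T0:]])
counterfactual = X_post @ beta
effect = treated[T0:] - counterfactual    # estimated treatment effect per period
print(effect.mean())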
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or 'label images') collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
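A minimal sketch of inspecting one of the archives with NumPy (the filename below merely follows the stated naming convention and is hypothetical, and the stored keys should be discovered rather than assumed):

# Sketch: list the arrays stored in a Coast Train NPZ archive.
import numpy as np

archive = np.load("sentinel2_4_001.npz", allow_pickle=True)
for key in archive.files:
    array = archive[key]
    print(key, getattr(array, "shape", type(array)))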