Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
CPQA-Eval-LLM set: Evaluation Set for Contextual Paralinguistic Understanding in Speech-LLMs

This evaluation set is designed to assess the capability of large speech-language models (Speech-LLMs) to understand contextual and paralinguistic cues in speech. The dataset includes:
- 2,647 LLM-generated question–answer pairs
- 479 associated YouTube video links
The data is provided in Hugging Face Dataset format, with the following structure:
YouTube video links and their corresponding start/end… See the full description on the dataset page: https://huggingface.co/datasets/MERaLiON/CPQA-Evaluation-Set.
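Since the data is in Hugging Face Dataset format, it can presumably be loaded with the `datasets` library; a minimal sketch (the repository name comes from the page above, while the split and column names are not listed here and should be inspected rather than assumed):

```python
from datasets import load_dataset

# Load the evaluation set from the Hugging Face Hub.
ds = load_dataset("MERaLiON/CPQA-Evaluation-Set")

# Inspect splits and columns before use; the exact field names
# (question, answer, video link, start/end times) are assumptions.
for split in ds:
    print(split, ds[split].column_names)
```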
Academic Free License v3.0: https://choosealicense.com/licenses/afl-3.0/
Moussab/ORKG-training-evaluation-set dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.iza.org/wc/dataverse/IIL-1.0.pdf
The IZA Evaluation Dataset Survey (IZA ED) was developed to obtain reliable longitudinal estimates of the impact of Active Labor Market Policies (ALMP). It is also suitable for studying the processes of job search and labor market reintegration. The data allow dynamics to be analyzed with respect to a rich set of individual and labor market characteristics, covering the initial period of unemployment as well as long-term outcomes, for a total period of up to 3 years after unemployment entry.

A longitudinal questionnaire records monthly labor market activities and their duration in detail for this period; these activities include, for example, employment, unemployment, ALMP participation, and other training. Available information covers employment status, occupation, sector, and related earnings, hours, and unemployment benefits or other transfer payments.

A cross-sectional questionnaire contains all basic information, including the process of entering unemployment, and demographics. The entry into unemployment is described by detailed job search behavior, such as search intensity, search channels, and the role of the Employment Agency. Moreover, reservation wages and individual expectations about leaving unemployment or participating in ALMP programs are recorded. The demographic information covers employment status, occupation, and sector, as well as citizenship and ethnic background, educational levels, number and age of children, household structure and income, family background, health status, and workplace and place-of-residence regions.

The survey also provides detailed information about treatment by the unemployment insurance authorities, imposed labor market policies, benefit receipt, and sanctions. It additionally focuses on individual characteristics and behavior; such covariates comprise social networks, ethnic and migration background, relations and identity, personality traits, cognitive and non-cognitive skills, life and job satisfaction, risky behavior, and attitudes and preferences.

The main advantages of the IZA ED are the large sample of unemployed individuals, the accuracy of the employment histories, the innovative and rich set of individual covariates, and the fact that the survey measures important characteristics shortly after entry into unemployment.
This documentation and dataset can be used to test the performance of automated fault detection and diagnostics algorithms for buildings. The dataset was created by LBNL, PNNL, NREL, ORNL and ASHRAE RP-1312 (Drexel University). It includes data for air-handling units and rooftop units simulated with PNNL's large office building model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Evaluation Set Cone Detection is a dataset for object detection tasks - it contains Traffic Cones annotations for 518 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
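A minimal download sketch using the `roboflow` Python package; the API key, workspace slug, project slug, version number, and export format below are placeholders, so substitute the values shown on the dataset's Roboflow page:

```python
from roboflow import Roboflow

# Authenticate and point at the project hosting this dataset
# (workspace/project slugs here are hypothetical placeholders).
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("evaluation-set-cone-detection")

# Download a specific version in a chosen export format, e.g. COCO.
dataset = project.version(1).download("coco")
print(dataset.location)
```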
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MEASURE Evaluation is the USAID Global Health Bureau's primary vehicle for supporting improvements in monitoring and evaluation in population, health and nutrition worldwide. They help to identify data needs, collect and analyze technically sound data, and use that data for health decision making. Some MEASURE Evaluation activities involve the collection of innovative evaluation data sets in order to increase the evidence-base on program impact and evaluate the strengths and weaknesses of recent evaluation methodological developments. Many of these data sets may be available to other researchers to answer questions of particular importance to global health and evaluation research. Some of these data sets are being added to the Dataverse on a rolling basis, as they become available. This collection on the Dataverse platform contains a growing variety and number of global health evaluation datasets.
Data for 9 figures comprising raw analyses of heavy-metal experiment results, including screening batch tests, FTIR, XRD, pH effects, kinetic batch tests, isotherm modeling (Langmuir, Freundlich, Redlich-Peterson), and column tests. This dataset is associated with the following publication: Wallace, A., C. Su, M. Sexton, and W. Sun. Evaluation of the Immobilization of Co-Existing Heavy Metal Ions of Pb2+, Cd2+, and Zn2+ from Water by Dairy Manure-Derived Biochar: Performance and Reusability. JOURNAL OF ENVIRONMENTAL ENGINEERING. American Society of Civil Engineers (ASCE), Reston, VA, USA, 148(6): 04022021, (2022).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of a summary of publicly available student evaluation of teaching (SET) data for the annual period of trimester 2 2009, trimester 3 2009/2010 and trimester 1 2010, from the Student Evaluation of Teaching and Units (SETU) portal.
The data were analysed to identify any systematic influences on SET results at Deakin University. They include mean rating sets for 1,432 units of study, representing 74,498 sets of SETU ratings, 188,391 individual student enrolments, and 58.5 percent of all units listed in the Deakin University handbook for the period under consideration.
The data reported for a unit included:
• total enrolment;
• total number of responses; and
• computed response rate for the enrolment location(s) selected.
In addition, the data reported for each of the ten core SETU items included (a computation sketch follows this list):
• number of responses;
• mean rating;
• standard deviation of the mean rating;
• percentage agreement;
• percentage disagreement; and
• percentage difference.
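As a hedged illustration of how these per-item statistics could be computed from raw responses (the 5-point scale, the agreement/disagreement cut-offs, and the definition of percentage difference as agreement minus disagreement are assumptions, not SETU's documented definitions):

```python
import pandas as pd

# Hypothetical raw responses: one 1-5 Likert rating per student
# for a single core SETU item of one unit.
ratings = pd.Series([5, 4, 4, 3, 2, 1])
enrolment = 10  # total enrolment for the unit (hypothetical)

agree = (ratings >= 4).mean() * 100     # assumed: 4-5 counts as agreement
disagree = (ratings <= 2).mean() * 100  # assumed: 1-2 counts as disagreement

summary = {
    "responses": len(ratings),
    "response_rate_pct": 100 * len(ratings) / enrolment,
    "mean_rating": ratings.mean(),
    "std_dev": ratings.std(),
    "pct_agreement": agree,
    "pct_disagreement": disagree,
    "pct_difference": agree - disagree,  # assumed definition
}
print(summary)
```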
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset used in the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. The task included four problems; problems 1-3 were offered in both constrained and unconstrained tracks on CodaLab, while problem 4 was only a part of the unconstrained track.
1. POS-tagging
2. Lemmatisation
3. Morphological feature prediction
4. Mask filling:
   - word-level
   - character-level
For problems 1-3, data from Universal Dependencies v.2.12 was used for Ancient Greek, Ancient Hebrew, Classical Chinese, Coptic, Gothic, medieval Icelandic, Latin, Old Church Slavonic, Old East Slavic, Old French and Vedic Sanskrit. Old Hungarian texts, annotated to the same standard as UD corpora, were added to the dataset from the MGTSZ website. In Old Hungarian data, tokens which were POS-tagged PUNCT were altered so that the form matched the lemma to simplify complex punctuation marks used to approximate manuscript symbols; otherwise, no characters were changed.
The ISO 639-3 standard does not distinguish between historical stages of Latin, as it does for other languages such as Irish. Since it was desirable to approximate this distinction for Latin, we further split the Latin data. This resulted in two Latin datasets: Classical and Late Latin, and Medieval Latin. The split was dictated by the composition of the Perseus and PROIEL treebanks that served as the source for the Latin UD treebanks.
Historical forms of Irish were only included in mask filling challenges (problem 4), as the quantity of historical Irish text data which has been tokenised and annotated to a single standard to date is insufficient for the purpose of training models to perform morphological analysis tasks. The texts were drawn from CELT, Corpas Stairiúil na Gaeilge, and digital editions of the St. Gall glosses and the Würzburg glosses. Each Irish text taken from CELT is labelled "Old", "Middle" or "Early Modern" in accordance with the language labels provided in CELT metadata. Because CELT metadata relating to language stages and text dating is reliant on information provided by a variety of different editors of earlier print editions, this metadata can be inconsistent across the corpus and on occasion inaccurate. To mitigate complications arising from this, texts drawn from CELT were included in the dataset only if they had a single Irish language label and if the dates provided in CELT metadata for the text matched the expected dates for the given period in the history of the Irish language.
The upper temporal boundary was set at 1700 CE, and texts created later than this date were not included in the dataset. The choice of this date is driven by the fact that most of the historical language data used in word embedding research dates back to the 18th century CE or later, and our intention was to focus on the more challenging and yet unaddressed data. The resulting datasets for each language were then shuffled at the sentence level and split into training, validation and test subsets at the ratio of 0.8 : 0.1 : 0.1.
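A hedged sketch of this final preparation step (the shuffle seed is arbitrary; the task description does not specify one):

```python
import random

# One language's data, represented here by placeholder sentences.
sentences = [f"sentence {i}" for i in range(1000)]

random.seed(42)  # arbitrary fixed seed for reproducibility
random.shuffle(sentences)

# Split 0.8 : 0.1 : 0.1 into train / validation / test.
n = len(sentences)
train = sentences[: int(0.8 * n)]
valid = sentences[int(0.8 * n) : int(0.9 * n)]
test = sentences[int(0.9 * n) :]
print(len(train), len(valid), len(test))  # 800 100 100
```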
A detailed list of text sources for each language in the dataset, as well as other metadata and the description of data formats used for each problem, is provided on the Shared Task's GitHub. The structure of the dataset is as follows:
📂 morphology (data for problems 1-3)
├── 📂 test
│   ├── 📂 ref (reference data used in CodaLab competitions)
│   │   ├── 📂 lemmatisation
│   │   ├── 📂 morph_features
│   │   └── 📂 pos_tagging
│   └── 📂 src (source test data with labels)
├── 📂 train
└── 📂 valid

📂 fill_mask_word (data for problem 4a)
├── 📂 test
│   ├── 📂 ref (reference data used in CodaLab competitions)
│   └── 📂 src (source test data with labels in 2 different formats)
│       ├── 📂 json
│       └── 📂 tsv
├── 📂 train (train data in 2 different formats)
│   ├── 📂 json
│   └── 📂 tsv
└── 📂 valid (validation data in 2 different formats)
    ├── 📂 json
    └── 📂 tsv

📂 fill_mask_char (data for problem 4b)
├── 📂 test
│   ├── 📂 ref (reference data used in CodaLab competitions)
│   └── 📂 src (source test data with labels in 2 different formats)
│       ├── 📂 json
│       └── 📂 tsv
├── 📂 train (train data in 2 different formats)
│   ├── 📂 json
│   └── 📂 tsv
└── 📂 valid (validation data in 2 different formats)
    ├── 📂 json
    └── 📂 tsv
We would like to thank Ekaterina Melnikova for suggesting the name for the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The protein-protein interface comparison software PiMine was developed to provide fast comparisons against databases of known protein-protein complex structures. Its application domains range from the prediction of interfaces and potential interaction partners to the identification of potential small molecule modulators of protein-protein interactions.[1]
The protein-protein evaluation datasets are a collection of five datasets that were used for the parameter optimization (ParamOptSet), enrichment assessment (Dimer597 set, Keskin set, PiMine set), and runtime analyses (RunTimeSet) of protein-protein interface comparison tools. The evaluation datasets contain pairs of interfaces of protein chains that either share sequential and structural similarities or are sequentially and structurally unrelated. They enable comparative benchmark studies for tools designed to identify interface similarities.
In addition, we added the results of the case studies analyzed in [1] to enable readers to follow the discussion and investigate the results individually.
Data Set description:
The ParamOptSet was designed based on a study on improving the benchmark datasets for the evaluation of protein-protein docking tools [2]. It was used to optimize and fine-tune the geometric search parameters of PiMine.
The Dimer597 [3] and Keskin [4] sets were developed earlier. We used them to evaluate PiMine’s performance in identifying structurally and sequentially related interface pairs as well as interface pairs with prominent similarity whose constituting chains are sequentially unrelated.
The PiMine set [1] was constructed to assess different quality criteria for reliable interface comparison. It consists of pairs of similar protein-protein complexes in which two chains are sequentially and structurally highly related, while the other two chains are unrelated and show different folds. It enables assessing performance when only the interfaces of the apparently unrelated chains are available. Furthermore, we could obtain reliable interface-interface alignments based on the similar chains, which can be used for alignment performance assessments.
Finally, the RunTimeSet [1] comprises protein-protein complexes from the PDB that were predicted to be biologically relevant. It enables the comparison of typical run times of comparison methods and also represents an interesting dataset to screen for interface similarities.
References:
[1] Graef, J.; Ehrt, C.; Reim, T.; Rarey, M. Database-driven identification of structurally similar protein-protein interfaces (submitted)
[2] Barradas-Bautista, D.; Almajed, A.; Oliva, R.; Kalnis, P.; Cavallo, L. Improving classification of correct and incorrect protein-protein docking models by augmenting the training set. Bioinform. Adv. 2023, 3, vbad012.
[3] Gao, M.; Skolnick, J. iAlign: a method for the structural comparison of protein–protein interfaces. Bioinformatics 2010, 26, 2259-2265.
[4] Keskin, O.; Tsai, C.-J.; Wolfson, H.; Nussinov, R. A new, structurally nonredundant, diverse data set of protein–protein interfaces and its implications. Protein Sci. 2004, 13, 1043-1055.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multi-Language Vocabulary Evaluation Data Set (MuLVE) is a data set consisting of vocabulary cards and real-life user answers, labeled according to whether the user answer is correct or incorrect. The data is sourced from user learning data from the Phase6 vocabulary trainer. The data set contains vocabulary questions in German with English, Spanish, and French as target languages, and is available in four different variations regarding pre-processing and deduplication.
It is split into tab-separated files, one per variation for each of the train and test sets. The files include the following columns:
The processed data set variations do not include the userAnswer column but contain the following additional columns:
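A minimal loading sketch with pandas; the file name below is a placeholder, and since the column lists above are truncated in this description, inspect them rather than hard-coding names:

```python
import pandas as pd

# Load one variation's training split (placeholder file name).
train = pd.read_csv("mulve_variation1_train.tsv", sep="\t")

# The exact columns are not listed here, so check them directly.
print(train.columns.tolist())
print(train.head())
```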
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This contribution serves as an extension of the existing evaluation data set, and thus also extends the evaluation of planning strategies for the IIoT test bed environment presented here: https://zenodo.org/records/10212298.
The evaluation data set presented here comprises four different files, each containing 1000 entries and thus describing 1000 different initial situations.
The objective of this dataset is to assess and compare the policies achieved with different algorithms.
The first data set (#1) is identical to the data set already published at https://zenodo.org/records/10212319. The same environmental conditions apply as during training; however, the situations the agents face are unseen.
In the second evaluation data set (#2), the number of carriers used at the same time was increased by 25%, i.e. from 16 to 20.
In evaluation data set three (#3), the number of products to be completed varies between 50 and 500.
In evaluation data set four (#4), the lot size was limited to 1, meaning that each order includes only one product. This increases the diversity of product types and families manufactured at the same time, which can lead to higher set-up effort.
The Excel file contains the model input-output data sets that were used to evaluate the two-layer soil moisture and flux dynamics model. The model is original and was developed by Dr. Hantush by integrating the well-known Richards equation over the root layer and the lower vadose zone. The input-output data are used for: 1) verification of the numerical scheme by comparison against the HYDRUS model as a benchmark; 2) model validation by comparison against real site data; and 3) estimation of model predictive uncertainty and sources of modeling errors. This dataset is associated with the following publication: He, J., M.M. Hantush, L. Kalin, and S. Isik. Two-Layer numerical model of soil moisture dynamics: Model assessment and Bayesian uncertainty estimation. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 613 part A: 128327, (2022).
This is a test collection for passage and document retrieval, produced in the TREC 2023 Deep Learning track. The Deep Learning track studies information retrieval in a large-training-data regime: the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training on click logs and training on labels from shallow pools (such as the pooling in the TREC Million Query track or the evaluation of search engines based on early precision).

Certain machine learning methods, such as those based on deep learning, are known to require very large datasets for training. The lack of such large-scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning track organized in previous years aimed at providing large-scale datasets to TREC and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

Similar to previous years, one of the main goals of the track in 2023 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Game Acceptance Evaluation Dataset (GAED) contains statistical data as well as the training and validation sets used in our experiments with neural networks to evaluate video game acceptance.
Please consider citing the following references if you found this dataset useful:
[1] Augusto de Castro Vieira, Wladmir Cardoso Brandão. Evaluating Acceptance of Video Games using Convolutional Neural Networks for Sentiment Analysis of User Reviews. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 2019.
[2] Augusto de Castro Vieira, Wladmir Cardoso Brandão. GA-Eval: A Neural Network based approach to evaluate Video Games Acceptance. In Proceedings of the 18th Brazilian Symposium on Computer Games and Digital Entertainment. 2019.
[3] Augusto de Castro Vieira, Wladmir Cardoso Brandão. (2019). GAED: The Game Acceptance Evaluation Dataset (Version 1.0) [Data set]. Zenodo.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repo contains the dataset to download in order to submit results and be evaluated in task 4 of DCASE 2021. It also contains the ground truth for the public and synthetic evaluation datasets, together with the mapping file between the anonymized (official eval) file names and the file names as presented in the annotations.
Please check the submission package and follow the instructions there to prepare a submission.
If you have problems downloading the large eval21.tar.gz file, you can alternatively download the eval21.part* files, which contain the same audio files. Simply concatenate them first with cat eval21.part* > eval21.tar.gz and then extract the archive.
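For a platform-independent alternative, a hedged Python sketch that performs the same reassembly and extraction:

```python
import glob
import tarfile

# Reassemble the split archive (equivalent to
# `cat eval21.part* > eval21.tar.gz`).
with open("eval21.tar.gz", "wb") as out:
    for part in sorted(glob.glob("eval21.part*")):
        with open(part, "rb") as f:
            out.write(f.read())

# Extract the reassembled archive into the current directory.
with tarfile.open("eval21.tar.gz", "r:gz") as tar:
    tar.extractall()
```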
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this study, we used gamma-ray computed tomography (GammaCT) as a reference measurement system to evaluate a novel, non-destructive, smart gravel sensor based on the well-known wire-mesh sensor. Various sediment fillings with different infiltrating particle sizes were applied to the gravel sensor, and the generated particle holdup was locally determined with both measurement systems simultaneously.
GouDa is a tool for the generation of universal data sets to evaluate and compare existing data preparation tools and new research approaches. It supports diverse error types and arbitrary error rates, and ground truth is provided as well. It thus permits better analysis and evaluation of data preparation pipelines and simplifies the reproducibility of results.

Publication: V. Restat, G. Boerner, A. Conrad, and U. Störl. GouDa - Generation of universal Data Sets. In Proceedings of Data Management for End-to-End Machine Learning (DEEM'22), Philadelphia, USA, 2022. https://doi.org/10.1145/3533028.3533311

Please use the current version 1.1.0!
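GouDa's actual interface is not documented in this summary; the following is only a minimal sketch of the underlying idea, i.e. injecting one error type at an arbitrary rate while keeping the clean table as ground truth (the function and column names are hypothetical):

```python
import random
import pandas as pd

def inject_typos(clean: pd.DataFrame, column: str, error_rate: float,
                 seed: int = 0) -> pd.DataFrame:
    """Return a dirty copy of `clean`, corrupting `column` at `error_rate`."""
    rng = random.Random(seed)
    dirty = clean.copy()
    for i in dirty.index:
        if rng.random() < error_rate:
            value = str(dirty.at[i, column])
            if value:
                pos = rng.randrange(len(value))
                # Simple typo-style error type: drop one character.
                dirty.at[i, column] = value[:pos] + value[pos + 1:]
    return dirty

# The clean table doubles as the ground truth for later evaluation.
ground_truth = pd.DataFrame({"city": ["Berlin", "Hamburg", "Munich", "Cologne"]})
dirty = inject_typos(ground_truth, "city", error_rate=0.5)
print(pd.concat([ground_truth, dirty], axis=1))
```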
This dataset summarizes the data used to create all figures in the manuscript and SI. This dataset is associated with the following publication: Kennicutt, A., P. Rossman, J. Bollman, T. Aho, G. Abulikemu, J. Pressman, and D. Wahman. Evaluation of Preformed Monochloramine Reactivity with Processed Natural Organic Matter and Scaling Methodology Development for Concentrated Waters. ACS ES&T Water. American Chemical Society, Washington, DC, USA, 2(12): 2431-2440, (2022).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data sets used to test the compression tools along with the sequencing platforms that produced them. Length is the average sequence length. Depth is the average genome depth assuming 100% of sequences in the data set can be aligned.
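Under that 100% alignment assumption, the reported depth presumably follows the standard coverage relation: depth ≈ (number of sequences × average sequence length) / genome size.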