100+ datasets found
  1. CPQA-Evaluation-Set

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    MERaLiON (2025). CPQA-Evaluation-Set [Dataset]. https://huggingface.co/datasets/MERaLiON/CPQA-Evaluation-Set
    Explore at:
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    MERaLiON
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CPQA-Eval-LLM set: Evaluation Set for Contextual Paralinguistic Understanding in Speech-LLMs. This evaluation set is designed to assess the capability of large speech-language models (Speech-LLMs) to understand contextual and paralinguistic cues in speech. The dataset includes:

    • 2647 LLM-generated question–answer pairs
    • 479 associated YouTube video links

    The data is provided in Hugging Face Dataset format, with the following structure:

    YouTube video links and their corresponding start/end… See the full description on the dataset page: https://huggingface.co/datasets/MERaLiON/CPQA-Evaluation-Set.
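
    A minimal loading sketch (not from the dataset card; the available splits and column names depend on the actual schema) using the Hugging Face datasets library:

        from datasets import load_dataset

        # Load all available splits of the evaluation set from the Hugging Face Hub.
        cpqa = load_dataset("MERaLiON/CPQA-Evaluation-Set")

        print(cpqa)                                # splits, column names, and row counts
        first_split = next(iter(cpqa.values()))
        print(first_split[0])                      # one question–answer record with its video link fields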

  2. ORKG-training-evaluation-set

    • huggingface.co
    Updated Sep 22, 2022
    Cite
    Moussab Scar (2022). ORKG-training-evaluation-set [Dataset]. https://huggingface.co/datasets/Moussab/ORKG-training-evaluation-set
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 22, 2022
    Authors
    Moussab Scar
    License

    https://choosealicense.com/licenses/afl-3.0/

    Description

    The Moussab/ORKG-training-evaluation-set dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  3. Data from: IZA Evaluation Dataset Survey

    • ed.iza.org
    • dataverse.iza.org
    docx, zip
    Updated Oct 20, 2023
    Cite
    Patrick Arni; Marco Caliendo; Steffen Künn; Klaus F. Zimmermann (2023). IZA Evaluation Dataset Survey [Dataset]. http://doi.org/10.15185/izadp.7971.1
    Explore at:
    docx (44055), zip (16669702). Available download formats
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    Research Data Center of IZA (IDSC)
    Authors
    Patrick Arni; Marco Caliendo; Steffen Künn; Klaus F. Zimmermann
    License

    https://www.iza.org/wc/dataverse/IIL-1.0.pdf

    Time period covered
    2007 - 2011
    Area covered
    Germany, Federal States
    Description

    The IZA Evaluation Dataset Survey (IZA ED) was developed in order to obtain reliable longitudinal estimates for the impact of Active Labor Market Policies (ALMP). Moreover, it is suitable for studying the processes of job search and labor market reintegration. The data allow analyzing dynamics with respect to a rich set of individual and labor market characteristics. It covers the initial period of unemployment as well as long-term outcomes, for a total period of up to 3 years after unemployment entry. A longitudinal questionnaire records monthly labor market activities and their duration in detail for the mentioned period. These activities are, for example, employment, unemployment, ALMP, other training etc. Available information covers employment status, occupation, sector, and related earnings, hours, unemployment benefits or other transfer payments. A cross-sectional questionnaire contains all basic information including the process of entering into unemployment, and demographics. The entry into unemployment describes detailed job search behavior such as search intensity, search channels and the role of the Employment Agency. Moreover, reservation wages and individual expectations about leaving unemployment or participating in ALMP programs are recorded. The available demographic information covers employment status, occupation and sector, as well as specifics about citizenship and ethnic background, educational levels, number and age of children, household structure and income, family background, health status, and workplace as well as place of residence regions. The survey provides as well detailed information about the treatment by the unemployment insurance authorities, imposed labor market policies, benefit receipt and sanctions. The survey focuses additionally on individual characteristics and behavior. Such co-variates of individuals comprise social networks, ethnic and migration background, relations and identity, personality traits, cognitive and non-cognitive skills, life and job satisfaction, risky behavior, attitudes and preferences. The main advantages of the IZA ED are the large sample size of unemployed individuals, the accuracy of employment histories, the innovative and rich set of individual co-variates and the fact that the survey measures important characteristics shortly after entry into unemployment.

  4. Data from: Data Sets for Evaluation of Building Fault Detection and...

    • catalog.data.gov
    • data.openei.org
    • +1more
    Updated Apr 26, 2022
    Cite
    Lawrence Berkeley National Laboratory (2022). Data Sets for Evaluation of Building Fault Detection and Diagnostics Algorithms [Dataset]. https://catalog.data.gov/dataset/data-sets-for-evaluation-of-building-fault-detection-and-diagnostics-algorithms-2de50
    Explore at:
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Lawrence Berkeley National Laboratory
    Description

    This documentation and dataset can be used to test the performance of automated fault detection and diagnostics algorithms for buildings. The dataset was created by LBNL, PNNL, NREL, ORNL and ASHRAE RP-1312 (Drexel University). It includes data for air-handling units and rooftop units simulated with PNNL's large office building model.

  5. Evaluation Set Cone Detection Dataset

    • universe.roboflow.com
    zip
    Updated Feb 18, 2023
    Cite
    Cone Dataset Training (2023). Evaluation Set Cone Detection Dataset [Dataset]. https://universe.roboflow.com/cone-dataset-training/evaluation-set-cone-detection
    Explore at:
    zip. Available download formats
    Dataset updated
    Feb 18, 2023
    Dataset authored and provided by
    Cone Dataset Training
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Traffic Cones Bounding Boxes
    Description

    Evaluation Set Cone Detection

    ## Overview
    
    Evaluation Set Cone Detection is a dataset for object detection tasks - it contains Traffic Cones annotations for 518 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
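
    As an illustrative sketch only (the API key placeholder, version number, and export format below are assumptions, not values from this page), the dataset can be pulled with the Roboflow Python package:

        from roboflow import Roboflow

        # Authenticate with your own API key (placeholder value).
        rf = Roboflow(api_key="YOUR_API_KEY")

        # Workspace and project slugs taken from the dataset URL on Roboflow Universe.
        project = rf.workspace("cone-dataset-training").project("evaluation-set-cone-detection")

        # Version 1 and the "coco" export format are assumptions; adjust to what the project offers.
        dataset = project.version(1).download("coco")
        print(dataset.location)  # local folder containing the images and annotations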
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Measure Evaluation

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Jun 8, 2024
    Cite
    data.usaid.gov (2024). Measure Evaluation [Dataset]. https://catalog.data.gov/dataset/measure-evaluation
    Explore at:
    Dataset updated
    Jun 8, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    MEASURE Evaluation is the USAID Global Health Bureau's primary vehicle for supporting improvements in monitoring and evaluation in population, health and nutrition worldwide. They help to identify data needs, collect and analyze technically sound data, and use that data for health decision making. Some MEASURE Evaluation activities involve the collection of innovative evaluation data sets in order to increase the evidence-base on program impact and evaluate the strengths and weaknesses of recent evaluation methodological developments. Many of these data sets may be available to other researchers to answer questions of particular importance to global health and evaluation research. Some of these data sets are being added to the Dataverse on a rolling basis, as they become available. This collection on the Dataverse platform contains a growing variety and number of global health evaluation datasets.

  7. Evaluation of the Immobilization of Heavy Metals Data Set

    • catalog.data.gov
    • datasets.ai
    Updated Sep 2, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Evaluation of the Immobilization of Heavy Metals Data Set [Dataset]. https://catalog.data.gov/dataset/evaluation-of-the-immobilization-of-heavy-metals-data-set
    Explore at:
    Dataset updated
    Sep 2, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Data is provided for 9 figures comprising raw analyses of heavy metal experiment results, including screening batch tests, FTIR, XRD, pH effects, kinetic batch tests, isotherm modeling (Langmuir, Freundlich, Redlich-Peterson), and column tests. This dataset is associated with the following publication: Wallace, A., C. Su, M. Sexton, and W. Sun. Evaluation of the Immobilization of Co-Existing Heavy Metal Ions of Pb2+, Cd2+, and Zn2+ from Water by Dairy Manure-Derived Biochar: Performance and Reusability. JOURNAL OF ENVIRONMENTAL ENGINEERING. American Society of Civil Engineers (ASCE), Reston, VA, USA, 148(6): 04022021, (2022).

  8. Student evaluation of teaching data 2009-2010

    • researchdata.edu.au
    • dro.deakin.edu.au
    Updated Jun 5, 2024
    Cite
    Stuart Rohan Palmer; Dr Stuart Palmer; A/Prof Stuart Palmer (2024). Student evaluation of teaching data 2009-2010 [Dataset]. http://doi.org/10.26187/DEAKIN.25807732.V1
    Explore at:
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Deakin University (http://www.deakin.edu.au/)
    Authors
    Stuart Rohan Palmer; Dr Stuart Palmer; A/Prof Stuart Palmer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of a summary of publicly available student evaluation of teaching (SET) data for the annual period of trimester 2 2009, trimester 3 2009/2010 and trimester 1 2010, from the Student Evaluation of Teaching and Units (SETU) portal.

    The data were analysed to identify any systematic influences on SET results at Deakin University. The analysis covered mean rating sets for 1432 units of study, representing 74498 sets of SETU ratings, 188391 individual student enrolments and 58.5 percent of all units listed in the Deakin University handbook for the period under consideration.

    The data reported for a unit included:
    • total enrolment;
    • total number of responses; and
    • computed response rate for the enrolment location(s) selected

    The data reported for each of the ten core SETU items included:
    • number of responses;
    • mean rating;
    • standard deviation of the mean rating;
    • percentage agreement;
    • percentage disagreement; and
    • percentage difference.

  9. ACHILLES: Ancient and Historical Language Evaluation Set

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 29, 2024
    Cite
    Dereza, Oksana (2024). ACHILLES: Ancient and Historical Language Evaluation Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10655060
    Explore at:
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    Dereza, Oksana
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset used in the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages. The task included four problems; problems 1-3 were offered in both constrained and unconstrained tracks on CodaLab, while problem 4 was only a part of the unconstrained track.

    1. POS-tagging
    2. Lemmatisation
    3. Morphological feature prediction
    4. Mask filling
       a. Word-level
       b. Character-level

    For problems 1-3, data from Universal Dependencies v.2.12 was used for Ancient Greek, Ancient Hebrew, Classical Chinese, Coptic, Gothic, medieval Icelandic, Latin, Old Church Slavonic, Old East Slavic, Old French and Vedic Sanskrit. Old Hungarian texts, annotated to the same standard as UD corpora, were added to the dataset from the MGTSZ website. In Old Hungarian data, tokens which were POS-tagged PUNCT were altered so that the form matched the lemma to simplify complex punctuation marks used to approximate manuscript symbols; otherwise, no characters were changed.

    The ISO 639-3 standard does not distinguish between historical stages of Latin, as it does for some other languages such as Irish. Since it was desirable to approximate this distinction for Latin, we further split the Latin data. This resulted in two Latin datasets: Classical and Late Latin, and Medieval Latin. This split was dictated by the composition of the Perseus and PROIEL treebanks that served as a source for Latin UD treebanks.

    Historical forms of Irish were only included in mask filling challenges (problem 4), as the quantity of historical Irish text data which has been tokenised and annotated to a single standard to date is insufficient for the purpose of training models to perform morphological analysis tasks. The texts were drawn from CELT, Corpas Stairiúil na Gaeilge, and digital editions of the St. Gall glosses and the Würzburg glosses. Each Irish text taken from CELT is labelled "Old", "Middle" or "Early Modern" in accordance with the language labels provided in CELT metadata. Because CELT metadata relating to language stages and text dating is reliant on information provided by a variety of different editors of earlier print editions, this metadata can be inconsistent across the corpus and on occasion inaccurate. To mitigate complications arising from this, texts drawn from CELT were included in the dataset only if they had a single Irish language label and if the dates provided in CELT metadata for the text match the expected dates for the given period in the history of the Irish language.

    The upper temporal boundary was set at 1700 CE, and texts created later than this date were not included in the dataset. The choice of this date is driven by the fact that most of the historical language data used in word embedding research dates back to the 18th century CE or later, and our intention was to focus on the more challenging and yet unaddressed data. The resulting datasets for each language were then shuffled at the sentence level and split into training, validation and test subsets at the ratio of 0.8 : 0.1 : 0.1.
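
    A rough sketch of that shuffle-and-split procedure (illustrative only, not the organisers' actual code):

        import random

        def split_sentences(sentences, seed=42, ratios=(0.8, 0.1, 0.1)):
            """Shuffle a list of sentences and split it into train/valid/test subsets at 0.8 : 0.1 : 0.1."""
            rng = random.Random(seed)
            shuffled = list(sentences)
            rng.shuffle(shuffled)
            n_train = int(ratios[0] * len(shuffled))
            n_valid = int(ratios[1] * len(shuffled))
            train = shuffled[:n_train]
            valid = shuffled[n_train:n_train + n_valid]
            test = shuffled[n_train + n_valid:]
            return train, valid, test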

    A detailed list of text sources for each language in the dataset, as well as other metadata and the description of data formats used for each problem, is provided on the Shared Task's GitHub. The structure of the dataset is as follows:

    📂 morphology (data for problems 1-3)
    ├── 📂 test
    │   ├── 📂 ref (reference data used in CodaLab competitions)
    │   │   ├── 📂 lemmatisation
    │   │   ├── 📂 morph_features
    │   │   └── 📂 pos_tagging
    │   └── 📂 src (source test data with labels)
    ├── 📂 train
    └── 📂 valid

    📂 fill_mask_word (data for problem 4a)
    ├── 📂 test
    │   ├── 📂 ref (reference data used in CodaLab competitions)
    │   └── 📂 src (source test data with labels in 2 different formats)
    │       ├── 📂 json
    │       └── 📂 tsv
    ├── 📂 train (train data in 2 different formats)
    │   ├── 📂 json
    │   └── 📂 tsv
    └── 📂 valid (validation data in 2 different formats)
        ├── 📂 json
        └── 📂 tsv

    📂 fill_mask_char (data for problem 4b)
    ├── 📂 test
    │   ├── 📂 ref (reference data used in CodaLab competitions)
    │   └── 📂 src (source test data with labels in 2 different formats)
    │       ├── 📂 json
    │       └── 📂 tsv
    ├── 📂 train (train data in 2 different formats)
    │   ├── 📂 json
    │   └── 📂 tsv
    └── 📂 valid (validation data in 2 different formats)
        ├── 📂 json
        └── 📂 tsv

    We would like to thank Ekaterina Melnikova for suggesting the name for the dataset.

  10. Optimization and Evaluation Datasets for PiMine

    • fdr.uni-hamburg.de
    md, zip
    Updated Jan 22, 2024
    + more versions
    Cite
    Graef, Joel; Ehrt, Christiane; Reim, Thorben; Rarey, Matthias (2024). Optimization and Evaluation Datasets for PiMine [Dataset]. http://doi.org/10.25592/uhhfdm.13972
    Explore at:
    md, zip. Available download formats
    Dataset updated
    Jan 22, 2024
    Dataset provided by
    ZBH Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146 Hamburg, Germany
    Authors
    Graef, Joel; Ehrt, Christiane; Reim, Thorben; Rarey, Matthias
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The protein-protein interface comparison software PiMine was developed to provide fast comparisons against databases of known protein-protein complex structures. Its application domains range from the prediction of interfaces and potential interaction partners to the identification of potential small molecule modulators of protein-protein interactions.[1]

    The protein-protein evaluation datasets are a collection of five datasets that were used for the parameter optimization (ParamOptSet), enrichment assessment (Dimer597 set, Keskin set, PiMineSet), and runtime analyses (RunTimeSet) of protein-protein interface comparison tools. The evaluation datasets contain pairs of interfaces of protein chains that either share sequential and structural similarities or are even sequentially and structurally unrelated. They enable comparative benchmark studies for tools designed to identify interface similarities.

    In addition, we added the results of the case studies analyzed in [1] to enable readers to follow the discussion and investigate the results individually.

    Data Set description:

    The ParamOptSet was designed based on a study on improving the benchmark datasets for the evaluation of protein-protein docking tools [2]. It was used to optimize and fine-tune the geometric search parameters of PiMine.

    The Dimer597 [3] and Keskin [4] sets were developed earlier. We used them to evaluate PiMine’s performance in identifying structurally and sequentially related interface pairs as well as interface pairs with prominent similarity whose constituting chains are sequentially unrelated.

    The PiMine set [1] was constructed to assess different quality criteria for reliable interface comparison. It consists of similar pairs of protein-protein complexes of which two chains are sequentially and structurally highly related while the other two chains are unrelated and show different folds. It enables the assessment of the performance when the interfaces of apparently unrelated chains are available only. Furthermore, we could obtain reliable interface-interface alignments based on the similar chains which can be used for alignment performance assessments.

    Finally, the RunTimeSet [1] comprises protein-protein complexes from the PDB that were predicted to be biologically relevant. It enables the comparison of typical run times of comparison methods and represents also an interesting dataset to screen for interface similarities.

    References:

    [1] Graef, J.; Ehrt, C.; Reim, T.; Rarey, M. Database-driven identification of structurally similar protein-protein interfaces (submitted)
    [2] Barradas-Bautista, D.; Almajed, A.; Oliva, R.; Kalnis, P.; Cavallo, L. Improving classification of correct and incorrect protein-protein docking models by augmenting the training set. Bioinform. Adv. 2023, 3, vbad012.
    [3] Gao, M.; Skolnick, J. iAlign: a method for the structural comparison of protein–protein interfaces. Bioinformatics 2010, 26, 2259-2265.
    [4] Keskin, O.; Tsai, C.-J.; Wolfson, H.; Nussinov, R. A new, structurally nonredundant, diverse data set of protein–protein interfaces and its implications. Protein Sci. 2004, 13, 1043-1055.

  11. Multi-Language Vocabulary Evaluation Data Set

    • live.european-language-grid.eu
    tsv
    Updated Jan 5, 2022
    + more versions
    Cite
    (2022). Multi-Language Vocabulary Evaluation Data Set [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/9487
    Explore at:
    tsv. Available download formats
    Dataset updated
    Jan 5, 2022
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multi-Language Vocabulary Evaluation Data Set (MuLVE) is a data set consisting of vocabulary cards and real-life user answers, labeled according to whether the user answer is correct or incorrect. The data originate from user learning data from the Phase6 vocabulary trainer. The data set contains vocabulary questions in German with English, Spanish, and French as target languages, and is available in four different variations regarding pre-processing and deduplication.

    It is split up into four tab-separated files, one for each variation, per train and test set. The files include the following columns:

    • cardId - numeric card ID
    • question - vocabulary card question
    • answer - vocabulary card answer
    • userAnswer - answer the user input
    • Label - True if user answer is correct, False if not
    • language - target language (English, French or Spanish)

    The processed data set variations do not include the userAnswer column but include the following additional columns (see the loading sketch after this list):

    • question_norm - question normalized
    • answer_norm - answer normalized
    • userAnswer_norm - user answer normalized
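
    A minimal loading sketch with pandas; the file name below is a placeholder, since the actual file names of the four variations are not listed here:

        import pandas as pd

        # Placeholder path: substitute the actual train or test file of the variation you need.
        df = pd.read_csv("mulve_variation1_train.tsv", sep="\t")

        print(df.columns.tolist())           # e.g. cardId, question, answer, Label, language, ...
        print(df["Label"].value_counts())    # distribution of correct vs. incorrect user answers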

  12. Extended evaluation datasets for IIoT TestBed (Release version 1 from...

    • zenodo.org
    bin
    Updated Dec 12, 2023
    Cite
    David Heik; David Heik (2023). Extended evaluation datasets for IIoT TestBed (Release version 1 from November 28th, 2023) [Dataset]. http://doi.org/10.5281/zenodo.10360358
    Explore at:
    bin. Available download formats
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Heik; David Heik
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This contribution serves as an extension of the existing evaluation data set, and thus also serves for the evaluation of planning strategies for the IIoT test bed environment, which was presented here: https://zenodo.org/records/10212298.

    The evaluation data set presented here comprises four different files, each containing 1000 entries and thus describing 1000 different initial situations.

    The objective of this dataset is to assess and compare the policies achieved with different algorithms.

    The first data set (#1) is identical to the data set already published in https://zenodo.org/records/10212319. The same environmental conditions apply here as in the training, however the situations for the agents are unseen.

    In the second evaluation data set (#2), the number of carriers used at the same time was increased by 25%, i.e. from 16 to 20.

    In evaluation data set three (#3), the number of products to be completed varies between 50 and 500.

    In evaluation data set four (#4), the lot size was limited to 1, which means that each order only includes one product. This increases the diversity of product types and families that are manufactured at the same time, which can lead to higher set-up effort.

  13. Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture...

    • s.cnmilf.com
    • catalog.data.gov
    Updated Mar 3, 2023
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). Input-Output Data Sets Used in the Evaluation of the Two-Layer Soil Moisture and Flux Model [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/input-output-data-sets-used-in-the-evaluation-of-the-two-layer-soil-moisture-and-flux-mode
    Explore at:
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The Excel file contains the model input-output data sets that were used to evaluate the two-layer soil moisture and flux dynamics model. The model is original and was developed by Dr. Hantush by integrating the well-known Richards equation over the root layer and the lower vadose zone. The input-output data are used for: 1) verification of the numerical scheme by comparison against the HYDRUS model as a benchmark; 2) model validation by comparison against real site data; and 3) estimation of model predictive uncertainty and sources of modeling errors. This dataset is associated with the following publication: He, J., M.M. Hantush, L. Kalin, and S. Isik. Two-Layer numerical model of soil moisture dynamics: Model assessment and Bayesian uncertainty estimation. JOURNAL OF HYDROLOGY. Elsevier Science Ltd, New York, NY, USA, 613 part A: 128327, (2022).

  14. TREC 2022 Deep Learning test collection

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated May 9, 2023
    Cite
    National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning track studies information retrieval in a large-training-data regime: the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training on click logs or on labels from shallow pools (such as the pooling in the TREC Million Query track, or the evaluation of search engines based on early precision).

    Certain machine-learning methods, such as those based on deep learning, are known to require very large datasets for training. The lack of such large-scale datasets has been a limitation in developing these methods for common information retrieval tasks, such as document ranking. The Deep Learning track organized in previous years aimed to provide large-scale datasets to TREC and to create a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

    As in previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision? The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.

  15. GAED: Game Acceptance Evaluation Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 18, 2020
    Cite
    Augusto de Castro Vieira (2020). GAED: Game Acceptance Evaluation Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3403237
    Explore at:
    Dataset updated
    Feb 18, 2020
    Dataset provided by
    Wladmir Cardoso Brandão
    Augusto de Castro Vieira
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Game Acceptance Evaluation Dataset (GAED) contains statistical data as well as the training and validation sets used in our experiments on neural networks to evaluate video game acceptance.

    Please consider citing the following references if you found this dataset useful:

    [1] Augusto de Castro Vieira, Wladmir Cardoso Brandão. Evaluating Acceptance of Video Games using Convolutional Neural Networks for Sentiment Analysis of User Reviews. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 2019.

    [2] Augusto de Castro Vieira, Wladmir Cardoso Brandão. GA-Eval: A Neural Network based approach to evaluate Video Games Acceptance. In Proceedings of the 18th Brazilian Symposium on Computer Games and Digital Entertainment. 2019.

    [3] Augusto de Castro Vieira, Wladmir Cardoso Brandão. (2019). GAED: The Game Acceptance Evaluation Dataset (Version 1.0) [Data set]. Zenodo.

  16. Evaluation set DCASE 2021 task 4 (for submissions)

    • zenodo.org
    application/gzip, bin +1
    Updated Sep 23, 2021
    Cite
    Ronchini, Francesca; Turpault, Nicolas; Serizel, Romain; Wisdom, Scott; Erdogan, Hakan; Hershey, John; Salamon, Justin; Seetharaman, Prem; Fonseca, Eduardo; Cornell, Samuele; Ellis, Daniel P. W. (2021). Evaluation set DCASE 2021 task 4 (for submissions) [Dataset]. http://doi.org/10.5281/zenodo.5524373
    Explore at:
    zip, application/gzip, bin. Available download formats
    Dataset updated
    Sep 23, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ronchini, Francesca; Turpault, Nicolas; Serizel, Romain; Wisdom, Scott; Erdogan, Hakan; Hershey, John; Salamon, Justin; Seetharaman, Prem; Fonseca, Eduardo; Cornell, Samuele; Ellis, Daniel P. W.
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repo contains the dataset to download in order to submit results and be evaluated in task 4 of DCASE 2021. It also contains the ground truth for the public and synthetic evaluation datasets, together with the mapping file between the anonymized (official eval) file names and the file names as presented in the annotations.

    Please check the submission package and follow its instructions to prepare a submission.

    If you have problems downloading the large eval21.tar.gz file, you can alternatively download the eval21.part* files, which contain the same audio files. First concatenate them with cat eval21.part* > eval21.tar.gz, then extract the archive.
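
    A minimal sketch of that reassembly step in Python (same file names as above; extraction into the current directory is an assumption):

        import glob
        import tarfile

        # Reassemble the split archive from its parts, in sorted order.
        with open("eval21.tar.gz", "wb") as out:
            for part in sorted(glob.glob("eval21.part*")):
                with open(part, "rb") as chunk:
                    out.write(chunk.read())

        # Extract the reassembled archive into the current directory.
        with tarfile.open("eval21.tar.gz", "r:gz") as tar:
            tar.extractall(path=".")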

  17. Evaluation data set for the GravelSensor

    • rodare.hzdr.de
    md, zip
    Updated Aug 15, 2024
    Cite
    Bieberle, André; Schleicher, Eckhard (2024). Evaluation data set for the GravelSensor [Dataset]. http://doi.org/10.14278/rodare.3090
    Explore at:
    md, zip. Available download formats
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    HZDR
    Authors
    Bieberle, André; Schleicher, Eckhard
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this study, we used gamma-ray computed tomography (GammaCT) as a reference measurement system to evaluate a novel, non-destructive, smart gravel sensor that is based on the well-known wire-mesh sensor. Various sediment fillings with different infiltrating particle sizes are applied to the gravel sensor and the generated particle holdup is locally determined with both measurement systems simultaneously.

  18. GouDa - Generation of universal Data Sets

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1more
    Updated May 6, 2022
    Cite
    Valerie Restat; Gerrit Boerner; André Conrad; Uta Störl (2022). GouDa - Generation of universal Data Sets [Dataset]. http://doi.org/10.5281/zenodo.6610025
    Explore at:
    Dataset updated
    May 6, 2022
    Authors
    Valerie Restat; Gerrit Boerner; André Conrad; Uta Störl
    Description

    GouDa is a tool for the generation of universal data sets to evaluate and compare existing data preparation tools and new research approaches. It supports diverse error types and arbitrary error rates. Ground truth is provided as well. It thus permits better analysis and evaluation of data preparation pipelines and simplifies the reproducibility of results.

    Publication: V. Restat, G. Boerner, A. Conrad, and U. Störl. GouDa - Generation of universal Data Sets. In Proceedings of Data Management for End-to-End Machine Learning (DEEM’22), Philadelphia, USA, 2022. https://doi.org/10.1145/3533028.3533311

    Please use the current version 1.1.0!

  19. Evaluation of Preformed Monochloramine Reactivity with Processed Natural...

    • catalog.data.gov
    Updated Dec 15, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Evaluation of Preformed Monochloramine Reactivity with Processed Natural Organic Matter and Scaling Methodology Development for Concentrated Waters Data Set V1 [Dataset]. https://catalog.data.gov/dataset/evaluation-of-preformed-monochloramine-reactivity-with-processed-natural-organic-matter-an
    Explore at:
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The data summarized here were used to create all figures in the manuscript and SI. This dataset is associated with the following publication: Kennicutt, A., P. Rossman, J. Bollman, T. Aho, G. Abulikemu, J. Pressman, and D. Wahman. Evaluation of Preformed Monochloramine Reactivity with Processed Natural Organic Matter and Scaling Methodology Development for Concentrated Waters. ACS ES&T Water. American Chemical Society, Washington, DC, USA, 2(12): 2431-2440, (2022).

  20. Data sets used for program evaluation.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    James K. Bonfield; Matthew V. Mahoney (2023). Data sets used for program evaluation. [Dataset]. http://doi.org/10.1371/journal.pone.0059190.t001
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    James K. Bonfield; Matthew V. Mahoney
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data sets used to test the compression tools along with the sequencing platforms that produced them. Length is the average sequence length. Depth is the average genome depth assuming 100% of sequences in the data set can be aligned.
