Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Abstract: In our report, we described a case study in which we evaluate our approach for correctness quantification using Palladio and KeY. Here, we publish the corresponding Palladio Component Model files as well as the generated source code, in order to make them publicly available.
TechnicalRemarks: # README: Case Study (Age of Maturity)
This archive contains the Palladio artifacts of the case study "AgeOfMaturity" of the Technical Report "Model-driven Quantification of Correctness with Palladio and KeY" (DOI: 10.5445/IR/1000128855). The structure of this archive is as follows:
- The folder "PalladioModels" contains the Palladio Component Model Eclipse project of the "AgeOfMaturity" use case.
- The folder "DependencySolverResults" contains output files with the results of the solved "AgeOfMaturity" case study as described in the report.
- The folder "GeneratedJavaCode" contains Java code generated with the PCM2Java project.
Helpsteer-correctness
This dataset is derived from NVIDIA's HelpSteer dataset, processed specifically for preference learning on the correctness dimension.
- Train split: 27417 examples
- Test split: 1416 examples
## Format
Each example contains the following fields:
- `prompt`: Question with "Human:" prefix and "Assistant:" suffix
- `chosen`: The response with higher correctness score
- `rejected`: The response with lower correctness score
-… See the full description on the dataset page: https://huggingface.co/datasets/cheryyunl/helpsteer-correctness.
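As a quick orientation, the snippet below shows one way to load and inspect the dataset with the Hugging Face `datasets` library; the split and field names follow the description above, and the exact output depends on the hosted version of the dataset.

```python
# Minimal sketch: loading helpsteer-correctness and inspecting one example.
# Assumes the `datasets` library is installed and the dataset id matches
# the page linked above.
from datasets import load_dataset

ds = load_dataset("cheryyunl/helpsteer-correctness")
print(ds)  # expected splits: train (~27k examples) and test (~1.4k examples)

example = ds["train"][0]
# Fields as described above: prompt, chosen, rejected.
print(example["prompt"][:200])
print(example["chosen"][:200])
print(example["rejected"][:200])
```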
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
White-box test generator tools rely only on the code under test to select test inputs, and capture the implementation's output as assertions. If there is a fault in the implementation, it can become encoded in the generated tests. Tool evaluations usually measure fault-detection capability using the number of such fault-encoding tests. However, these faults are only detected if the developer can recognize that the encoded behavior is faulty. We designed an exploratory study to investigate how developers perform in classifying generated white-box tests as faulty or correct. We carried out the study in a laboratory setting with 54 graduate students. The tests were generated for two open-source projects with the help of the IntelliTest tool. The performance of the participants was analyzed using binary classification metrics and by coding their observed activities. The results showed that participants incorrectly classified a large number of both fault-encoding and correct tests (with median misclassification rates of 33% and 25%, respectively). Thus, the real fault-detection capability of test generators could be much lower than typically reported, and we suggest taking this human factor into account when evaluating generated white-box tests.
This material contains the dataset and the recorded videos of the study.
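To make the reported numbers concrete, here is a small hedged sketch of the binary classification metrics mentioned above (misclassification rate, precision, recall), computed from hypothetical participant counts; the counts are illustrative and are not taken from the study data.

```python
# Hedged sketch: binary classification metrics for a participant who labels
# generated tests as "faulty" or "correct". All counts are illustrative.

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """tp: fault-encoding tests correctly flagged as faulty,
    fp: correct tests wrongly flagged as faulty,
    fn: fault-encoding tests wrongly accepted as correct,
    tn: correct tests correctly accepted."""
    total = tp + fp + fn + tn
    return {
        "misclassification_rate": (fp + fn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Illustrative counts for one participant (hypothetical numbers).
print(binary_metrics(tp=8, fp=3, fn=4, tn=15))
```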
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Some MGF datasets used as references for testing purposes.
Stores information for the Transaction Accuracy Review (TAR), which measures the adjudicative accuracy of the approximately 3.9 million Title II claims processed each year.
Software developers are increasingly encountering artificial intelligence (AI) tools in their workflows, but opinions on their accuracy remain divided. According to a 2024 Stack Overflow developer survey, 43 percent of programmers either highly or somewhat trust the output from AI tools, while 30 percent express distrust. This split in trust levels highlights the ongoing debate about AI's role and reliability in software development.
Productivity gains drive AI adoption
Despite mixed feelings about accuracy, AI tools are making significant impacts on developer productivity. The same survey found that over 80 percent of developers cited improved productivity as the most important benefit of using AI tools, a substantial increase from 33 percent the previous year. This productivity boost is further evidenced by a study showing that developers using AI co-pilots completed tasks 56 percent faster than those without AI assistance.
Challenges in AI implementation
While AI shows promise in handling complex tasks, with nearly one in three developers reporting its effectiveness in this area, challenges remain. Distrust of AI-generated output was identified as the greatest obstacle to AI adoption in development workflows by two-thirds of developers worldwide. Additionally, 30 percent of developers cited a lack of proper training and education on new AI tools as a significant challenge. These findings underscore the need for continued improvements in AI technology and better resources for developer education to fully realize AI's potential in software development.
Dataset Card for "MathDial-instructions-Correctness"
More Information needed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sign language correctness discrimination (SLCD) dataset is collected for sign language teaching. Different from general sign language recognition datasets
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book "Observational equivalence and compiler correctness". It features 7 columns including author, publication date, language, and book publisher.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for tw-political-correctness-dpo
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
- Curated by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/lianghsun/tw-political-correctness-dpo.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains raw data from the pilot study samples used for the validity and reliability testing of the Environmental Enrichment Scale Questionnaire (EESQ) and its translated Malay version (EESQ-M).
https://www.nist.gov/open/license
The dataset contains both the robot's high-level tool center position (TCP) health data and controller-level components' information (i.e., joint positions, velocities, currents, and temperatures). The datasets can be used by users (e.g., software developers, data scientists) who work on robot health management (including accuracy) but have limited or no access to robots that can capture real data. The datasets can support:
- Development of robot health monitoring algorithms and tools
- Research of technologies and tools to support robot monitoring, diagnostics, prognostics, and health management (collectively called PHM)
- Validation and verification of industrial PHM implementations, for example, verification of a robot's TCP accuracy after the work cell has been reconfigured, or whenever a manufacturer wants to determine whether the robot arm has experienced degradation
For data collection, a trajectory is programmed for the Universal Robot (UR5) approaching and stopping at randomly selected locations in its workspace. The robot moves along this preprogrammed trajectory under different conditions of temperature, payload, and speed. The TCP positions (x, y, z) of the robot are measured by a 7-D measurement system developed at NIST. Differences are calculated between the positions measured by the 7-D measurement system and the nominal positions computed from the nominal robot kinematic parameters. The results are recorded within the dataset. Controller-level sensing data are also collected from each joint (direct output from the controller of the UR5) to understand how temperature, payload, and speed influence position degradation. Controller-level data can be used for root cause analysis of robot performance degradation, by providing joint positions, velocities, currents, accelerations, torques, and temperatures. For example, the cold-start temperatures of the six joints were approximately 25 degrees Celsius; after two hours of operation, the joint temperatures increased to approximately 35 degrees Celsius. Control variables are listed in the header file in the dataset (UR5TestResult_header.xlsx). If you'd like to comment on this data and/or offer recommendations on future datasets, please email guixiu.qiao@nist.gov.
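The error calculation described above (measured minus nominal TCP positions) can be sketched as follows; the numbers and array layout are illustrative assumptions, and the actual column layout is documented in UR5TestResult_header.xlsx.

```python
# Hedged sketch: TCP position error as the difference between measured
# (7-D measurement system) and nominal (kinematic model) positions.
# All values below are illustrative, not taken from the dataset.
import numpy as np

measured = np.array([[0.4012, 0.2103, 0.3991],
                     [0.4015, 0.2101, 0.3988]])   # measured x, y, z in meters
nominal  = np.array([[0.4000, 0.2100, 0.4000],
                     [0.4000, 0.2100, 0.4000]])   # nominal x, y, z in meters

error_vec = measured - nominal                    # per-axis deviation
error_mag = np.linalg.norm(error_vec, axis=1)     # Euclidean TCP error per sample

print(error_vec)
print(error_mag)
```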
Maintain accurate general revenue estimates to ensure collections between 97% and 103% of the final annual estimate every year through 2018.
The purpose of the project was to assess the validity of two Voice Stress Analysis (VSA) tools currently on the market: the Layered Voice Analysis (LVA) and the Computer Voice Stress Analyzer (CVSA). The methodology and sampling protocols for this study were derived from the pre-existing methodology and sampling techniques employed in the National Institute of Justice-funded Arrestee Drug Abuse Monitoring (ADAM) program that operated in Oklahoma County from 1998 to 2004. The researchers interviewed arrestees in the Oklahoma County jail about their recent illicit drug use during the months of February and March 2006. The VSA data collected using each of the software systems in this study were sent to certified examiners from CVSA and LVA for their analysis. After the completion of the interview, the subjects were asked to complete the data collection process by supplying urine specimens. Answers from the 319 respondents were compared to the results of a urinalysis test to determine the extent to which they were being deceptive. Then, their "actual deceptiveness" was compared to the extent to which deception was indicated by the VSA programs. The dataset contains (1) demographic information obtained from the official booking records, (2) responses to survey questions about recent drug use, (3) the results of a urinalysis test on five drugs, (4) variables recording "deception" or "no deception" on each of the drugs, and (5) decisions by novice and expert analysts regarding the indication of deception.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary: Database of confusion matrices retrieved from the scientific literature. Suitable for research on the creation and exploitation of the confusion matrix, which remain topics of interest, such as new tools, sampling design, indices derived from the matrix, proposals for testing statistical hypotheses, and so on.
Format: SQLite
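As an illustration of how such a database might be queried, the sketch below reads one stored confusion matrix from SQLite and derives overall accuracy; the table and column names are assumptions made for the example and are not the published schema.

```python
# Hedged sketch: reading a confusion matrix from an SQLite database and
# computing overall accuracy. Schema names ("matrices", "matrix_id",
# "row_class", "col_class", "count") and the filename are hypothetical.
import sqlite3
import numpy as np

con = sqlite3.connect("confusion_matrices.sqlite")  # hypothetical filename
rows = con.execute(
    "SELECT row_class, col_class, count FROM matrices WHERE matrix_id = ?", (1,)
).fetchall()
con.close()

n = max(max(r, c) for r, c, _ in rows) + 1          # number of classes
cm = np.zeros((n, n))
for r, c, cnt in rows:
    cm[r, c] = cnt                                  # reference rows, mapped columns (assumed)

overall_accuracy = np.trace(cm) / cm.sum()
print(overall_accuracy)
```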
This statistic presents data on the political correctness of the entertainment industry in the United States as of August 2017. During a survey, 37 percent of respondents stated that they found the entertainment industry to be too politically correct.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data for temporal validity change prediction, an NLP task that will be defined in an upcoming publication. The dataset consists of five columns.
The duration labels (context_only_tv, combined_tv) are class indices of the following class distribution:
[no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]
Different dataset splits are provided.
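For clarity, here is a minimal sketch of decoding those class indices back into the duration labels listed above; the helper name is illustrative and not part of the dataset.

```python
# Hedged sketch: mapping context_only_tv / combined_tv class indices to
# the duration labels listed above (index 0 = "no time-sensitive information").
DURATION_CLASSES = [
    "no time-sensitive information", "less than one minute", "1-5 minutes",
    "5-15 minutes", "15-45 minutes", "45 minutes - 2 hours", "2-6 hours",
    "more than 6 hours", "1-3 days", "3-7 days", "1-4 weeks",
    "more than one month",
]

def decode_duration(class_index: int) -> str:
    return DURATION_CLASSES[class_index]

print(decode_duration(4))  # -> "15-45 minutes" (illustrative index)
```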
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy assessment is one of the most important components of both applied and research-oriented remote sensing projects. For mapped classes that have sharp and easily identified boundaries, a broad array of accuracy assessment methods has been developed. However, accuracy assessment is in many cases complicated by classes that have fuzzy, indeterminate, or gradational boundaries, a condition which is common in real landscapes; for example, the boundaries of wetlands, many soil map units, and tree crowns. In such circumstances, the conventional approach of treating all reference pixels as equally important, whether located on the map close to the boundary of a class, or in the class center, can lead to misleading results. We therefore propose an accuracy assessment approach that relies on center-weighting map segment area to calculate a variety of common classification metrics including overall accuracy, class user’s and producer’s accuracy, precision, recall, specificity, and the F1 score. This method offers an augmentation of traditional assessment methods, can be used for both binary and multiclass assessment, allows for the calculation of count- and area-based measures, and permits the user to define the impact of distance from map segment edges based on a distance weighting exponent and a saturation threshold distance, after which the weighting ceases to grow. The method is demonstrated using synthetic and real examples, highlighting its use when the accuracy of maps with inherently uncertain class boundaries is evaluated.
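A minimal sketch of the center-weighting idea is given below: each pixel's weight grows with its distance from the nearest map-segment edge, raised to a user-defined exponent, and stops growing beyond a saturation distance. The function and parameter names are illustrative and do not reproduce the authors' implementation.

```python
# Hedged sketch: center-weighted overall accuracy for a classified raster.
# Assumes scipy is available; only overall accuracy is shown, although the
# same weights could feed user's/producer's accuracy, precision, recall, etc.
import numpy as np
from scipy.ndimage import distance_transform_edt

def center_weighted_accuracy(class_map, reference, exponent=1.0, saturation=10.0):
    # Mark cells adjacent to a class transition as segment boundaries.
    boundary = np.zeros(class_map.shape, dtype=bool)
    boundary[:-1, :] |= class_map[:-1, :] != class_map[1:, :]
    boundary[:, :-1] |= class_map[:, :-1] != class_map[:, 1:]

    # Distance (in pixels) from each cell to the nearest boundary cell.
    dist = distance_transform_edt(~boundary)

    # Weight: normalized distance, capped at the saturation threshold,
    # raised to the distance-weighting exponent.
    w = np.clip(dist / saturation, 0.0, 1.0) ** exponent

    correct = (class_map == reference).astype(float)
    return (w * correct).sum() / w.sum()

# Illustrative use with a small synthetic map and reference raster.
rng = np.random.default_rng(0)
cmap = (rng.random((50, 50)) > 0.5).astype(int)
ref = np.where(rng.random((50, 50)) < 0.9, cmap, 1 - cmap)  # ~10% disagreement
print(center_weighted_accuracy(cmap, ref, exponent=2.0, saturation=5.0))
```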
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains the semantic coverage of the twenty test suites T1 ... T20 of jTerminal for total correctness with respect to M50; these files are named TOT50T1 ... TOT50T20. This directory also contains the graph that ranks the test suites T1 ... T20 by their semantic coverage; this graph is represented in two formats, namely the list of arcs (ArcsTOT50) and the graphical representation (GraphTOT50).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains the data and scripts to reproduce figures from 'Evaluating the accuracy of binary classifiers for geomorphic applications' published in Earth Surface Dynamics (Rossi, 2024).
Figure 1 elevation data was downloaded from OpenTopography (2010 Channel Islands Lidar Collection, 2012; Anderson et al., 2012; Reed, 2006). GIS files for elevation data and transect locations are provided in the zipped geodatabase gis_fig1.gdb.zip.
Figure 2 is based on the bedrock mapping at site P01 from Rossi et al. (2020). GIS files for 1-m slope, air photo mapping, its conversion to a truth raster, and the accuracy classification using a 38 degree slope threshold are provided in the zipped geodatabase gis_fig2.gdb.zip.
Figures 3-7 are ultimately based on synthetic_feature_maps_main.py and synthetic_feature_maps_functions.py. The former uses the latter to plot example classified maps along with how accuracy scores vary as a function of feature fraction for a given set of input parameters set by the user. Results are saved as a .csv file. Because these master scripts are designed for one set of input parameters, I provide a number of other scripts below that aid in reproducing the figures shown in the manuscript.
Figures 3a and 3c can be reproduced using generate_fig3.py directly, with input parameters of l = 100, scl = 1, sflag = 2, and fmap = 0.5. This plots the 'match scene' scenario only. Note that there is code that is commented out that will let you plot the 'all feature' scenario as well.
Figures 3b and 3d can be reproduced using generate_fig3.py directly, with input parameters of l = 100, scl = 10, sflag = 2, and fmap = 0.5. This plots the 'match scene' scenario only. Note that there is code that is commented out that will let you plot the 'all feature' scenario as well.
Figure 4 can be reproduced using generate_Fig4.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folder results_rand_only.
Figure 5 can be reproduced using generate_Fig5.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folder results_syst_only.
Figure 6 can be reproduced using generate_Fig6.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folder results_rand_plus_syst.
Figure 7 can be reproduced using generate_Fig7.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folders results_rand_only, results_syst_only, and results_rand_plus_syst.
Figure 8 is conceptual. Figs. 8a-b were drawn in Adobe Illustrator. The plot shown in Fig. 8c can be reproduced using generate_Fig8c.py and requires the associated file fig8_examples.txt.
Figure 9 is conceptual. Fig. 9a was drawn in Adobe Illustrator. The plot shown in Fig. 9b can be reproduced using generate_Fig9b.py. Because it does not use saved results and runs the 'systematic error' scenario from scratch using synthetic_feature_maps_functions.py, this script will take a bit of time to run.
Table 1 uses the data from the classified map in Fig. 2a and can be directly derived from eqs. 1-7.
Table 2 requires merging two scenes with different feature fractions to produce an average feature fraction of 0.50. Each cell in the table can be calculated using generate_Table2_contents.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folders results_rand_only, results_syst_only, and results_rand_plus_syst.
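To illustrate the kind of experiment these scripts run, here is a hedged sketch that builds a synthetic binary feature map for a chosen feature fraction, perturbs it with random classification error, and reports how one accuracy score (F1) varies with feature fraction. This is an illustration only, not the published synthetic_feature_maps_* code, and its parameters are not the scripts' l, scl, sflag, or fmap inputs.

```python
# Hedged sketch: how an accuracy score can vary with feature fraction for a
# synthetic binary map with random classification error. Illustrative only;
# not the published synthetic_feature_maps_* scripts.
import numpy as np

rng = np.random.default_rng(0)

def f1_for_feature_fraction(feature_fraction, size=100, error_rate=0.1):
    truth = rng.random((size, size)) < feature_fraction   # synthetic truth map
    flip = rng.random((size, size)) < error_rate          # random classification error
    pred = np.where(flip, ~truth, truth)

    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

for frac in (0.1, 0.25, 0.5, 0.75):
    print(frac, round(f1_for_feature_fraction(frac), 3))
```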