Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Abstract: In our report, we described a case study in which we evaluate our approach for correctness quantification using Palladio and KeY. Here, we publish the corresponding Palladio Component Model files as well as the generated source code, in order to make them publicly available.
TechnicalRemarks: # README: Case Study (Age of Maturity)
This archive contains the Palladio artifacts of the case study "AgeOfMaturity" of the Technical Report "Model-driven Quantification of Correctness with Palladio and KeY" (DOI: 10.5445/IR/1000128855). The structure of this archive is as follows:
- The folder "PalladioModels" contains the Palladio Component Model Eclipse project of the "AgeOfMaturity" use case.
- The folder "DependencySolverResults" contains output files with the results of the solved "AgeOfMaturity" case study as described in the report.
- The folder "GeneratedJavaCode" contains Java code generated with the PCM2Java project.
Helpsteer-correctness
This dataset is derived from NVIDIA's HelpSteer dataset, processed specifically for preference learning on the correctness dimension.
- Train split: 27417 examples
- Test split: 1416 examples
## Format
Each example contains the following fields:
- `prompt`: Question with "Human:" prefix and "Assistant:" suffix
- `chosen`: The response with higher correctness score
- `rejected`: The response with lower correctness score
-… See the full description on the dataset page: https://huggingface.co/datasets/cheryyunl/helpsteer-correctness.
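As a quick orientation, the snippet below shows one way to load and inspect the dataset with the Hugging Face `datasets` library; the split and field names follow the description above, and the exact output depends on the hosted version of the dataset.

```python
# Minimal sketch: loading helpsteer-correctness and inspecting one example.
# Assumes the `datasets` library is installed and the dataset id matches
# the page linked above.
from datasets import load_dataset

ds = load_dataset("cheryyunl/helpsteer-correctness")
print(ds)  # expected splits: train (~27k examples) and test (~1.4k examples)

example = ds["train"][0]
# Fields as described above: prompt, chosen, rejected.
print(example["prompt"][:200])
print(example["chosen"][:200])
print(example["rejected"][:200])
```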
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
White-box test generator tools rely only on the code under test to select test inputs, and capture the implementation's output as assertions. If there is a fault in the implementation, it can become encoded in the generated tests. Tool evaluations usually measure fault-detection capability using the number of such fault-encoding tests. However, these faults are only detected if the developer can recognize that the encoded behavior is faulty. We designed an exploratory study to investigate how developers perform in classifying generated white-box tests as faulty or correct. We carried out the study in a laboratory setting with 54 graduate students. The tests were generated for two open-source projects with the help of the IntelliTest tool. The performance of the participants was analyzed using binary classification metrics and by coding their observed activities. The results showed that participants incorrectly classified a large number of both fault-encoding and correct tests (with median misclassification rates of 33% and 25%, respectively). Thus, the real fault-detection capability of test generators could be much lower than typically reported, and we suggest taking this human factor into account when evaluating generated white-box tests.
This material contains the dataset and the recorded videos of the study.
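To make the reported numbers concrete, here is a small hedged sketch of the binary classification metrics mentioned above (misclassification rate, precision, recall), computed from hypothetical participant counts; the counts are illustrative and are not taken from the study data.

```python
# Hedged sketch: binary classification metrics for a participant who labels
# generated tests as "faulty" or "correct". All counts are illustrative.

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """tp: fault-encoding tests correctly flagged as faulty,
    fp: correct tests wrongly flagged as faulty,
    fn: fault-encoding tests wrongly accepted as correct,
    tn: correct tests correctly accepted."""
    total = tp + fp + fn + tn
    return {
        "misclassification_rate": (fp + fn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Illustrative counts for one participant (hypothetical numbers).
print(binary_metrics(tp=8, fp=3, fn=4, tn=15))
```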
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Some MGF datasets used as references for testing purposes.
Stores information for the Transaction Accuracy Review (TAR), which measures the adjudicative accuracy of the approximately 3.9 million Title II claims processed each year.
Software developers are increasingly encountering artificial intelligence (AI) tools in their workflows, but opinions on their accuracy remain divided. According to a 2024 Stack Overflow developer survey, 43 percent of programmers either highly or somewhat trust the output from AI tools, while 30 percent express distrust. This split in trust levels highlights the ongoing debate about AI's role and reliability in software development.
Productivity gains drive AI adoption
Despite mixed feelings about accuracy, AI tools are making significant impacts on developer productivity. The same survey found that over 80 percent of developers cited improved productivity as the most important benefit of using AI tools, a substantial increase from 33 percent the previous year. This productivity boost is further evidenced by a study showing that developers using AI co-pilots completed tasks 56 percent faster than those without AI assistance.
Challenges in AI implementation
While AI shows promise in handling complex tasks, with nearly one in three developers reporting its effectiveness in this area, challenges remain. Distrust of AI-generated output was identified as the greatest obstacle to AI adoption in development workflows by two-thirds of developers worldwide. Additionally, 30 percent of developers cited a lack of proper training and education on new AI tools as a significant challenge. These findings underscore the need for continued improvements in AI technology and better resources for developer education to fully realize AI's potential in software development.
Dataset Card for "MathDial-instructions-Correctness"
More Information needed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sign language correctness discrimination (SLCD) dataset is collected for sign language teaching. Different from general sign language recognition datasets
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book "Observational equivalence and compiler correctness". It features 7 columns including author, publication date, language, and book publisher.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for tw-political-correctness-dpo
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
- Curated by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/lianghsun/tw-political-correctness-dpo.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains raw data from the pilot study samples used for the validity and reliability testing of the Environmental Enrichment Scale Questionnaire (EESQ) and its translated Malay version (EESQ-M).
https://www.nist.gov/open/license
The dataset contains both the robot's high-level tool center position (TCP) health data and controller-level components' information (i.e., joint positions, velocities, currents, and temperatures). The datasets can be used by users (e.g., software developers, data scientists) who work on robot health management (including accuracy) but have limited or no access to robots that can capture real data. The datasets can support:
- Development of robot health monitoring algorithms and tools
- Research of technologies and tools to support robot monitoring, diagnostics, prognostics, and health management (collectively called PHM)
- Validation and verification of industrial PHM implementations, for example, verification of a robot's TCP accuracy after the work cell has been reconfigured, or whenever a manufacturer wants to determine whether the robot arm has experienced degradation
For data collection, a trajectory is programmed for the Universal Robot (UR5) approaching and stopping at randomly selected locations in its workspace. The robot moves along this preprogrammed trajectory under different conditions of temperature, payload, and speed. The TCP positions (x, y, z) of the robot are measured by a 7-D measurement system developed at NIST. Differences are calculated between the positions measured by the 7-D measurement system and the nominal positions computed from the nominal robot kinematic parameters. The results are recorded within the dataset. Controller-level sensing data are also collected from each joint (direct output from the controller of the UR5) to understand how temperature, payload, and speed influence position degradation. Controller-level data can be used for root cause analysis of robot performance degradation, by providing joint positions, velocities, currents, accelerations, torques, and temperatures. For example, the cold-start temperatures of the six joints were approximately 25 degrees Celsius; after two hours of operation, the joint temperatures increased to approximately 35 degrees Celsius. Control variables are listed in the header file in the dataset (UR5TestResult_header.xlsx). If you'd like to comment on this data and/or offer recommendations on future datasets, please email guixiu.qiao@nist.gov.
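The error calculation described above (measured minus nominal TCP positions) can be sketched as follows; the numbers and array layout are illustrative assumptions, and the actual column layout is documented in UR5TestResult_header.xlsx.

```python
# Hedged sketch: TCP position error as the difference between measured
# (7-D measurement system) and nominal (kinematic model) positions.
# All values below are illustrative, not taken from the dataset.
import numpy as np

measured = np.array([[0.4012, 0.2103, 0.3991],
                     [0.4015, 0.2101, 0.3988]])   # measured x, y, z in meters
nominal  = np.array([[0.4000, 0.2100, 0.4000],
                     [0.4000, 0.2100, 0.4000]])   # nominal x, y, z in meters

error_vec = measured - nominal                    # per-axis deviation
error_mag = np.linalg.norm(error_vec, axis=1)     # Euclidean TCP error per sample

print(error_vec)
print(error_mag)
```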
Maintain accurate general revenue estimates to ensure collections between 97% and 103% of the final annual estimate every year through 2018.
The purpose of the project was to assess the validity of two Voice Stress Analysis (VSA) tools currently on the market: the Layered Voice Analysis (LVA) and the Computer Voice Stress Analyzer (CVSA). The methodology and sampling protocols for this study were derived from the pre-existing methodology and sampling techniques employed in the National Institute of Justice-funded Arrestee Drug Abuse Monitoring (ADAM) program that operated in Oklahoma County from 1998 to 2004. The researchers interviewed arrestees in the Oklahoma County jail about their recent illicit drug use during the months of February and March 2006. The VSA data collected using each of the software systems in this study were sent to certified examiners from CVSA and LVA for their analysis. After the completion of the interview, the subjects were asked to complete the data collection process by supplying urine specimens. Answers from the 319 respondents were compared to the results of a urinalysis test to determine the extent to which they were being deceptive. Then, their "actual deceptiveness" was compared to the extent to which deception was indicated by the VSA programs. The dataset contains (1) demographic information obtained from the official booking records, (2) responses to survey questions about recent drug use, (3) the results of a urinalysis test on five drugs, (4) variables recording "deception" or "no deception" on each of the drugs, and (5) decisions by novice and expert analysts regarding the indication of deception.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary: Database of confusion matrices retrieved from the scientific literature. Suitable for research on the creation and exploitation of the confusion matrix, which remain topics of interest, such as new tools, sampling design, indices derived from the matrix, proposals for testing statistical hypotheses, and so on.
Format: SQLite
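As an illustration of how such a database might be queried, the sketch below reads one stored confusion matrix from SQLite and derives overall accuracy; the table and column names are assumptions made for the example and are not the published schema.

```python
# Hedged sketch: reading a confusion matrix from an SQLite database and
# computing overall accuracy. Schema names ("matrices", "matrix_id",
# "row_class", "col_class", "count") and the filename are hypothetical.
import sqlite3
import numpy as np

con = sqlite3.connect("confusion_matrices.sqlite")  # hypothetical filename
rows = con.execute(
    "SELECT row_class, col_class, count FROM matrices WHERE matrix_id = ?", (1,)
).fetchall()
con.close()

n = max(max(r, c) for r, c, _ in rows) + 1          # number of classes
cm = np.zeros((n, n))
for r, c, cnt in rows:
    cm[r, c] = cnt                                  # reference rows, mapped columns (assumed)

overall_accuracy = np.trace(cm) / cm.sum()
print(overall_accuracy)
```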
This statistic presents data on the political correctness of the entertainment industry in the United States as of August 2017. During a survey, 37 percent of respondents stated that they found the entertainment industry to be too politically correct.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data for temporal validity change prediction, an NLP task that will be defined in an upcoming publication. The dataset consists of five columns.
The duration labels (context_only_tv, combined_tv) are class indices of the following class distribution:
[no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]
Different dataset splits are provided.
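For clarity, here is a minimal sketch of decoding those class indices back into the duration labels listed above; the helper name is illustrative and not part of the dataset.

```python
# Hedged sketch: mapping context_only_tv / combined_tv class indices to
# the duration labels listed above (index 0 = "no time-sensitive information").
DURATION_CLASSES = [
    "no time-sensitive information", "less than one minute", "1-5 minutes",
    "5-15 minutes", "15-45 minutes", "45 minutes - 2 hours", "2-6 hours",
    "more than 6 hours", "1-3 days", "3-7 days", "1-4 weeks",
    "more than one month",
]

def decode_duration(class_index: int) -> str:
    return DURATION_CLASSES[class_index]

print(decode_duration(4))  # -> "15-45 minutes" (illustrative index)
```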
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy assessment is one of the most important components of both applied and research-oriented remote sensing projects. For mapped classes that have sharp and easily identified boundaries, a broad array of accuracy assessment methods has been developed. However, accuracy assessment is in many cases complicated by classes that have fuzzy, indeterminate, or gradational boundaries, a condition which is common in real landscapes; for example, the boundaries of wetlands, many soil map units, and tree crowns. In such circumstances, the conventional approach of treating all reference pixels as equally important, whether located on the map close to the boundary of a class, or in the class center, can lead to misleading results. We therefore propose an accuracy assessment approach that relies on center-weighting map segment area to calculate a variety of common classification metrics including overall accuracy, class user’s and producer’s accuracy, precision, recall, specificity, and the F1 score. This method offers an augmentation of traditional assessment methods, can be used for both binary and multiclass assessment, allows for the calculation of count- and area-based measures, and permits the user to define the impact of distance from map segment edges based on a distance weighting exponent and a saturation threshold distance, after which the weighting ceases to grow. The method is demonstrated using synthetic and real examples, highlighting its use when the accuracy of maps with inherently uncertain class boundaries is evaluated.
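A minimal sketch of the center-weighting idea is given below: each pixel's weight grows with its distance from the nearest map-segment edge, raised to a user-defined exponent, and stops growing beyond a saturation distance. The function and parameter names are illustrative and do not reproduce the authors' implementation.

```python
# Hedged sketch: center-weighted overall accuracy for a classified raster.
# Assumes scipy is available; only overall accuracy is shown, although the
# same weights could feed user's/producer's accuracy, precision, recall, etc.
import numpy as np
from scipy.ndimage import distance_transform_edt

def center_weighted_accuracy(class_map, reference, exponent=1.0, saturation=10.0):
    # Mark cells adjacent to a class transition as segment boundaries.
    boundary = np.zeros(class_map.shape, dtype=bool)
    boundary[:-1, :] |= class_map[:-1, :] != class_map[1:, :]
    boundary[:, :-1] |= class_map[:, :-1] != class_map[:, 1:]

    # Distance (in pixels) from each cell to the nearest boundary cell.
    dist = distance_transform_edt(~boundary)

    # Weight: normalized distance, capped at the saturation threshold,
    # raised to the distance-weighting exponent.
    w = np.clip(dist / saturation, 0.0, 1.0) ** exponent

    correct = (class_map == reference).astype(float)
    return (w * correct).sum() / w.sum()

# Illustrative use with a small synthetic map and reference raster.
rng = np.random.default_rng(0)
cmap = (rng.random((50, 50)) > 0.5).astype(int)
ref = np.where(rng.random((50, 50)) < 0.9, cmap, 1 - cmap)  # ~10% disagreement
print(center_weighted_accuracy(cmap, ref, exponent=2.0, saturation=5.0))
```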
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains the semantic coverage of the twenty test suites T1 ... T20 of jTerminal for total correctness with respect to M50; these files are named TOT50T1 ... TOT50T20. This directory also contains the graph that ranks the test suites T1 ... T20 by their semantic coverage; this graph is represented in two formats, namely the list of arcs (ArcsTOT50) and the graphical representation (GraphTOT50).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains the data and scripts to reproduce figures from 'Evaluating the accuracy of binary classifiers for geomorphic applications' published in Earth Surface Dynamics (Rossi, 2024).
Figure 1 elevation data was downloaded from OpenTopography (2010 Channel Islands Lidar Collection, 2012; Anderson et al., 2012; Reed, 2006). GIS files for elevation data and transect locations are provided in the zipped geodatabase gis_fig1.gdb.zip.
Figure 2 is based on the bedrock mapping at site P01 from Rossi et al. (2020). GIS files for 1-m slope, air photo mapping, its conversion to a truth raster, and the accuracy classification using a 38 degree slope threshold are provided in the zipped geodatabase gis_fig2.gdb.zip.
Figures 3-7 are ultimately based on synthetic_feature_maps_main.py and synthetic_feature_maps_functions.py. The former uses the latter to plot example classified maps along with how accuracy scores vary as a function of feature fraction for a given set of input parameters set by the user. Results are saved as a .csv file. Because these master scripts are designed for one set of input parameters, I provide a number of other scripts below that aid in reproducing the figures shown in the manuscript.
Figures 3a and 3c can be reproduced using generate_fig3.py directly, with input parameters of l = 100, scl = 1, sflag = 2, and fmap = 0.5. This plots the 'match scene' scenario only. Note that there is code that is commented out that will let you plot the 'all feature' scenario as well.
Figures 3b and 3d can be reproduced using generate_fig3.py directly, with input parameters of l = 100, scl = 10, sflag = 2, and fmap = 0.5. This plots the 'match scene' scenario only. Note that there is code that is commented out that will let you plot the 'all feature' scenario as well.
Figure 4 can be reproduced using generate_Fig4.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folder results_rand_only.
Figure 5 can be reproduced using generate_Fig5.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folder results_syst_only.
Figure 6 can be reproduced using generate_Fig6.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folder results_rand_plus_syst.
Figure 7 can be reproduced using generate_Fig7.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folders results_rand_only, results_syst_only, and results_rand_plus_syst.
Figure 8 is conceptual. Figs. 8a-b were drawn in Adobe Illustrator. The plot shown in Fig. 8c can be reproduced using generate_Fig8c.py and requires the associated file fig8_examples.txt.
Figure 9 is conceptual. Fig. 9a was drawn in Adobe Illustrator. The plot shown in Fig. 9b can be reproduced using generate_Fig9b.py. Because it does not use saved results and runs the 'systematic error' scenario from scratch using synthetic_feature_maps_functions.py, this script will take a bit of time to run.
Table 1 uses the data from the classified map in Fig. 2a and can be directly derived from eqs. 1-7.
Table 2 requires merging two scenes with different feature fractions to produce an average feature fraction of 0.50. Each cell in the table can be calculated using generate_Table2_contents.py. It uses saved results from synthetic_feature_maps_main.py that are stored in the folders results_rand_only, results_syst_only, and results_rand_plus_syst.
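To illustrate the kind of experiment these scripts run, here is a hedged sketch that builds a synthetic binary feature map for a chosen feature fraction, perturbs it with random classification error, and reports how one accuracy score (F1) varies with feature fraction. This is an illustration only, not the published synthetic_feature_maps_* code, and its parameters are not the scripts' l, scl, sflag, or fmap inputs.

```python
# Hedged sketch: how an accuracy score can vary with feature fraction for a
# synthetic binary map with random classification error. Illustrative only;
# not the published synthetic_feature_maps_* scripts.
import numpy as np

rng = np.random.default_rng(0)

def f1_for_feature_fraction(feature_fraction, size=100, error_rate=0.1):
    truth = rng.random((size, size)) < feature_fraction   # synthetic truth map
    flip = rng.random((size, size)) < error_rate          # random classification error
    pred = np.where(flip, ~truth, truth)

    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

for frac in (0.1, 0.25, 0.5, 0.75):
    print(frac, round(f1_for_feature_fraction(frac), 3))
```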