Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine’s scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the “residue evidence” (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a “combined p value” score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p value to the score functions used by several existing search engines. Our results suggest that the combined p value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instructions are available here:
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments since 2014. We begin with empirical results demonstrating our recently implemented speedups to the Tide search engine. Other new features include a new score function in Tide, two new confidence estimation procedures, as well as three new tools: Param-medic for estimating search parameters directly from mass spectrometry data, Kojak for searching cross-linked mass spectra, and DIAmeter for searching data independent acquisition data against a sequence database.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We have three datasets that display visualized SVCOMP witnesses generated with the help of the Witness Visualizer tool. Each dataset comprises two directories: witnesses, which contains the original witnesses provided by SVCOMP tools, and visualization, which contains our visual representations of the respective witnesses in HTML format. The visualization file name contains the prefix error_trace-, for example, error_trace-witness.2ls.html corresponds to a witness named witness.2ls.graphml.
This dataset includes a single random witness for each SVCOMP tool, accompanied by its corresponding visualization. The visualizations showcase the various witness elements such as function calls, conditions, assumptions, thread specifics, and other operations. Cells marked with +/- indicate that some elements were present in the error trace, but not all of them. All witnesses are presented in the table below:
Witness SV-COMP Tool Function calls Threads Assumptions Conditions Link to sources
witness.2ls.graphml
-
+
witness.aprove.graphml
-
-
+ +
witness.brick.graphml
-
+
witness.bubaak.graphml
-
+
witness.cbmc.graphml
+
+
witness.cpa-bam-bnb.graphml CPA-BAM-BnB
+ + +
witness.cpa-bam-smg.graphml CPA-BAM-SMG
+ + +
witness.cpalockator.graphml CPALockator + + + + +
witness.cpachecker.graphml CPAChecker + + + + +
witness.crux.graphml
-
+
witness.cseq.graphml Cseq + +
+
witness.dartagnan.graphml
-
+
witness.deagle.graphml
+
+
-
DIVINE (until 2022) empty
-
EBF empty
witness.esbmc-incr.graphml
+
+
witness.esbmc-kind.graphml
+
+
-
Frama-C-SV empty
witness.gazer-theta.graphml Gazer-Theta
wrong path
witness.gdart.graphml
-
+
-
Goblint empty
witness.graves_cpa.graphml Graves-CPA + + + + +
witness.graves_par.graphml Graves-Par + + + + +
-
Infer empty
witness.korn.graphml
-
+
witness.lart.graphml
-
+
witness.lazy-cseq.graphml Lazy-CSeq + + + + +
witness.lfchecker.graphml
+
+
-
Locksmith empty
-
Mopsa empty
witness.pesco_cpa.graphml PeSCo-CPA + + + + +
witness.pichecker.graphml PIChecker
+ + +
witness.pinaka.graphml
-
+
witness.predator.graphml
-
-
-
+
-
SESL (2022) empty
witness.smack.graphml
-
+
witness.symbiotic.graphml
+
+
witness.theta.graphml Theta different format
witness.uatomozer.graphml UAutomizer +/- + + + +
witness.ucutter.graphml UgemCutter +/- + + + +
witness.ukojak.graphml UKojak
+ + +
witness.utaipan.graphml UTaipan +/- + + + +
witness.veriabs.graphml
-
wrong path
witness.veriabsl.graphml VeriAbsL
+ + wrong path
witness.verifuzz.graphml
-
+
witness.verioover.graphml
-
+
This dataset comprises a selected witness for each SVCOMP property (ReachSafety, MemSafety, Termination, NoOverflow, ConcurrencySafety). The witnesses are presented in the following table:
Witness SV-COMP Tool Property Mandatory elements Description
witness.smg_memory.graphml CPA-BAM-SMG MemSafety Assumptions / conditions, function calls There is a double free operation. Employing function calls append aids in comprehending the structure of the list, while assumptions reveal which branch was chosen.
witness.graves_overflow.graphml Graves-CPA NoOverflow Assumptions / conditions The witness showcases an explicit (-2147483648, which represents the minimal value for the int type), which has the potential to cause overflow in specific program.
witness.cpachecker_termination.graphml CPAChecker NoTermination Assumptions / conditions There is a condition leading to an infinite loop.
witness.cpachecker_unreach.graphml CPAChecker ReachSafety Function calls The error trace indicates a potential scenario where a mutex was unlockedwithout the corresponding mutex_unlock operation.
witness.cpachecker_conc.graphml CPAChecker ConcurrencySafety Function calls, thread operations The error trace illustrates the creation of threads and highlights the assignments made within each thread that ultimately resulted in the violation of the property.
This dataset contains witnesses for a known bug from SVCOMP (linux-3.14--drivers--usb--misc--adutux.ko.cil.i) involving a data race on dev->udev, where concurrent writes occur without corresponding locks. Only two tools were able to solve the corresponding verification task: ESBMC-kind and CPALockator. The ESBMC error trace (witness.esbmc_2020.graphml) includes only thread specifics and assumptions, while the CPALockator witness (witness.lockator.graphml) comprises all witness elements and is presented in a human-readable format.
This section presents a comparison between witness thoroughness and the actual validation rate for each property. We considered all tools that participated in the respective category and generated at least 10 error traces, then calculated the validation rate. This comparison demonstrates how effectively thoroughness can approximate the validation rate. The following tables provide details for each property, with the relevant elements used to calculate thoroughness highlighted:
MemSafety property:
SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate
Bubaak 0 0 1 0 33.33 64 67.19
CBMC 0 1 1 0 33.33 27 11.11
CPA-BAM-SMG 1 0 1 1 100 46 78.26
CPAChecker 1 1 1 1 100 37 67.57
ESBMC-kind 0 1 1 0 33.33 25 20
Graves-CPA 1 1 1 1 100 44 56.82
Graves-Par 1 1 1 1 100 18 77.78
PeSCo-CPA 1 1 1 1 100 37 67.57
NoOverflow property:
SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate
2LS 0 0 1 0 100 2071 95.7
Bubaak 0 0 1 0 100 2233 94.67
CBMC 0 1 1 0 100 3296 62.14
CPAChecker 1 1 1 1 100 196 100
Crux 0 0 1 0 100 222 95.05
ESBMC-kind 0 1 1 0 100 3296 66.69
Frama-C-SV 0 0 0 0 0 676 0
Graves-Par 1 1 1 1 100 750 2
Infer 0 0 0 0 0 583 0
Pinaka 0 0 1 0 100 2232 100
Symbiotic 0 1 1 0 100 1418 100
UAutomizer 0.5 1 1 1 100 2222 100
UKojak 0.5 0 1 1 100 168 100
UTaipan 0.5 1 1 1 100 0 100
VeriFuzz 0 0 1 0 100 185 90.81
NoTermination property:
SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate
2LS 0 0 1 0 50 663 69.08
Bubaak 0 0 1 0 50 578 34.78
CPAChecker 1 1 1 1 100 501 97.01
Symbiotic 0 1 1 0 50 591 52.96
UAutomizer 0.5 1 1 1 100 512 98.24
VeriFuzz 0 0 1 0 50 492 71.34
ReachSafety property:
SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate
Bubaak 0 0 1 0 0 24 54.17
CBMC 0 1 1 0 0 392 1.28
CPA-BAM-BnB 1 0 1 1 100 69 85.51
CPA-BAM-SMG 1 0 1 1 100 67 85.07
CPAChecker 1 1 1 1 100 45 88.89
Crux 0 0 1 0 0 1572 0.13
ESBMC-kind 0 1 1 0 0 64 21.88
Graves-CPA 1 1 1 1 100 66 87.88
Graves-Par 1 1 1 1 100 24 58.33
PeSCo-CPA 1 1 1 1 100 63 85.71
ConcurrencySafety property:
SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate
CBMC 0 1 1 0 50 277 87
CPA-Lockator 1 1 1 1 100 83 26.51
CPAChecker 1 1 1 1 100 257 100
Cseq 1 1 1 0 100 277 94.58
Dartagnan 0 1 0 0 50 281 92.17
Deagle 0 1 1 0 50 280 96.07
DIVINE 0 0 0 0 0 230 80.87
EBF 0 0 0 0 0 282 89.01
ESBMC-incr 0 1 1 0 50 68 79.41
ESBMC-kind 0 1 1 0 50 263 89.73
Graves-CPA 1 1 1 1 100 261 99.23
Graves-Par 1 1 1 1 100 28 100
Infer 0 0 0 0 0 634 0
Lazy-CSeq 1 1 1 1 100 274 94.89
LF-checker 0 1 1 0 50 286 85.31
PeSCo-CPA 1 1 1 1 100 256 100
PIChecker 1 0 1 1 50 269 98.14
Symbiotic 0 1 1 0 50 110 92.73
UAutomizer 0.5 1 1 1 75 297 94.95
UgemCutter 0.5 1 1 1 75 283 96.47
UTaipan 0.5 1 1 1 75 293 96.25
This section presents the overall difference (i.e., the sum of differences between witness thoroughness and validation rates for each tool) when thoroughness is calculated based on all possible combinations of witness elements (assumptions, conditions, thread specifics, and function calls). The set of witnesses is the same as in the previous section. The following tables provide details for each property, with the minimum difference highlighted:
MemSafety property:
Combination Overall difference
Function calls 250.3
Thread specifics 444.6
Assumptions 353.7
Conditions 250.3
Function calls, Thread specifics 294.6
Assumptions, Function calls 238.08
Conditions, Function calls 250.3
Assumptions, Thread specifics 344.6
Conditions, Thread specifics 294.6
Assumptions, Conditions 238.08
Assumptions, Function calls, Thread specifics 277.94
Conditions, Function calls, Thread specifics 244.59
Assumptions, Conditions, Function calls 221.41
Assumptions, Conditions, Thread specifics 277.94
Assumptions, Conditions, Function calls, Thread specifics 244.6
NoOverflow property:
Combination Overall difference
Function calls 953.06
Thread specifics 745.4
Assumptions 192.94
Conditions 803.06
Function calls, Thread specifics 778.06
Assumptions, Function calls 478.06
Conditions, Function calls 878.06
Assumptions, Thread specifics 445.4
Conditions, Thread specifics 703.06
Assumptions, Conditions 403.06
Assumptions, Function calls, Thread specifics 528.8
Conditions, Function calls, Thread specifics 786.41
Assumptions, Conditions, Function calls 586.43
Assumptions, Conditions, Thread specifics 478.79
Assumptions, Conditions, Function calls, Thread specifics 590.56
NoTermination property:
Combination Overall
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine’s scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the “residue evidence” (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a “combined p value” score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p value to the score functions used by several existing search engines. Our results suggest that the combined p value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).