Software Verification Witnesses 2.0

Paulína Ayaziová¹, Dirk Beyer², Marian Lingsch-Rosenfeld², Martin Spiessl², and Jan Strejček¹

¹ Masaryk University, Brno, Czech Republic
  {xayaziov,strejcek}@fi.muni.cz
² LMU Munich, Munich, Germany
  {dirk.beyer,marian.lingsch,spiessl}@sosy.ifi.lmu.de

Abstract. Verification witnesses are now widely accepted objects used not only to confirm or refute verification results, but also for the general exchange of information among various tools for program verification. The original format for witnesses is based on GraphML, and it has some known issues, including a semantics based on control-flow automata, limited tool support for some format features, and a large size of witness files. This paper presents version 2.0 of the witness format, which is based on YAML and overcomes the above-mentioned issues. We describe the new format, provide an experimental comparison of various aspects of the original and the new witness format showing that both witness formats perform similarly, and report on its adoption in the community.

Keywords: Verification Witness · Software Verification · Validation · Exchange Format · Invariant · Counterexample

1 Introduction

Software verification is a process that detects bugs in computer programs or proves their absence. Unfortunately, software verifiers can also contain bugs, and their verdicts can thus be incorrect. To increase the reliability of the verification process, software verifiers have, starting eight years ago, accompanied their verification results with witnesses that justify the verdict and can be independently analyzed by witness validators developed by various teams and based on different techniques.

The first generic format for witnesses of verification results [1] was introduced in 2015. It supported only violation witnesses (also called counterexamples), produced when a verifier reports that a given program violates a considered safety specification. In 2016, the format was extended to also accommodate witnesses for the cases when a verifier decides that a given program satisfies a given specification [2]. Such witnesses are called correctness witnesses, and they should contain invariants that help to prove that the program is correct. The format was soon adopted by the verification community and by the Competition on Software Verification (SV-COMP) [3], which led to fast adoption of the format by many verification tools and to the development of numerous witness validators. An overview of existing validators can be found in a recent survey [4]. Since 2023, SV-COMP has a dedicated track on witness validation [5]. While the format was originally intended for validation of verification results, some witness validators can also refute a witness [4, 5]. The format also found applications that were not intended at the time of its development. In particular, it is used to exchange information between different verifiers in the context of cooperative verification [6, 7], as a way to provide feedback to a software developer [8, 9], and as a way to combine automatic and interactive verifiers [10]. In 2022, the authors of the format published a paper [11] with its detailed description and with an extensive experimental study on its applications.

Despite the indisputable success of the format, it also has some weaknesses.
The format is based on GraphML [12], and witnesses have the form of automata, which makes them easy to visualize, but also lengthy and unsuitable for reading in their textual form. More importantly, the semantics of the format is formally defined over programs represented by control-flow automata (CFA). Unfortunately, there is no standardized translation of programs written in common programming languages like C or Java to CFA. As a result, the semantics of the format over programs in standard languages has some ambiguities. The SV-COMP community even found the part of the semantics related to implicit loop edges inappropriate and decided to change it. Another issue of the original witness format is connected to the high number of features it provides. For example, if an invariant or an assumption uses variables that appear in different functions or scopes, the format allows specifying the scope for their interpretation. Another example is that the location of some witness event can be specified very loosely by an interval of lines. Practical experience shows that some of these features are not used in any witness generated by verifiers and, what is more alarming, are unsupported or even ignored by witness validators. In fact, there is probably no witness validator that fully implements the format. This can lead to situations in which a valid verification witness employing some less frequent feature is not confirmed or is even refuted, or an invalid witness is confirmed.

This paper presents a new generation of the witness format that avoids the mentioned weaknesses. In particular, we use a concise format that is based on YAML, which generally makes the witnesses shorter. Further, the format provides only features that were actually used by verification witnesses in the original format. As the format is significantly simpler, it is easy for validators to support it fully. Finally, the semantics of the format is formulated over programming languages using terms and concepts from their standards.

The format itself is described in Sect. 2 and referred to as version 2.0. In its current state, the format supports only sequential programs written in C and basic program properties, namely unreachability of a given error function, unreachability of signed integer overflows, and unreachability of invalid pointer dereferences and deallocations. We have adapted two verification tools, namely CPAchecker [13] and Symbiotic [14], to produce verification witnesses in the new format. We have also developed two witness validators, namely CPAchecker [11] (as an extension of the existing tool) and Witch3 (a new validator based on the concepts of Symbiotic-Witch2 [15]), to validate witnesses in this format. Using these tools and other tools from the competition, Sect. 3 evaluates the impact of the new format on the witnesses and their validation. Sect. 4 summarizes the differences between the original and the new format and shows the current adoption of the new format by verification and witness-validation tools.

Contributions. In this paper, we contribute:
– a new generation of the format for verification witnesses that solves most problems present in the previous format,
– a preliminary evaluation of the impact of the new format on the effectiveness and performance of witness validation, and
– an overview of a few measures that characterize the new witnesses.

Related Work.
Our work certainly stands on the shoulders of the original format for verification witnesses [1, 2], but we claim to provide a substantial improvement over the original format by addressing its weaknesses (see Sect. 4). Witnesses are used ubiquitously in areas where algorithms have a high computational complexity. For example, witnesses are used for certifying graph algorithms [16]. Turing already used assertions [17] and argued that one should justify the correctness of programs. In the area of logic solvers, witnesses for the results are essential for competitions, and important competitions require witnesses and their validation. For example, the Termination Competition (termCOMP) [18] uses the format CPF [19], the competition of SAT solvers [20] uses the DRAT format [21] together with the validator DRAT-trim [22], and the competition on SMT solving verifies models with Dolmen [23]. Witnesses are not only important to certify the correctness of a solver's answer, they are also important for the goal of explainability: a plain true/false answer is less valuable than an answer that also provides the reasons needed to understand it. For example, witnesses can be used to derive test cases [9] and to aid debugging with visualizations [8]. Execution reports [24] help organize the analysis results, and the format SARIF [25] is used by static analyzers to represent results.

2 Witness Format 2.0

The witness format 2.0 is an extension of the YAML format, version 1.2. Individual verification witnesses are represented by entries. Each entry has three key-value pairs. The key entry_type has the value invariant_set or violation_sequence, corresponding to the type of the witness: a correctness witness is represented by one or more entries of type invariant_set, while a violation witness is represented by a single entry of type violation_sequence. Further, the key metadata refers to a mapping that describes mainly the context of the witness: the format version used by the entry, the unique identifier of the entry, the creation time of the entry, the tool that produced the entry, and the verification task the witness relates to. Finally, the value of the key content represents the semantic content of the entry. The key-value pairs are presented in a structured way in Table 1. The table also presents the key-value pairs of the nested mapping metadata and its nested mappings producer and task. We describe the possible values of the key content in the following subsections, separately for correctness witnesses and violation witnesses, as they are conceptually different.

Table 1: Structure of entries common for violation and correctness witnesses; some nodes are nested; optional items are marked with *; the term scalar in YAML refers also to strings

Key               | Value              | Description
entry_type        | invariant_set      | the entry type of a correctness witness
                  | violation_sequence | the entry type of a violation witness
metadata          | mapping            | the context of the witness; see below
content           | sequence           | the witness content; see Tables 2 and 3
content of metadata:
format_version    | 2.0                | the used version of the format
uuid              | scalar             | a unique identifier of the entry; it uses the UUID format defined in RFC 4122
creation_time     | scalar             | the date and time of the entry creation; it uses the format given by ISO 8601
producer          | mapping            | the tool that produced the entry; see below
task              | mapping            | the verification task to which the entry is related; see below
content of producer:
name              | scalar             | the name of the tool
version           | scalar             | the version of the tool
configuration*    | scalar             | the configuration in which the tool ran
command_line*     | scalar             | the command line with which the tool ran; it should be a bash-compliant command
description*      | scalar             | any additional information
content of task:
input_files       | sequence           | the list of files given as input to the verifier, e.g., ["path/f1.c", "path/f2.c"]
input_file_hashes | mapping            | SHA-256 hashes of all files in input_files, e.g., {"path/f1.c": 511..., "path/f2.c": f70...}
specification     | scalar             | the property considered by the verifier; it uses the SV-COMP format given at https://sv-comp.sosy-lab.org/2024/rules.php
data_model        | ILP32 or LP64      | the data model considered for the task
language          | C                  | the programming language of the input files; the format currently supports only C
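To make the entry structure concrete, the following minimal sketch shows how the metadata part of an entry could look in practice. The concrete values (UUID, time stamp, tool name and version, and hash) are hypothetical placeholders chosen by us for illustration; only the keys and their nesting follow Table 1. The complete correctness witness for the program inv-a.c in Fig. 1 below uses exactly this structure with the content filled in.

  - entry_type: invariant_set
    metadata:
      format_version: 2.0
      uuid: "123e4567-e89b-12d3-a456-426614174000"  # hypothetical UUID in RFC 4122 format
      creation_time: "2024-03-01T12:00:00+01:00"    # date and time in ISO 8601 format
      producer:
        name: "example-verifier"                    # hypothetical tool name and version
        version: "1.0"
      task:
        input_files: ["path/f1.c"]
        input_file_hashes:
          "path/f1.c": "511aa..."                   # SHA-256 hash of the input file (abbreviated)
        specification: "G ! call(reach_error())"
        data_model: ILP32
        language: C
    content: <...>                                  # invariants or segments, see Tables 2 and 3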
 1  void reach_error(){}
 2  extern unsigned char __VERIFIER_nondet_uchar(void);
 3  int main() {
 4    unsigned char n = __VERIFIER_nondet_uchar();
 5    if (n == 0) {
 6      return 0;
 7    }
 8    unsigned char v = 0;
 9    unsigned int s = 0;
10    unsigned int i = 0;
11    while (i < n) {
12      v = __VERIFIER_nondet_uchar();
13      s += v;
14      ++i;
15    }
16    if (s < v) {
17      reach_error();
18      return 1;
19    }
20    if (s > 65025) {
21      reach_error();
22      return 1;
23    }
24    return 0;
25  }

Specification: G ! call(reach_error()), i.e., all calls of reach_error() are unreachable

[Witness in format 1.0, visualized as an automaton: states q1, q3, and q4 carry the invariant true, state q2 carries the invariant s ≤ i · 255 ∧ 0 ≤ i ≤ 255 ∧ n ≤ 255; transitions are labeled 10,enterLoopHead; 11,then; 11,else; 14,enterLoopHead; and o/w (otherwise)]

Witness in format 2.0:

  - entry_type: invariant_set
    metadata: <...>
    content:
      - invariant:
          type: loop_invariant
          location:
            file_name: "inv-a.c"
            line: 11
            column: 1
            function: main
          value: "s <= i*255 && 0 <= i && i <= 255 && n <= 255"
          format: c_expression

Fig. 1: Example C program inv-a.c taken from [11] (top left) satisfying the given specification (bottom left) and equivalent correctness witnesses in format 1.0 (top right, visualized as an automaton) and format 2.0 (bottom right), with a single nontrivial invariant

2.1 Correctness Witnesses

Correctness witnesses provide invariants that should help to prove the program correct. In the old format (1.0), invariants are tied to automaton nodes, and these nodes can correspond to multiple program locations and various moments of program executions. The new format (2.0) simply assigns invariants to program locations. Figure 1 provides an example of a correctness witness in the old format and in the new format.

Syntax. In entries of type invariant_set, which represent a correctness witness, the key content contains a sequence of zero or more invariants. An invariant is a mapping with the following four keys. type has the value loop_invariant if the invariant is assigned to a loop head and the value location_invariant if it is assigned to another location.
location of a loop_invariant must point to the first character of a keyword at the beginning of a loop (i.e., for, while, or do). The location of a location_invariant must point to the first character of a statement or a declaration that is within a compound statement. format has the value c_expression, as the format currently supports only invariants that are C expressions. value holds the actual invariant string (e.g., "s <= i*255 && i > 0"), which is a side-effect-free C expression over variables in the scope where the invariant is placed.

The location is a mapping with mandatory keys file_name, which holds the name of the file, and line, representing the line number (the first line has the number 1). Additionally, there are two optional keys called column and function. The key column specifies the column number of the location (value 1 is the position of the first character on the line). If the column is not given, then it is interpreted as the leftmost suitable position on the line, where suitability is given by type and the restrictions given above. The key function provides the name of the function containing the location. Technically, this information is superfluous, as it is determined by file_name, line, and column. It is therefore not intended for any algorithmic processing of the witness, but only to improve human readability of the witness. The structure of content and its nested items is summarized in Table 2.

Table 2: Structure of the content part of entries representing correctness witnesses; optional items are marked with *

Key        | Value               | Description
content    | sequence            | a sequence of one or more invariant elements
description of invariant:
invariant  | mapping             | a basic building block of correctness witnesses; see below
content of invariant:
type       | loop_invariant      | the invariant type for iteration statements
           | location_invariant  | the invariant type for arbitrary statements
location   | mapping             | the location of the invariant; see below
format     | c_expression        | the invariant is a C expression
value      | scalar              | the actual invariant
content of location:
file_name  | scalar              | the file of the location
line       | scalar              | the line number of the location
column*    | scalar              | the column of the location
function*  | scalar              | the name of the function containing the location

Semantics. The correctness witness is valid if it fulfills the following requirements.
– Each loop_invariant must always hold immediately before evaluating the condition of the corresponding loop.
– Each location_invariant must always hold immediately before evaluating the corresponding statement or declaration.
– The specification must be satisfied for all program executions.
– No invariant evaluation causes undefined behavior, and no undefined behavior occurs during any execution of the program.

Note that the order of invariants in an invariant_set or their division into several entries of type invariant_set is not important. The semantics also reveals the difference between the two types of invariants: if we replace loop_invariant with location_invariant, then the invariant has to hold only before the loop is executed, but not after each loop iteration.
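To illustrate this difference on the program inv-a.c from Fig. 1, the following sketch (our own example, not taken from the figures above) attaches a location_invariant to line 11. The stated expression holds when the while loop is first reached, since s and i are initialized to 0 on lines 9 and 10, but it would be wrong as a loop_invariant, because it is violated after the first loop iteration.

  content:
    - invariant:
        type: location_invariant
        location:
          file_name: "inv-a.c"
          line: 11
          column: 1
          function: main
        value: "s == 0 && i == 0"   # holds before the loop is executed, not after each iteration
        format: c_expression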
2.2 Violation Witnesses

A violation witness should describe a program execution violating the considered property. For brevity, the violating execution is described loosely, and the witness thus represents a set of such executions. In the old format (1.0), a violation witness is an automaton with edges prescribing consecutive restrictions on program executions. The automaton can contain various branches and loops. In the new format (2.0), a violation witness is a sequence of waypoints that have to be passed by the executions. To make witness validation more efficient, the format also allows specifying waypoints that have to be avoided. Figure 2 provides an example of a violation witness in the old format and in the new format.

 1  void reach_error(){}
 2  extern unsigned char __VERIFIER_nondet_uchar(void);
 3  int main() {
 4    unsigned char n = __VERIFIER_nondet_uchar();
 5    if (n == 0) {
 6      return 0;
 7    }
 8    unsigned char v = 0;
 9    unsigned char s = 0;
10    unsigned int i = 0;
11    while (i < n) {
12      v = __VERIFIER_nondet_uchar();
13      s += v;
14      ++i;
15    }
16    if (s < v) {
17      reach_error();
18      return 1;
19    }
20    if (s > 65025) {
21      reach_error();
22      return 1;
23    }
24    return 0;
25  }

Specification: G ! call(reach_error()), i.e., all calls of reach_error() are unreachable

[Witness in format 1.0, visualized as an automaton: states qi, q1, q2, q4, and the error state qE; transitions are labeled 11,then (twice); 11,else; 16,then; and o/w (otherwise)]

Witness in format 2.0:

  - entry_type: violation_sequence
    metadata: <...>
    content:
      - segment:
          - waypoint:
              action: follow
              type: branching
              location:
                file_name: "inv-b.c"
                line: 11
              constraint:
                value: true
      - segment:
          - waypoint:
              action: follow
              type: branching
              location:
                file_name: "inv-b.c"
                line: 11
              constraint:
                value: true
      - segment:
          - waypoint:
              action: follow
              type: branching
              location:
                file_name: "inv-b.c"
                line: 11
              constraint:
                value: false
      - segment:
          - waypoint:
              action: follow
              type: target
              location:
                file_name: "inv-b.c"
                line: 17

Fig. 2: Example C program inv-b.c taken from [11] (top left) violating the given specification (bottom left) and similar violation witnesses in format 1.0 (middle, visualized as an automaton) and format 2.0 (right)

Syntax. The basic building blocks of violation witnesses in the new format are waypoints. Technically, a waypoint is a mapping with four keys, namely type, location, constraint, and action. The values of the first three keys specify a requirement that a program execution must meet to pass the waypoint: type describes the type of the requirement, location ties the requirement to some program location, and constraint gives the requirement itself. The key action then states whether the executions represented by the witness should pass through the waypoint (value follow) or avoid it (value avoid). The format currently supports five possible values of type with the following meanings:

assumption  The location has to point to the first character of a statement or a declaration within a compound statement. A requirement of this type says that a given constraint holds before evaluating the pointed statement or declaration. The constraint is a mapping with two keys: format specifies the language of the assumption, and value contains a side-effect-free assumption over variables in the current scope. The value of format is c_expression, as C expressions are the only assumptions currently supported. In the future, we plan to also support assumptions in ACSL [26].

branching  A requirement of this type says that a given branching is evaluated in a given way. The location points to the first character of a branching keyword like if, while, or switch, or to the character ? in the ternary operator (?:). The constraint is then a mapping with only one key, value.
For binary branchings, value can be either true or false, stating whether the true branch is taken or not. For the keyword switch, value can be an integer constant or default. The integer constant specifies the value of the controlling expression of the switch statement. The value default says that the value of this expression does not match any case of the switch with the exception of the default case (if it is present).

function_enter  The location points to the right parenthesis after the function arguments of a function call. The requirement says that the called function is entered. The key constraint has to be omitted in this case.

function_return  Such a requirement says that a given function call has been evaluated and the returned value satisfies a given constraint. The location points to the right parenthesis after the function arguments of the function call. The constraint is a mapping with keys format and value. We currently support only ACSL expressions that compare \result with a constant expression using one of the operators ==, !=, <=, <, >, and >=. The value of format has to be acsl_expression.

target  This type of requirement can be used only with action follow, and it marks the program location where the property is violated. More precisely, the location points at the first character of the statement or full expression whose evaluation is sequenced directly before the violation occurs, i.e., there is no other evaluation sequenced before the violation and after the sequence point associated with the location. This also implies that it can point to a function call only if it calls a function of the C standard library that violates the property or if the function call itself is the property violation. The key constraint has to be omitted.

Waypoints are organized into segments. Each segment is a sequence of zero or more waypoints with action avoid and exactly one waypoint with action follow at the end. A segment is called final if it ends with the target waypoint, and it is called normal otherwise. Finally, we can describe the content part of violation_sequence entries, which represent violation witnesses. The value of content is a sequence of zero or more normal segments and exactly one final segment at the end. The structure of content and its nested items is summarized in Table 3; a sketch using the waypoint types not shown in Fig. 2 follows after the table.

Table 3: Structure of the content part of entries representing violation witnesses

Key        | Value            | Description
content    | sequence         | a sequence of zero or more normal segment elements and one final segment at the end
description of segment:
segment    | sequence         | a sequence of zero or more waypoint elements with action: avoid and one waypoint with action: follow at the end; the final segment ends with a waypoint with type: target
description of waypoint:
waypoint   | mapping          | a basic building block of violation witnesses
content of waypoint:
action     | follow           | the waypoint should be passed through
           | avoid            | the waypoint should be avoided
type       | assumption       | restriction on variable values given by an expression
           | branching        | restriction specifying the result of a branching
           | function_enter   | restriction saying that a function is entered
           | function_return  | restriction on the result of a function call
           | target           | identification of a location of the property violation
location   | mapping          | the location of the waypoint; see Table 2
constraint | mapping          | the constraint of the waypoint; not allowed with type: function_enter and type: target
content of constraint:
format     | c_expression     | for type: assumption, constraints are C expressions
           | acsl_expression  | for type: function_return, constraints are specific ACSL expressions; not allowed for other type values
value      | scalar           | the actual constraint
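The following sketch shows the content part of a violation_sequence entry that combines the waypoint types not used in Fig. 2. It is our own illustration: the file name, line numbers, variable name, and constraints are hypothetical and only meant to show how avoid waypoints, assumption waypoints, and ACSL-constrained function_return waypoints are written down.

  content:
    - segment:
        - waypoint:                    # the execution must not take the true branch of the branching at line 20
            action: avoid
            type: branching
            location:
              file_name: "example.c"   # hypothetical program
              line: 20
            constraint:
              value: true
        - waypoint:                    # x is 0 immediately before the statement at line 12
            action: follow
            type: assumption
            location:
              file_name: "example.c"
              line: 12
            constraint:
              format: c_expression
              value: "x == 0"
    - segment:
        - waypoint:                    # the call evaluated at line 15 returns a negative value
            action: follow
            type: function_return
            location:
              file_name: "example.c"
              line: 15
            constraint:
              format: acsl_expression
              value: '\result < 0'
    - segment:
        - waypoint:                    # the property is violated at line 30
            action: follow
            type: target
            location:
              file_name: "example.c"
              line: 30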
Semantics. Each violation witness represents a set of program executions violating the specified property. The witness is considered to be valid if the set is nonempty. Let us consider a violation witness with n ≥ 1 segments. An execution is represented by this witness if the execution can be divided into n parts such that, for each 1 ≤ i ≤ n, the i-th part matches the corresponding segment of the witness. An execution part matches a normal segment if
– it does not pass any avoid waypoint of the segment,
– it ends at the moment when the sequence point corresponding to the follow waypoint of the segment is reached for the first time in the execution part, and
– the follow waypoint is passed at this moment.
The final execution part matches the final segment if
– it does not pass any avoid waypoint of the segment and
– it violates the considered property during execution of the statement identified by the target waypoint.
Moreover, the execution must not contain any instruction that causes undefined behavior. An exception to this are witnesses of undefined behavior, in which case the only instruction that causes undefined behavior must be the one represented by the target waypoint.

In each execution part, only the waypoints of the corresponding segment are evaluated. An assumption waypoint is evaluated at the sequence point immediately before the waypoint location. The evaluation must not lead to undefined behavior; otherwise the witness is incorrect. The waypoint is passed if the given constraint evaluates to true. A branching waypoint is evaluated at the sequence point immediately after evaluation of the controlling expression of the corresponding branching statement. The waypoint is passed if the resulting value of the controlling expression corresponds to the given constraint. A function_enter waypoint is evaluated at the sequence point immediately after evaluation of all arguments of the function call. The waypoint is passed without any additional constraint. A function_return waypoint is evaluated immediately after evaluation of the corresponding function call. The waypoint is passed if the returned value satisfies the given constraint.

3 Evaluation

This section presents experiments with validation of verification results using both formats (1.0 and 2.0) to answer the following research questions:
– RQ 1: How does the performance of validation with the new witness format compare to the old witness format?
– RQ 2: Does the new format improve attributes related to readability when compared to the old format?

In the experiments, the following tools were used.
– CPAchecker [13, 27] is a verifier and witness validator that can produce and validate both correctness and violation witnesses in both formats. The experiments are based on version 0af0e41240.
– Symbiotic [28] is a verifier that can produce violation witnesses in both formats. We use version 9c278f9.
– Symbiotic-Witch2 [15] is a witness validator for violation witnesses in the old format (1.0). The experiments are based on version svcomp24.
– Witch3 [29] is a new witness validator based on similar principles as Symbiotic-Witch2, but designed for violation witnesses in format 2.0. The tool is made of Symbiotic in version b011ec9 and Witch-Klee in version 6dabb94.
– UAutomizer [30] is a verifier and witness validator that can produce both correctness and violation witnesses in both formats, and that can validate correctness witnesses in both formats and violation witnesses only in format 1.0. We use version 0.2.4-?-8430d5a-m and version 0.2.4-dev-0e0057c for the validation of YAML and GraphML correctness witnesses, respectively.

Note that the support of the new witness format in all mentioned tools except UAutomizer has been implemented by the authors of this paper. For the experiments, we used all verification tasks of SV-COMP 2024 where the property to be verified is unreachability of an error function, i.e., the specification used in Figs. 1 and 2. We did not use the witnesses produced during the competition [31], but rather based our experiments on fresh witnesses produced by the latest versions of Symbiotic and CPAchecker; the results are much better compared to the results from SV-COMP 2024 [32].

Benchmark Environment. For conducting our evaluation, we use BenchExec to ensure reliable benchmarking [33]. All benchmarks are performed on machines with an Intel Xeon E5-1230 CPU (4 physical cores with 2 processing units each), 33 GB of RAM, and Ubuntu 22.04 as operating system. Each verification and witness-validation task is executed with the resource limits used in SV-COMP, i.e., 900 s of CPU time (except for violation-witness validation, where the convention is to use 90 s of CPU time), 15 GB of memory, and 1 physical core (2 processing units).

Fig. 3: Correctness witnesses produced by CPAchecker: Quantile plots for the time taken for validation of the old and new witnesses for two different validators; (a) validation using CPAchecker, (b) validation using UAutomizer [plots of CPU time (s) over the n-th fastest result, with one series for validation on v1.0 and one for validation on v2.0]

3.1 Evaluation Results for RQ 1 (Validation Performance)

One of the most important questions is whether the validation using the new format is as effective and efficient as with the old witness format.

Correctness Witnesses. CPAchecker can generate correctness witnesses in both formats. The witnesses from CPAchecker were then validated by CPAchecker and UAutomizer. This allows for a direct comparison, as shown in Fig. 3. We can observe in Fig. 3a that the validation performance of CPAchecker is largely identical for both formats. This is to be expected, as the only information that CPAchecker extracts from the GraphML witnesses are the invariants and their locations, and this is also the information that is present in and extracted from the witnesses in format 2.0. For UAutomizer, the new format 2.0 substantially improves both the speed of validation and the number of witnesses that can be validated. Besides aiding in the verification of the original property during validation, a witness can also add obligations that the validator has to check. This is the case here, where the extensive automaton that is embedded into CPAchecker's witnesses in format 1.0 is harder to prove correct for UAutomizer than the much simpler set of invariants present in the witnesses in format 2.0.
Table 4 shows the numbers of confirmed and refuted witnesses.

Table 4: Correctness witnesses produced by CPAchecker: Validation with CPAchecker and UAutomizer

           |           | Witnesses v1.0      | Witnesses v2.0
Validator  | Witnesses | Confirmed | Refuted | Confirmed | Refuted
CPAchecker | 6 729     | 4 685     | 0       | 4 741     | 0
UAutomizer | 6 729     | 2 478     | 109     | 2 959     | 2

Violation Witnesses. We present the results of our evaluation regarding RQ 1 for violation witnesses in Fig. 4, which is complemented by Table 5 with the concrete numbers of validated and refuted witnesses.

Fig. 4: Violation witnesses: Quantile plots for the time taken for validation of the old and the new witnesses generated by two different verifiers for two different validators; (a) validation of violation witnesses generated by CPAchecker, (b) validation of violation witnesses generated by Symbiotic [plots of CPU time (s) over the n-th fastest result, with series for CPAchecker validation on v1.0 and v2.0, Symbiotic-Witch2 validation on v1.0, and Witch3 validation on v2.0]

Table 5: Results of validating CPAchecker's and Symbiotic's violation witnesses with different validators; Witch stands for Symbiotic-Witch2 when validating old witnesses, and for Witch3 when validating new witnesses

           |            |          | Witnesses v1.0      | Witnesses v2.0
Verifier   | Validator  | Produced | Confirmed | Refuted | Confirmed | Refuted
CPAchecker | CPAchecker | 2 011    | 1 880     | 35      | 1 657     | 9
CPAchecker | Witch      | 2 011    | 798       | 0       | 1 312     | 17
Symbiotic  | Witch      | 1 556    | 1 533     | 5       | 1 516     | 0
Symbiotic  | CPAchecker | 1 556    | 1 319     | 29      | 1 315     | 27

For witnesses generated by CPAchecker (cf. Fig. 4a), Witch3 is able to confirm significantly more witnesses in the new format than Symbiotic-Witch2 is able to confirm in the old format. Due to the large number of features and the underspecified semantics of the GraphML format, Symbiotic-Witch2 does not support all the attributes used in the GraphML witnesses. Ignoring these features either leads to a larger state space that needs to be explored during validation, which results in more timeouts, or to misinterpreting information in the witness and missing the described error. This is not the case for Witch3, as it supports the full set of features of the new format and makes use of all the information provided in the witness. For CPAchecker, there is still a relatively small performance gap between validation with the new and the old format. This is not surprising, as the GraphML-based format is inspired by the specification-automata language that CPAchecker uses internally, so achieving a similar performance still requires some more engineering. For witnesses generated by Symbiotic (cf. Fig. 4b), we can observe that the number of new witnesses confirmed by Witch3 is almost the same as the number of old witnesses confirmed by Symbiotic-Witch2. Thus, both validation approaches are very close when it comes to effectiveness. This is also the case for CPAchecker, which manages to confirm almost the same number of witnesses in both formats.
The new format for correctness witnesses does not reduce validation performance; for UAutomizer, it shows a significant advantage over the old format. For violation witnesses, Witch3 handling the new format performs better than Symbiotic-Witch2 on the old format, while there is still room for improvement for CPAchecker, as it performs slightly better on the old format.

3.2 Evaluation Results for RQ 2 (Witness Readability)

Another important question is concerned with the attributes corresponding to the readability of the files encoding the witnesses. In particular, we are interested in the size and length of the witnesses, since this has a large effect on how easily they can be read and understood by humans and machines. Table 6 provides an overview of different attributes of the two versions of correctness witnesses produced by CPAchecker. Table 7 does the same for violation witnesses produced by CPAchecker and Symbiotic. Some attributes are only applicable to one of the two versions of witnesses.

Table 6: Different attributes of correctness witnesses in versions 1.0 and 2.0 generated by CPAchecker

                        | Witnesses v1.0           | Witnesses v2.0
Attribute               | Min | Median | Max       | Min | Median | Max
Length in lines of code | 53  | 1 536  | 1 014 533 | 18  | 28     | 1 058
Size in kB              | 3   | 52     | 35 573    | 1   | 1      | 965
Number of nodes         | 3   | 114    | 26 899    | -   | -      | -
Number of edges         | 2   | 198    | 142 016   | -   | -      | -
Number of invariants    | 0   | 1      | 162       | 0   | 1      | 104

Table 7: Different attributes of violation witnesses in versions 1.0 and 2.0 generated by CPAchecker and Symbiotic

                        | Witnesses v1.0          | Witnesses v2.0
Attribute               | Min | Median | Max      | Min | Median | Max
Length in lines of code | 12  | 372    | 258 730  | 27  | 171    | 114 460
Size in kB              | 2   | 14     | 9 098    | 1   | 6      | 3 071
Number of nodes         | 1   | 38     | 28 304   | -   | -      | -
Number of edges         | 0   | 42     | 28 793   | -   | -      | -
Number of waypoints     | -   | -      | -        | 1   | 13     | 9 537

For correctness witnesses, we can see that the new witnesses are usually very small in comparison to the old witnesses. This is because the new witnesses encode only the invariants, while the old witnesses encode information about the control flow of the program. One explanation for the difference is that witnesses in version 1.0 roughly scale with the size of the program, while witnesses in version 2.0 scale only with the number of invariants, which for CPAchecker is roughly correlated with the number of function calls and loops. As we saw in Sect. 3.1, this extra information is not necessarily relevant for validation.

For violation witnesses, we see that, apart from a small factor due to the overhead of describing the automaton, both formats are similar in all metrics. This is not surprising, as both formats encode similar information about an error path. Therefore, they both roughly scale with the number of assumptions for nondeterministic variables and the number of branching decisions in the error path.

The tables show that the new witnesses are usually much shorter than the old witnesses. As we have seen in Sect. 3.1, this does not have a negative impact on the validation performance, since the information most relevant for validation is retained.
Having less information makes it much easier for a verification engineer to understand the witness and use it in further processing steps.

In summary, witnesses in version 2.0 are generally much smaller and easier to read than witnesses in version 1.0, while retaining all important data.

3.3 Threats to Validity

Internal Validity. We used the benchmarking framework BenchExec [33] to run the experiments, which uses modern Linux features for reliable benchmarking. This tool also makes sure to never run two different executions on the same physical core, in order to avoid interference through shared computing resources. Our validation tools might contain bugs, which could lead to wrong conclusions; however, our claim is that the new format already works sufficiently well to serve as an alternative format.

External Validity. The conclusions about the validators might not hold for other validators that will be developed in the future; also, witnesses generated by other verifiers might have different characteristics. However, other tools are not expected to deviate much from the presented witnesses, because they would serve the same purpose of testifying to the bug or proof. Our experiments were done on a large benchmark set, which is also used in competitions, but it could still be the case that there are witnesses and programs for which the presented results are not applicable. Since extending and improving the witness format is an ongoing process, we expect that if this is the case, it will be adequately addressed in the future.

4 Version 1.0 vs. Version 2.0

The witness format 2.0 is closely tied to the actual program syntax, while format 1.0 uses an automaton that is largely independent of the program syntax and closely tied to the representation of the program as a control-flow automaton, which is used internally by some verifiers. Due to this, format 2.0 is more succinct, has well-defined semantics, and is easier for humans to understand. On the other hand, format 1.0 is more expressive; for example, it can define different loop invariants for the same loop depending on which of two different paths is taken to reach the loop.

Currently, format 2.0 has the same practical limitations as format 1.0. In the case of correctness witnesses, they have not yet been defined for concurrency safety, memory safety, and termination. Violation witnesses have not yet been defined for concurrency safety. There are also features that are not yet supported by the new format but are straightforward extensions, such as support for Java and violation witnesses for termination. Extending witnesses to be able to validate more programs and specifications is ongoing work; we expect that the simplification of the syntax and the clarification of the semantics with format version 2.0 will make it easier to extend the format in the future.

In order to validate our concept of the new format, we reported our initiative to the SV-COMP community, and the jury decided to immediately include the new format as an alternative to the existing format, in order to quickly adopt it and improve the state of the art. This was seen in SV-COMP 2024 [32], where 8 verifiers and 4 validators supported correctness witnesses v2.0 and 2 verifiers and 2 validators supported violation witnesses v2.0. Table 8 shows all these tools and their support of witness formats in detail.
Table 8: Tools with some support of witness format 2.0 and their abilities to generate/validate correctness/violation witnesses in format 1.0/2.0 in SV-COMP 2024; tools where the support of witness format 2.0 was implemented by the authors of this paper are typeset in bold. The columns of the original table are Witness Generation (Correctness v1.0, v2.0; Violation v1.0, v2.0) and Witness Validation (Correctness v1.0, v2.0; Violation v1.0, v2.0); a bullet marks a supported combination.

CPAchecker       | • • • • • • • •
Symbiotic        | • • •
Symbiotic-Witch2 | •
Witch3           | •
UAutomizer       | • • • • • •
UKojak           | • • •
UTaipan          | • • •
UGemCutter       | • • •
Mopsa            | • • •
CPV              | • • •
Goblint          | • • •

This also shows the large interest the software-verification community has in the new format, since the first mention of the format for correctness witnesses dates only from September 2021 (https://gitlab.com/sosy-lab/benchmarking/sv-witnesses/-/merge_requests/44) and the work on the violation-witness part of the new format started only in April 2023.

5 Conclusion

Verification witnesses are an important part of the software-verification ecosystem. Just like verification tools, specification formats, and witness validators, the format for verification witnesses also needs to be improved over time. This paper introduces the witness format version 2.0, which changes the container format from GraphML to YAML, has a more concise data representation, and has a clearly defined semantics that is independent from control-flow automata. Besides describing the syntax and semantics of the new format, we also evaluated the effectiveness and efficiency induced by the new format. In sum, the new witnesses are much smaller, and the experimental results show a significantly improved confirmation rate for some validators: using the new format, UAutomizer can confirm 481 more correctness witnesses (Table 4) and Witch3 can confirm 514 more violation witnesses (Table 5). Furthermore, shortly after we proposed this new format, seven other tools already support it, which is an indicator that the developers value the new format.

Data-Availability Statement. A reproduction package (that includes all software and data that we used for our experiments) is available on Zenodo [34].

Funding Statement. P. Ayaziová and J. Strejček were supported by the Czech Science Foundation grant GA23-06506S. D. Beyer, M. Lingsch-Rosenfeld, and M. Spiessl were supported by the Deutsche Forschungsgemeinschaft (DFG) – 378803395 (ConVeY) and 496588242 (IdeFix).

References

1. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Stahlbauer, A.: Witness validation and stepwise testification across software verifiers. In: Proc. FSE. pp. 721–733. ACM (2015). https://doi.org/10.1145/2786805.2786867
2. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M.: Correctness witnesses: Exchanging verification results between verifiers. In: Proc. FSE. pp. 326–337. ACM (2016). https://doi.org/10.1145/2950290.2950351
3. Beyer, D.: Software verification and verifiable witnesses (Report on SV-COMP 2015). In: Proc. TACAS. pp. 401–416. LNCS 9035, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_31
4. Beyer, D., Strejček, J.: Case study on verification-witness validators: Where we are and where we go. In: Proc. SAS. pp. 160–174. LNCS 13790, Springer (2022). https://doi.org/10.1007/978-3-031-22308-2_8
5. Beyer, D.: Competition on software verification and witness validation: SV-COMP 2023. In: Proc. TACAS (2). pp. 495–522. LNCS 13994, Springer (2023). https://doi.org/10.1007/978-3-031-30820-8_29
6. Beyer, D., Wehrheim, H.: Verification artifacts in cooperative verification: Survey and unifying component framework. In: Proc. ISoLA (1). pp. 143–167. LNCS 12476, Springer (2020). https://doi.org/10.1007/978-3-030-61362-4_8
7. Beyer, D., Haltermann, J., Lemberger, T., Wehrheim, H.: Decomposing software verification into off-the-shelf components: An application to CEGAR. In: Proc. ICSE. pp. 536–548. ACM (2022). https://doi.org/10.1145/3510003.3510064
8. Beyer, D., Dangl, M.: Verification-aided debugging: An interactive web-service for exploring error witnesses. In: Proc. CAV (2). pp. 502–509. LNCS 9780, Springer (2016). https://doi.org/10.1007/978-3-319-41540-6_28
9. Beyer, D., Dangl, M., Lemberger, T., Tautschnig, M.: Tests from witnesses: Execution-based validation of verification results. In: Proc. TAP. pp. 3–23. LNCS 10889, Springer (2018). https://doi.org/10.1007/978-3-319-92994-1_1
10. Beyer, D., Spiessl, M., Umbricht, S.: Cooperation between automatic and interactive software verifiers. In: Proc. SEFM. pp. 111–128. LNCS 13550, Springer (2022). https://doi.org/10.1007/978-3-031-17108-6_7
11. Beyer, D., Dangl, M., Dietsch, D., Heizmann, M., Lemberger, T., Tautschnig, M.: Verification witnesses. ACM Trans. Softw. Eng. Methodol. 31(4), 57:1–57:69 (2022). https://doi.org/10.1145/3477579
12. Brandes, U., Eiglsperger, M., Herman, I., Himsolt, M., Marshall, M.S.: GraphML progress report. In: Graph Drawing. pp. 501–512. LNCS 2265, Springer (2001). https://doi.org/10.1007/3-540-45848-4_59
13. Beyer, D., Keremoglu, M.E.: CPAchecker: A tool for configurable software verification. In: Proc. CAV. pp. 184–190. LNCS 6806, Springer (2011). https://doi.org/10.1007/978-3-642-22110-1_16
14. Chalupa, M., Řechtáčková, A., Mihalkovič, V., Zaoral, L., Strejček, J.: Symbiotic 9: String analysis and backward symbolic execution with loop folding (competition contribution). In: Proc. TACAS (2). pp. 462–467. LNCS 13244, Springer (2022). https://doi.org/10.1007/978-3-030-99527-0_32
15. Ayaziová, P., Strejček, J.: Symbiotic-Witch 2: More efficient algorithm and witness refutation (competition contribution). In: Proc. TACAS (2). pp. 523–528. LNCS 13994, Springer (2023). https://doi.org/10.1007/978-3-031-30820-8_30
16. McConnell, R.M., Mehlhorn, K., Näher, S., Schweitzer, P.: Certifying algorithms. Computer Science Review 5(2), 119–161 (2011). https://doi.org/10.1016/j.cosrev.2010.09.009
17. Turing, A.: Checking a large routine. In: Report on a Conference on High Speed Automatic Calculating Machines. pp. 67–69. Cambridge Univ. Math. Lab. (1949). https://turingarchive.kings.cam.ac.uk/publications-lectures-and-talks-amtb/amt-b-8
18. Giesl, J., Mesnard, F., Rubio, A., Thiemann, R., Waldmann, J.: Termination competition (termCOMP 2015). In: Proc. CADE. pp. 105–108. LNCS 9195, Springer (2015). https://doi.org/10.1007/978-3-319-21401-6_6
19. Sternagel, C., Thiemann, R.: The certification problem format. In: Proc. UITP. pp. 61–72. EPTCS 167, EPTCS (2014). https://doi.org/10.4204/EPTCS.167.8
20. Järvisalo, M., Berre, D.L., Roussel, O., Simon, L.: The international SAT solver competitions. AI Magazine 33(1) (2012)
21. Heule, M.J.H.: The DRAT format and DRAT-trim checker. CoRR abs/1610.06229 (2016)
22. Wetzler, N., Heule, M.J.H., Hunt Jr., W.A.: DRAT-trim: Efficient checking and trimming using expressive clausal proofs. In: Proc. SAT. pp. 422–429. LNCS 8561, Springer (2014). https://doi.org/10.1007/978-3-319-09284-3_31
23. Bury, G., Bobot, F.: Verifying models with Dolmen. In: Proc. SMT Workshop. CEUR Workshop Proceedings, CEUR (2023)
24. Castaño, R., Braberman, V.A., Garbervetsky, D., Uchitel, S.: Model checker execution reports. In: Proc. ASE. pp. 200–205. IEEE (2017). https://doi.org/10.1109/ASE.2017.8115633
25. OASIS: Static Analysis Results Interchange Format (SARIF) version 2.0 (2019)
26. Baudin, P., Cuoq, P., Filliâtre, J.C., Marché, C., Monate, B., Moy, Y., Prevosto, V.: ACSL: ANSI/ISO C Specification Language version 1.17 (2021), available at https://frama-c.com/download/acsl-1.17.pdf
27. Baier, D., Beyer, D., Chien, P.C., Jankola, M., Kettl, M., Lee, N.Z., Lemberger, T., Lingsch-Rosenfeld, M., Spiessl, M., Wachowitz, H., Wendler, P.: CPAchecker 2.3 with strategy selection (competition contribution). In: Proc. TACAS. LNCS, Springer (2024)
28. Jonáš, M., Kumor, K., Novák, J., Sedláček, J., Trtík, M., Zaoral, L., Ayaziová, P., Strejček, J.: Symbiotic 10: Lazy memory initialization and compact symbolic execution (competition contribution). In: Proc. TACAS. LNCS, Springer (2024)
29. Ayaziová, P., Strejček, J.: Witch 3: Validation of violation witnesses in the witness format 2.0 (competition contribution). In: Proc. TACAS. LNCS, Springer (2024)
30. Heizmann, M., Bentele, M., Dietsch, D., Jiang, X., Klumpp, D., Schüssele, F., Podelski, A.: Ultimate Automizer and the abstraction of bitwise operations (competition contribution). In: Proc. TACAS. LNCS, Springer (2024)
31. Beyer, D.: Verification witnesses from verification tools (SV-COMP 2024). Zenodo (2024). https://doi.org/10.5281/zenodo.10669737
32. Beyer, D.: State of the art in software verification and witness validation: SV-COMP 2024. In: Proc. TACAS. LNCS, Springer (2024)
33. Beyer, D., Löwe, S., Wendler, P.: Reliable benchmarking: Requirements and solutions. Int. J. Softw. Tools Technol. Transfer 21(1), 1–29 (2019). https://doi.org/10.1007/s10009-017-0469-y
34. Ayaziová, P., Beyer, D., Lingsch-Rosenfeld, M., Spiessl, M., Strejček, J.: Reproduction package for SPIN 2024 article 'Software Verification Witnesses 2.0'. Zenodo (2024). https://doi.org/10.5281/zenodo.10826204