Angelic Verification: Precise Verification Modulo Unknowns

Ankush Das¹, Shuvendu K. Lahiri¹ (✉), Akash Lal¹, and Yi Li²
¹ Microsoft Research, Bangalore, India, {t-ankdas,shuvendu,akashl}@microsoft.com
² University of Toronto, Toronto, Canada, liyi@cs.toronto.edu

Abstract. Verification of open programs can be challenging in the presence of an unconstrained environment. Verifying properties that depend on the environment yields a large class of uninteresting false alarms. Using a verifier on a program thus requires extensive initial investment in modeling the environment of the program. We propose a technique called angelic verification for verification of open programs, where we constrain a verifier to report warnings only when no acceptable environment specification exists to prove the assertion. Our framework is parametric in a vocabulary and a set of angelic assertions that allow a user to configure the tool. We describe a few instantiations of the framework and an evaluation on a set of real-world benchmarks to show that our technique is competitive with industrial-strength tools even without models of the environment.

1 Introduction

Scalable software verifiers offer the potential to find defects early in the development cycle. The user of such a tool can specify a property (e.g. correct usage of kernel/security APIs) using some specification language, and the tool validates that the property holds on all feasible executions of the program. There has been significant progress in the area of software verification, leveraging ideas from model checking [13], theorem proving [34] and invariant inference algorithms [16,22,33]. Tools based on these principles (e.g. SDV [3], F-Soft [24]) have found numerous bugs in production software. However, a fundamental problem still limits the adoption of powerful software verifiers in the hands of end users.
Most (interprocedural) program verifiers aim to verify that a program does not fail assertions under any feasible execution of the program. This is a good match when the input program is “closed”, i.e., its execution starts from a well-defined initial state, and external library methods are included or accurately modeled. Scalability concerns preclude performing monolithic verification that includes all transitive callers and library source code. In practice, a significant portion of verification tool development requires closing a program by providing (i) either a harness (a client program) or a module invariant [30] to constrain the inputs, and (ii) stubs for external library procedures [3]. The effect of modeling is to constrain the set of unknowns in the program to rule out infeasible executions. Absence of such modeling results in numerous uninteresting alarms and deters a user from further interacting with the tool.

© Springer International Publishing Switzerland 2015. D. Kroening and C.S. Păsăreanu (Eds.): CAV 2015, Part I, LNCS 9206, pp. 324–342, 2015. DOI: 10.1007/978-3-319-21690-4_19

     1  // inconsistency
     2  procedure Bar(x:int) {
     3    if (x != NULL) { gs := 1; }
     4    else { gs := 2; }
     5    // possible BUG or dead code
     6    assert x != NULL;
     7    m[x] := 5;
     8  }
     9  // internal bug
    10  procedure Baz(y:int) {
    11    assert y != NULL; // DEFINITE BUG
    12    m[y] := 4;
    13  }
    14  // entry point
    15  procedure Foo(z:int) {
    16    call Bar(z);     // block + relax
    17    call Baz(NULL);  // internal bug
    18    call FooBar();   // external calls
    19  }
    20  // globals
    21  var gs: int, m:[int]int;
    22
    23  // external call
    24  procedure FooBar() {
    25    var x, w, z: int;
    26    call z := Lib1();
    27    assert z != NULL;
    28    m[z] := NULL;
    29    call x := Lib2();
    30    assert x != NULL;
    31    w := m[x];
    32    assert w != NULL;
    33    m[w] := 4;
    34  }
    35  // library
    36  procedure Lib1() returns (r: int);
    37  procedure Lib2() returns (r: int);

Fig. 1. Running example.
“A stupid false positive implies the tool is stupid” [6]. The significant initial modeling overhead often undermines the value provided by verifiers. Even “bounded” versions of verifiers (such as CBMC [14]) suffer from this problem because these unknowns are present even in bounded executions.

Example 1. Consider the example program (written in the Boogie language [4]) in Fig. 1. The program has four procedures Foo, Bar, Baz, FooBar and two external library procedures Lib1, Lib2. The variables in the program can be scalars (of type int) or arrays (e.g. m) that map int to int. The Boogie program is an encoding of a C program [15]: pointers and values are uniformly modeled as integers (e.g. parameter x of Bar, or the return value of Lib1), and memory dereference is modeled as array lookup (e.g. m[x]). The procedures have assertions marked using assert statements. The entry procedure for this program is Foo.

There are several sources of unknowns or unconstrained values in the program: the parameter z to Foo, the global variable m representing the heap, and the return values of library procedures Lib1 and Lib2. Even a precise verifier is bound to return assertion failures for each of the assertions in the program. This is because all the assertions, except the one in Baz (the only definite bug in the program), are assertions over unknowns in the program, and (sound) verifiers tend to be conservative (over-approximate) in the face of unknowns. This demonic nature of verifiers results in several false alarms.

Overview. Our goal is to push back on the demonic nature of the verifier by prioritizing alarms with higher evidence. In addition to the warning in Baz, the assertion in Bar is suspicious, as the only way to avoid the bug is to make the “else” branch unreachable in Bar. For the remaining assertions, relatively simple constraints on the unknown values suffice to explain the correctness of these assertions.
For example, it is reasonable to assume that calls to library methods do not return NULL, that their dereferences (m[x]) store non-null values, and that calls to two different library methods do not return aliased pointers. We tone down the demonic nature of verifiers by posing a more angelic decision problem for the verifier (also termed abductive inference [10,20]):

For a given assertion, does there exist an acceptable specification over the unknowns such that the assertion holds?

This forces the verifier to work harder to exhaust the space of acceptable specifications before showing a warning for a given assertion. Of course, this makes the verification problem less well defined, as it is parameterized by what constitutes “acceptable” to the end user of the tool. At the same time, it allows a user to configure the demonic nature of the tool by specifying a vocabulary of acceptable specifications.

In this paper, we give the user a few dimensions along which to specify a vocabulary Vocab that constitutes a specification (details can be found in Sect. 4). The vocabulary can indicate a template for the atomic formulas, or the Boolean and quantifier structure. Given a vocabulary Vocab, we characterize an acceptable specification by how (a) concise and (b) permissive the specification is. Conciseness is important for the resulting specifications to be understandable by the user. Permissiveness ensures that the specification is not overly strong, thus masking out true bugs. The failure in Bar is an example, where the specification x ≠ NULL is not permissive as it gives rise to dead code in the “else” branch before the assertion.

To specify the desired permissiveness, we allow users to augment the program with a set of angelic assertions Â. The assertions in Â should not be provable in the presence of any inferred specification over the unknowns. An angelic assertion assert e ∈ Â at a program location l indicates that the user expects at least one state to reach l and satisfy ¬e.
For Bar, one can add an assertion assert false inside each of the two branches. The precondition x ≠ NULL would be able to prove that assert false in the “else” branch is unreachable (and thus provable), which prevents it from being permissive. We describe a few such useful instances of angelic assertions in Sect. 3.1.

We have implemented the angelic verification framework in a tool called AngelicVerifier for Boogie programs. Given a Boogie program with a set S of entrypoints, AngelicVerifier invokes each of the procedures in S with unknown input states. In the absence of any user-provided information, we assume that S is the set of all procedures in the program. Further, the library procedures are assigned a body that assigns a non-deterministic value to the return variables and adds an assume statement with a predicate unknown_i (Fig. 2). This predicate is used to constrain the return values of a procedure for all possible call sites (Sect. 4) within an entrypoint. AngelicVerifier invokes a given (demonic) verifier on this program with all entrypoints in S. If the verifier returns a trace that ends in an assertion failure, AngelicVerifier tries to infer an acceptable specification over the unknowns. If it succeeds, it installs the specification as a precondition of the entry point and iterates. If it is unable to infer an acceptable specification, the trace is reported as a defect to the user.

    function unknown_0(a: int): bool;
    function unknown_1(a: int): bool;
    procedure Lib1() returns (r: int) { assume unknown_0(r); return; }
    procedure Lib2() returns (r: int) { assume unknown_1(r); return; }

Fig. 2. Modeling of external procedures by AngelicVerifier. All variables are non-deterministically initialized.

    // Trace: Bar → assert on line 6
    SPEC :: x != NULL, Spec not permissive
    ANGELIC WARNING: Assertion x != NULL fails in proc Bar
    // Trace: Baz → assert on line 11
    SPEC :: y != NULL
    // Trace: FooBar → assert on line 27
    SPEC :: (∀ x_1: unknown_0(x_1) ⇒ x_1 != NULL)
    // Trace: FooBar → assert on line 30
    SPEC :: (∀ x_2: unknown_1(x_2) ⇒ x_2 != NULL)
    // Trace: FooBar → assert on line 32
    SPEC :: (∀ x_2, x_1: unknown_1(x_2) ∧ unknown_0(x_1) ⇒ (x_2 != x_1 ∧ m[x_2] != NULL))
    // Trace: Foo → Baz → assert on line 11
    ANGELIC WARNING: Assertion y != NULL fails in proc Baz

Fig. 3. Output of AngelicVerifier on the program shown in Fig. 1. A line with “SPEC” denotes an inferred specification to suppress a trace.

Figure 3 shows the output of AngelicVerifier applied to our example:

– For a trace that starts at Bar and fails the assert on line 6, we conjecture a specification x ≠ NULL but discover that it is not permissive. The line with “ANGELIC WARNING” is a warning shown to the user.
– For the trace that starts at Baz and fails the assert on line 11, we block the assertion failure by installing the constraint y ≠ NULL. The code of Baz does not have any indication that it expects to see NULL as input.
– For the three traces that start at FooBar and fail an assertion inside it, we block them using constraints on the return values of library calls. Notice that the return values are not in scope at the entry to FooBar; they get constrained indirectly using the unknown_i predicates. The most interesting block is for the final assertion, which involves assuming that (a) the returns from the two library calls are never aliased, and (b) the value of the array m at the value returned by Lib2 is non-null (see Sect. 4).
– The trace starting at Foo that calls Baz and fails on line 11 cannot be blocked (other than by using the non-permissive specification false), and is reported to the user.
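The permissiveness check on Bar can be replayed on a small finite model. The following is a minimal sketch, not the tool's implementation: it assumes a four-value stand-in for int, encodes NULL as 0, and uses hypothetical helper names (run_bar, blocks_failure, permissive). It places an angelic assert false in each branch of Bar and checks whether a candidate specification makes a previously reachable branch unreachable.

```python
NULL = 0
DOMAIN = range(4)  # small finite stand-in for `int` (an assumption of this sketch)


def run_bar(x):
    """Run Bar(x) from Fig. 1; return (assertion_failed, branches_reached)."""
    reached = {"then"} if x != NULL else {"else"}
    failed = (x == NULL)  # `assert x != NULL` on line 6
    return failed, reached


def blocks_failure(spec):
    """True if no input admitted by `spec` fails the assertion on line 6."""
    return not any(run_bar(x)[0] for x in DOMAIN if spec(x))


def permissive(spec):
    """Angelic `assert false` in each branch: the spec must not make a branch
    unreachable if that branch is reachable under the unconstrained spec."""
    for branch in ("then", "else"):
        before = any(branch in run_bar(x)[1] for x in DOMAIN)
        after = any(branch in run_bar(x)[1] for x in DOMAIN if spec(x))
        if before and not after:
            return False
    return True


non_null = lambda x: x != NULL      # candidate specification x != NULL
unconstrained = lambda x: True      # the specification `true`

print(blocks_failure(non_null), permissive(non_null))            # True False
print(blocks_failure(unconstrained), permissive(unconstrained))  # False True
```

In this model the candidate x ≠ NULL blocks the failure but kills the “else” branch, so it is rejected as non-permissive, which mirrors the first entry of Fig. 3.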
Contributions. In summary, the paper makes the following contributions: (a) We provide a framework for performing angelic verification with the goal of highlighting the highest-confidence bugs. (b) We provide a parametric framework based on Vocab and Â that lets a user configure the level of angelism in the tool. (c) We describe a scalable algorithm for searching specifications using ExplainError (Sect. 4). We show an effective way to deal with internal non-determinism resulting from calls to library procedures. (d) We have implemented the ideas in a prototype tool AngelicVerifier and evaluated it on real-world benchmarks. We show that AngelicVerifier is competitive with industrial-strength tools even without access to the environment models.

2 Programming Language

Syntax. We formalize the ideas in the paper in the context of a simple subset of the Boogie programming language [4] (Fig. 4). A program consists of a set of basic blocks Block; each block consists of a label BlockId, a body s ∈ Stmt and a (possibly empty) set of successor blocks. A program has a designated first block Start ∈ Block. Most statements are standard; the havoc x statement assigns a non-deterministic value to the variable x. An expression (Expr) can be a variable identifier or an application of a function f ∈ Functions. A formula (Formula) includes Boolean constants and applications of a predicate p ∈ Predicates, and is closed under Boolean connectives and quantifiers. The constructs are expressive enough to model features of most programming languages such as C [15] or Java [1]. Conditional statements are modeled using assume and goto statements; the heap is modeled using interpreted array functions {read, write} ⊆ Functions [35].

    P ∈ Program      ::= Block+
    BL ∈ Block       ::= BlockId : s; goto BlockId*
    s, t ∈ Stmt      ::= skip | assert φ | assume φ | x := e | havoc x | s; s
    x, y ∈ Vars
    e ∈ Expr         ::= x | f(e, ..., e)
    φ, ψ ∈ Formula   ::= true | false | p(e, ..., e) | φ ∧ φ | ∀x : φ | ¬φ

Fig. 4. A simple programming language.

Semantics. A program state σ is a type-consistent valuation of the variables in scope in the program. The set of all states is denoted by Σ ∪ {Err}, where Err is a special state to indicate an assertion failure. For a given state σ ∈ Σ and an expression (or formula) e, σ(e) denotes the evaluation of e in the state. For a formula φ ∈ Formula, σ |= φ holds if φ evaluates to true in σ. The semantics of a program is a set of execution traces, where a trace corresponds to a sequence of program states. We refer the reader to earlier works for details of the semantics [4]. Intuitively, an execution trace for a block BL corresponds to the sequence of states obtained by executing the body, and extending the terminating sequences with the traces of the successor blocks (if any). A sequence of states for a block does not terminate if it executes either an assume φ or an assert φ statement in a state σ ∈ Σ such that σ ⊭ φ. In the latter case, the successor state is Err. The traces of a program are the traces of the start block Start. Let T(P) be the set of all traces of a program P. A program P is correct (denoted as |= P) if T(P) does not contain a trace that ends in the state Err. For a program P that is not correct, we define a failure trace as a trace τ that starts at Start and ends in the state Err.

3 Angelic Verification

In this section, we make the problem of angelic verification more concrete. We are given a program P that cannot be proved correct in the presence of unknowns from the environment (e.g. parameters, globals and outputs of library procedures). If one takes a conservative approach, one can only conclude that the program P has a possible assertion failure. In this setting, verification failures offer no information to a user of the tool. Instead, one can take a more pragmatic approach.
If the user can characterize (based on experience) a class Φ of acceptable specifications whose absence precludes verification, one can instead ask a weaker verification question: does there exist a specification φ ∈ Φ such that φ |= P? One can characterize the acceptability of a specification φ along two axes: (i) Conciseness: the specification should have a concise representation in some vocabulary that the user expects and can inspect. This usually precludes specifications with several levels of Boolean connectives, quantifiers, or complex atomic expressions. (ii) Permissiveness: the specification φ should not be so strong that it precludes feasible states of P that are known to exist.

We allow two mechanisms for an expert user to control the set of acceptable specifications:

– The user can provide a vocabulary Vocab of acceptable specifications, along with a checker that can test membership of a formula φ in Vocab. We show instances of Vocab in Sect. 4.
– The user can augment P with a set of angelic assertions Â at specific locations, with the expectation that no acceptable specification should prove an assertion assert e ∈ Â.

We term the resulting verification problem angelic as the verifier co-operates with the user (as opposed to playing an adversary) to find specifications that can prove the program. This can be seen as a particular mechanism to allow an expert user to customize the abductive inference problem to their needs [20]. If no such specification can be found, it indicates that the verification failure of P cannot be categorized into previously known buckets of false alarms.

We make these ideas more precise in the next few sections. In Sect. 3.1, we describe the notion of angelic correctness given P, Vocab and Â. In Sect. 3.2, we describe an algorithm to prove angelic correctness using existing program verifiers.

3.1 Problem Formulation

Let φ ∈ Formula be a well-scoped formula at the block Start of a program P.
We say that a program P is correct under φ (denoted as φ |= P) if the program augmented with the block Start_0 : assume φ; goto Start, with Start_0 as its start block, is correct. In other words, the program P is correct with the precondition φ.

Let A be the set of assertions in program P. Additionally, let the user specify an additional set Â of angelic assertions at various blocks in P. We write P_{A1,A2} for the instrumented version of P that has two sets of assertions enabled:

– Normal assertions A1 ⊆ A that constitute a (possibly empty) subset of the original assertions present in P, and
– Angelic assertions A2 ⊆ Â that constitute a (possibly empty) subset of the additional user-supplied assertions.

Definition 1 (Permissive Precondition). For a program P_{A,Â} and formula φ, Permissive(P_{A,Â}, φ) holds if for every assertion s ∈ Â, if φ |= P_{∅,{s}}, then true |= P_{∅,{s}}.

In other words, a specification φ is not allowed to prove any assertion s ∈ Â that was not provable under the unconstrained specification true.

Definition 2 (Angelic Correctness). Given (i) a program P with a set of normal assertions A, (ii) an angelic set of assertions Â, and (iii) a vocabulary Vocab constraining a set of formulas at Start, P is angelically correct under (Vocab, Â) if there exists a formula φ ∈ Vocab such that: (i) φ |= P_{A,∅}, and (ii) Permissive(P_{∅,Â}, φ) holds.

If no such specification φ exists, then we say that P has an angelic bug with respect to (Vocab, Â). In this case, we try to ensure the angelic correctness of P with respect to a subset of the assertions in P; the rest of the assertions are flagged as angelic warnings.

Examples of Angelic Assertions Â. If one provides assert false at Start as part of Â, it disallows preconditions that are inconsistent with other preconditions of the program [20]. If we add assert false at the end of every basic block, it prevents us from creating preconditions that create dead code in the program.
This has the effect of detecting semantic inconsistency or doomed bugs [19,21,23,36]. Further, we can allow checking such assertions interprocedurally and at only a subset of locations (e.g. exclude defensive checks in callees). Finally, one can encode other domain knowledge using such assertions. For example, consider checking the correct lock usage for

    if (*) { L1: assert ¬locked(l1); lock(l1); } else { L2: assert locked(l2); unlock(l2); }

If the user expects an execution where l1 = l2 at L2, the angelic assertion assert l1 ≠ l2 ∈ Â precludes the precondition ¬locked(l1) ∧ locked(l2), and reveals a warning for at least one of the two locations. As another example, if the user has observed a runtime value v for variable x at a program location l, she can add an assertion assert x ≠ v ∈ Â at l to ensure that a specification does not preclude a known feasible behavior; further, the idea can be extended from feasible values to feasible intraprocedural path conditions.

3.2 Finding Angelic Bugs

Algorithm 1 describes a (semi-)algorithm for proving angelic correctness of a program. In addition to the program, it takes as inputs the set of angelic assertions Â and a vocabulary Vocab. On termination, the procedure returns a specification E and a subset A1 ⊆ A for which the resultant program is angelically correct under E. Lines 1 and 2 initialize the variables E and A1, respectively. The loop in lines 3–16 performs the main act of blocking failure traces in P. First, we verify the assertions A1 over P. The routine tries to establish E |= P using a sound and complete program verifier; the program verifier itself may never terminate. We return in line 6 if verification succeeds and P contains no failure traces (NO TRACE). In the event a failure trace τ is present, we query a procedure ExplainError (see Sect. 4) to find a specification φ that can prove that none of the executions along τ fail an assertion.
Line 10 checks whether the specification E1 that results from adding the new constraint φ is still permissive. If not, the algorithm suppresses the assertion a that failed in τ (by removing it from A1) and outputs the trace τ to the user. Otherwise, it adds φ to the set of constraints collected so far. The loop repeats until verification succeeds in line 4. The procedure may fail to terminate if either the call to Verify does not terminate, or the loop in line 3 does not terminate due to an unbounded number of failure traces.

Theorem 1. On termination, Algorithm 1 returns a pair of a precondition E and a subset A1 ⊆ A such that (i) E |= P when only assertions in A1 are enabled, and (ii) Permissive(P_{A,Â}, E).

The proof follows directly from the check in line 4 that establishes (i), and line 10 that ensures permissiveness.

4 ExplainError

Problem. Given a program P that is not correct, let τ be a failure trace of P. Since a trace can be represented as a valid program (Program) in our language (with a single block containing the sequence of statements ending in an assert statement), we treat τ as a program with a single control flow path. Informally, the goal of ExplainError is to return a precondition φ from a given vocabulary Vocab such that φ |= τ, or false if no such precondition exists. ExplainError takes as input the following: (a) a program P, (b) a failure trace τ in P represented as a program, and (c) a vocabulary Vocab that specifies syntactic restrictions on the formulas to search over. It returns a formula φ such that φ |= τ and φ ∈ Vocab ∪ {false}. It returns false either when (a) the vocabulary does not contain any formula φ for which φ |= τ, or (b) the search does not terminate (say due to a timeout).
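The blocking loop of Sect. 3.2 can be sketched on a toy finite-state model. This is only an illustration under strong assumptions, not Algorithm 1 itself: input states are pairs (x, r) over a three-value domain, a "trace" is identified with the name of the failing assertion, Verify and ExplainError are brute-force enumerations, and a result of None plays the role of false. All names (FAILS, ANGELIC, VOCAB, admitted, angelic_verify) are hypothetical.

```python
from itertools import product

NULL = 0
# Toy input states (x, r): x is an unconstrained parameter, r an unknown
# library return value (assumption of this sketch).
STATES = list(product(range(3), repeat=2))

# FAILS[name](state) is True when the named assertion fails from that state.
FAILS = {
    "Bar:6": lambda s: s[0] == NULL,      # fails only on a NULL parameter
    "Baz:11": lambda s: True,             # definite bug: fails on every input
    "FooBar:27": lambda s: s[1] == NULL,  # fails only on a NULL library result
}
# Angelic `assert false` locations: True when the location is reached.
ANGELIC = {"Bar.else": lambda s: s[0] == NULL}
# Vocabulary: named atomic constraints over the unknowns.
VOCAB = [("x != NULL", lambda s: s[0] != NULL),
         ("r != NULL", lambda s: s[1] != NULL)]


def admitted(spec):
    """Input states that satisfy every constraint in spec."""
    return [s for s in STATES if all(p(s) for _, p in spec)]


def verify(enabled, spec):
    """Return the name of a failing enabled assertion, or None (NO TRACE)."""
    for name in sorted(enabled):
        if any(FAILS[name](s) for s in admitted(spec)):
            return name
    return None


def explain_error(name):
    """First vocabulary atom under which the assertion never fails, else None."""
    for atom in VOCAB:
        if not any(FAILS[name](s) for s in admitted([atom])):
            return atom
    return None


def permissive(spec):
    """No angelic location reachable under `true` becomes unreachable."""
    return all(any(r(s) for s in admitted(spec))
               for r in ANGELIC.values() if any(r(s) for s in STATES))


def angelic_verify(assertions):
    spec, enabled, warnings = [], set(assertions), []
    while True:
        failing = verify(enabled, spec)
        if failing is None:
            return [n for n, _ in spec], enabled, warnings
        phi = explain_error(failing)
        if phi is None or not permissive(spec + [phi]):
            enabled.discard(failing)   # suppress the assertion, report it
            warnings.append(failing)
        else:
            spec.append(phi)           # block the failure and iterate


spec, proved, warnings = angelic_verify(FAILS)
print(spec, sorted(proved), warnings)
```

On this model the loop reports Bar:6 (blocking it would create dead code) and Baz:11 (no constraint in the vocabulary blocks it), and installs r != NULL to discharge FooBar:27, in the spirit of Fig. 3.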
Note that the weakest liberal precondition (wlp) of the trace [18] is guaranteed to be the weakest possible blocking constraint; however, it is usually very specific to the trace and may require enumerating all the concrete failing traces inside Algorithm 1. Moreover, the resulting formulas for long traces are often not suitable for human consumption. When ExplainError returns a formula other than false, one may expect φ to be the weakest (most permissive) constraint in Vocab that blocks the failure path. However, this is not possible for several reasons: (a) efficiency concerns preclude searching for the weakest constraint, and (b) Vocab may not be closed under disjunction, and therefore the weakest constraint may not be defined. Thus the primary goals of ExplainError are (a) to be scalable (so that it can be invoked in the main loop in Algorithm 1), and (b) to produce constraints that are concise even if not the weakest over Vocab.

Algorithm. Algorithm 2 provides the high-level flow of ExplainError. Currently, the algorithm is parameterized by Vocab, which consists of two components:

– Vocab.Atoms: a template for the set of atomic formulas that can appear in a blocking constraint. This can range over equalities (e1 = e2), difference constraints (e1 ≤ e2 + c), or some other syntactic pattern.
– Vocab.Bool: the complexity of the Boolean structure of the blocking constraint. One may choose to have a clausal formula (⋁_i e_i), a cube formula (⋀_i e_i), or an arbitrary conjunctive normal form (CNF) (⋀_j (⋁_i e_i)) over atomic formulas e_i.

Initially, we assume that we do not have internal non-determinism in the form of havoc or calls to external libraries in the trace τ; we describe this extension later in this section. Let wlp(s, φ) be the weakest liberal precondition transformer for s ∈ Stmt and φ ∈ Formula [18]. wlp(s, φ) is the weakest formula representing states from which executing s does not lead to an assertion failure and, on termination, satisfies φ.
It is defined as follows on the structure of statements:

    wlp(skip, φ)      = φ
    wlp(x := e, φ)    = φ[e/x]   (φ[e/x] substitutes e for all free occurrences of x)
    wlp(assume ψ, φ)  = ψ ⇒ φ
    wlp(assert ψ, φ)  = ψ ∧ φ
    wlp(s; t, φ)      = wlp(s, wlp(t, φ))

Thus wlp(τ, true) ensures that no assertion fails along τ. Our current algorithm (Algorithm 2) provides various options to create predicate (under) covers of wlp(τ, true) [22], i.e., formulas that imply wlp(τ, true). Such formulas are guaranteed to block the trace τ from failing.

The first step, ControlSlice, performs an optimization to prune conditionals from τ that do not control dominate the failing assertion, by performing a variant of the path slicing approach [25]. Line 2 performs the wlp computation on the resulting trace τ1. At this point, φ1 is a Boolean combination of literals from arithmetic, equalities and array theories in satisfiability modulo theories (SMT) [34]. EliminateMapUpdates (in line 3) eliminates any occurrence of write from the formula using rewrite rules such as

    read(write(e1, e2, e3), e4)  →  e2 = e4 ? e3 : read(e1, e4)

This rule introduces new equality (aliasing) constraints in the resulting formula that are not present directly in τ. Line 4 chooses a set of atomic formulas from φ2 that match the vocabulary. Finally, the conditional in line 5 determines the Boolean structure of the resulting expression. The MONOMIAL option specifies that the block expression is a disjunction of atoms from atoms1. Line 7 collects the set of atoms in atoms1 that imply φ2, which in turn implies wlp(τ, true). We return the clause representing the disjunction of such atoms, which therefore also implies wlp(τ, true). The more expensive ProjectAtoms(φ2, atoms1) returns a formula φ3 that is a CNF expression over atoms1, such that φ3 ⇒ φ2, by performing Boolean quantifier elimination of the atoms not present in atoms1.
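The wlp rules and the map-update rewrite above can be exercised on the FooBar trace that ends in the assertion on line 32 of Fig. 1. The sketch below is an illustration only, not Algorithm 2: formulas are nested tuples, the encoding of m[z] := NULL as m := write(m, z, NULL) and all function names are assumptions of the sketch, the library results are written x_1 and x_2, and the rewrite handles just atoms of the form read(write(...), j) != rhs.

```python
NULL = "NULL"


def subst(t, x, e):
    """Replace every occurrence of variable x in term or formula t by e."""
    if t == x:
        return e
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, x, e) for a in t[1:])
    return t


def conj(f, g):
    """Conjunction with constant folding."""
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True:
        return f
    return ("and", f, g)


def disj(f, g):
    """Disjunction with constant folding."""
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False:
        return f
    return ("or", f, g)


def eq(a, b):
    return True if a == b else ("eq", a, b)


def ne(a, b):
    return False if a == b else ("ne", a, b)


def wlp(trace, post=True):
    """Weakest liberal precondition of a straight-line trace, right to left."""
    for stmt in reversed(trace):
        kind = stmt[0]
        if kind == "assign":    # wlp(x := e, phi) = phi[e/x]
            post = subst(post, stmt[1], stmt[2])
        elif kind == "assume":  # wlp(assume psi, phi) = psi => phi
            post = ("implies", stmt[1], post)
        elif kind == "assert":  # wlp(assert psi, phi) = psi and phi
            post = conj(stmt[1], post)
        # skip leaves the postcondition unchanged
    return post


def eliminate_map_updates(atom):
    """Case-split read(write(a, i, v), j) != rhs on whether i = j."""
    op, lhs, rhs = atom
    if (op == "ne" and isinstance(lhs, tuple) and lhs[0] == "read"
            and isinstance(lhs[1], tuple) and lhs[1][0] == "write"):
        _, (_, a, i, v), j = lhs
        hit = conj(eq(i, j), ne(v, rhs))
        miss = conj(ne(i, j), eliminate_map_updates(("ne", ("read", a, j), rhs)))
        return disj(hit, miss)
    return atom


# z := x_1; m[z] := NULL; x := x_2; w := m[x]; assert w != NULL
trace = [
    ("assign", "z", "x_1"),
    ("assign", "m", ("write", "m", "z", NULL)),
    ("assign", "x", "x_2"),
    ("assign", "w", ("read", "m", "x")),
    ("assert", ("ne", "w", NULL)),
]
phi1 = wlp(trace)
phi2 = eliminate_map_updates(phi1)
print(phi1)
print(phi2)
```

Here phi1 is read(write(m, x_1, NULL), x_2) != NULL, and phi2 is the conjunction of x_1 != x_2 and read(m, x_2) != NULL: the aliasing case folds to false because it would require NULL != NULL.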
We first transform the formula φ2 into conjunctive normal form (CNF) by repeatedly applying rewrite rules such as φ1 ∨ (φ2 ∧ φ3) → (φ1 ∨ φ2) ∧ (φ1 ∨ φ3). We employ a theorem prover at each step to try to simplify intermediate expressions to true or false. Finally, for each clause c in the CNF form, we remove any literal in c that is not present in the set of atoms atoms1.

Example. Consider the example FooBar in Fig. 1, and the trace τ that corresponds to the violation of assert w ≠ NULL. The trace is a sequential composition of the following statements: z := x_1, m[z] := NULL, x := x_2, w := m[x], assert w ≠ NULL, where we have replaced calls to Lib1 and Lib2 with x_1 and x_2 respectively. wlp(τ, true) is read(write(m, x_1, NULL), x_2) ≠ NULL, which after applying EliminateMapUpdates results in the expression (x_1 ≠ x_2 ∧ m[x_2] ≠ NULL). Notice that this is nearly identical to the blocking clause (except for the quantifiers and triggers) returned while analyzing FooBar in Fig. 3. Let us allow any disequality atoms e1 ≠ e2 in Vocab. If we only allow the MONOMIAL Boolean structure, there does not exist any clause over these atoms (weaker than false) that suppresses the trace.

Internal Non-determinism. In the presence of only input non-determinism (parameters and globals), wlp(τ, true) is a well-scoped expression at entry in terms of parameters and globals. In the presence of internal non-determinism (due to havoc statements either present explicitly or implicitly for non-deterministic initialization of local variables), the target of a havoc is universally quantified away (wlp(havoc x, φ) = ∀u : φ[u/x]). However, this is unsatisfactory for several reasons: (a) one has to introduce a fresh quantified variable for different call sites of a function (say Lib1 in Fig. 1). (b) Moreover, the quantified formula does not have a good trigger [17] to instantiate the universally quantified variables u.
For a quantified formula, a trigger is a set of sub-expressions containing all the bound variables. To address both these issues, we introduce a distinct predicate unknown_i after the i-th syntactic call to havoc and introduce an assume statement after the havoc (Fig. 2): assume unknown_i(x). The wlp rules for assume and havoc ensure that the quantifiers are better behaved, as the resultant formulas have unknown_i(x) as a trigger (see Fig. 3).

5 Evaluation

We have implemented the ideas described in this paper (Algorithms 1 and 2) in a tool called AngelicVerifier, available with sources.¹ AngelicVerifier uses the Corral verifier [31] as a black box to implement the check Verify used in Algorithm 1. Corral performs interprocedural analysis of programs written in the Boogie language; the Boogie program can be generated from either C [15], .NET [5] or Java programs [1]. As an optimization, while running ExplainError, AngelicVerifier first tries the MONOMIAL option and falls back to ProjectAtoms when the former returns false.

We empirically evaluate AngelicVerifier against two industrial tools: the Static Driver Verifier (SDV) [3] and PREfix [9]. Each of these tools comes packaged with models of the environment (both harness and stubs) of the programs it targets. These models have been designed over several years of testing and tuning by a product team. We ran AngelicVerifier with none of these models and compared the number of code defects found, as well as the benefit of treating the missing environment as angelic over treating it as demonic.

5.1 Comparison with SDV

    Benchmarks    Procedures   KLOC       CPU (Ks)
    Correct (5)   71-235       2.0-19.1   1.1
    Buggy (13)    23-139       1.5-6.7    1.7

Fig. 5. SDV benchmarks

SDV is a tool offered by Microsoft to third-party driver developers. It checks for typestate properties (e.g., locks are acquired and released in strict alternation) on Windows device drivers.
SDV checks these properties by introducing monitors in the program in the form of global variables, and instrumenting the property as assertions in the program. We chose a subset of benchmarks and properties from SDV’s verification suite that correspond to drivers distributed in the Windows Driver Kit (WDK); their characteristics are summarized in Fig. 5. We picked a total of 18 driver-property pairs; SDV reports a defect on 13 of them. Figure 5 shows the range of the number of procedures, lines of code (contained in C files) and the total time taken by SDV (in thousands of seconds) on all of the buggy or correct instances.

¹ At http://corral.codeplex.com, project AddOns\AngelicVerifierNull.

We ran various instantiations of AngelicVerifier on the SDV benchmarks:
– default: The vocabulary includes aliasing constraints (e1 = e2) as well as arbitrary expressions over monitor variables.
– noTS: The vocabulary only includes aliasing constraints.
– noAlias: The vocabulary only includes expressions over the monitor variables.
– noEE: The vocabulary is empty. In this case, all traces returned by Corral are treated as bugs without running ExplainError. This option simulates a demonic environment.
– default+harness: This is the same as default, but the input program includes a stripped version of the harness used by SDV. This harness initializes the monitor variables and calls specific procedures in the driver. (The actual harness used by SDV is several times bigger and includes initializations of various data structures and flags as well.)

Example: Fig. 6 contains code snippets inspired by real code in our benchmarks. We use it to highlight the differences between the various configurations of AngelicVerifier described above.
– The assertion in Fig. 6(a) will be reported as a bug by noTS but not default because LockDepth > 1 is not a valid atom for noTS.
– The assertion in Fig. 6(c) will be reported as a bug by noAlias but not default because it requires a specification that constrains aliasing in the environment. For instance, default constrains the environment by imposing (x ≠ irp ∧ y ≠ irp) ∨ (z ≠ irp ∧ y ≠ irp), where x is devobj->DeviceExtension->FlushIrp, y is devobj->DeviceExtension->LockIrp and z is devobj->DeviceExtension->BlockIrp.
– The procedures called Harness in Fig. 6 are only available under the setting default+harness. The assertion in Fig. 6(a) will not be reported by default as it is always possible (irrespective of the number of calls to KeAcquireSpinLock and KeReleaseSpinLock) to construct an initial value of LockDepth that suppresses the assertion. When the (stripped) harness is present, this assertion will be reported. Note that the assertion failure in Fig. 6(b) will be caught by both default and default+harness.

(a)
    // monitor variable
    int LockDepth;

    // This procedure is only available under the option default+harness
    void Harness() {
        LockDepth = 0;
        IoCancelSpinLock();
    }

    void IoCancelSpinLock() {
        KeReleaseSpinLock();
        ...
        KeReleaseSpinLock();
        ...
        KeAcquireSpinLock();
        ...
        KeCheckSpinLock();
    }

    void KeAcquireSpinLock() { LockDepth++; }
    void KeReleaseSpinLock() { LockDepth--; }
    void KeCheckSpinLock() { assert LockDepth > 0; }

(b)
    const int PASSIVE = 0;
    const int DISPATCH = 2;

    // monitor variable
    int irqlVal;

    // This procedure is only available under the option default+harness
    void Harness() {
        irqlVal = PASSIVE;
        KeRaiseIrql();
    }

    void KeRaiseIrql() {
        ...
        irqlVal = DISPATCH;
        ...
        KeReleaseIrql();
    }

    void KeReleaseIrql() {
        assert irqlVal == PASSIVE;
        irqlVal = DISPATCH;
    }

(c)
    int completed;
    IRP *global_irp;

    void DispatchRoutine(DO *devobj, IRP *irp) {
        completed = 0;
        global_irp = irp;
        DE *de = devobj->DeviceExtension;
        ...
        IoCompleteRequest(de->FlushIrp);
        ...
        IoCompleteRequest(de->BlockIrp);
        ...
        IoCompleteRequest(de->LockIrp);
    }

    void IoCompleteRequest(IRP *p) {
        if (p == global_irp) {
            assert completed != 1;
            completed = 1;
        }
    }

Fig. 6. Code snippets, in C, illustrating the various settings of AngelicVerifier

The results on SDV benchmarks are summarized in Table 1. For each AngelicVerifier configuration, we report the cumulative running time in thousands of seconds (CPU), the number of bugs reported (B), and the number of false positives (FP) and false negatives (FN). The experiments were run (sequentially, single-threaded) on a server class machine with two Intel(R) Xeon(R) processors (16 logical cores) executing at 2.4 GHz with 32 GB RAM.

Table 1. Results on SDV benchmarks

Bench     Configuration     CPU(Ks)   B    FP   FN
Correct   default           9.97      0    0    0
Correct   default+harness   16.8      0    0    0
Correct   noEE              0.28      12   12   0
Correct   noTS              4.20      2    2    0
Correct   noAlias           15.1      0    0    0
Buggy     default           3.19      9    0    4
Buggy     default+harness   3.52      13   0    0
Buggy     noEE              0.47      21   13   5
Buggy     noTS              2.58      14   3    2
Buggy     noAlias           1.42      10   3    6

noEE reports a large number of false positives, confirming that a demonic environment leads to spurious warnings. The default configuration, on the other hand, reports no false positives! It is overly optimistic in some cases, resulting in missed defects. It is clear that the out-of-the-box experience, i.e., before environment models have been written, of AngelicVerifier (low false positives, few false negatives) is far superior to a demonic verifier (very high false positives, few false negatives). The default+harness configuration shows that once the tool could use the (stripped) harness, it found all bugs reported by SDV. The configurations noTS and noAlias show that the individual components of the vocabulary were necessary for inferring the right environment specification in the default configuration. We also note that the running time of our tool is several times higher than that of SDV; instead of the tedious manual environment modeling effort, the cost shifts to higher running time of the automated verifier.
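The difference between default and default+harness on Fig. 6(a) can be replayed concretely. The sketch below is a simplification under stated assumptions: it enumerates initial values of LockDepth over a bounded range, whereas AngelicVerifier reasons symbolically, and the function names check_passes and some_initial_value_suppresses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Replays the call sequence of IoCancelSpinLock from Fig. 6(a), starting
   from a given initial LockDepth, and reports whether the final
   `assert LockDepth > 0` would hold. */
bool check_passes(int initial_lock_depth) {
    int lock_depth = initial_lock_depth;
    lock_depth--;           /* KeReleaseSpinLock */
    lock_depth--;           /* KeReleaseSpinLock */
    lock_depth++;           /* KeAcquireSpinLock */
    return lock_depth > 0;  /* KeCheckSpinLock */
}

/* Bounded stand-in for the angelic question: does some initial value in
   [lo, hi] suppress the assertion failure? (Assumption: a small finite
   range is enough for illustration.) */
bool some_initial_value_suppresses(int lo, int hi) {
    assert(lo <= hi);
    for (int d = lo; d <= hi; d++) {
        if (check_passes(d)) return true;
    }
    return false;
}
```

With the harness, the initial value is fixed to 0 and check_passes(0) is false, so the failure is reported. Without it, any initial value greater than 1 makes the check pass, which matches the atom LockDepth > 1 discussed for the default configuration.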
5.2 Comparison Against PREfix

PREfix is a production tool used internally within Microsoft. It checks for several kinds of programming errors, including checking for null-pointer dereferences, on the Windows code base. We targeted AngelicVerifier to find null-pointer exceptions and compared against PREfix on 10 modules selected randomly, such that PREfix reported at least one defect in the module. Table 2 reports the sizes of these modules. (The names are hidden for proprietary reasons.)

Table 2. Comparison against PREfix on checking for null-pointer dereferences

         stats          PREfix   default                                          default-AA
Bench    Procs   KLOC   B        CPU(Ks)   B     PM   FP   FN   PRE-FP   PRE-FN   CPU(Ks)   B
Mod 1    453     37.2   14       2.7       26    14   4    0    0        1        1.8       26
Mod 2    64      6.5    3        0.2       0     0    0    3    0        0        0.2       0
Mod 3    479     56.6   5        5.8       11    3    4    2    0        1        1.7       6
Mod 4    382     37.8   4        1.8       3     0    0    0    4        3        1.1       2
Mod 5    284     30.9   6        0.8       12    6    1    0    0        0        0.4       11
Mod 6    37      8.4    7        0.1       10    7    0    0    0        0        0.1       10
Mod 7    184     20.9   10       0.6       11    10   0    0    0        1        0.4       11
Mod 8    400     43.8   5        2.9       15    5    1    0    0        1        1.0       15
Mod 9    40      3.2    7        0.1       8     7    0    0    0        0        0.1       8
Mod 10   998     76.5   7        24.9      8     3    1    4    0        4        16.0      4
total    –       321    68       39.9      104   54   11   9    4        11       22.8      93

We used two AngelicVerifier configurations: default-AA uses a vocabulary of only aliasing constraints. default uses the same vocabulary along with angelic assertions: an assert false is injected after any statement of the form assume e == null. This enforces that if the programmer anticipated an expression being null at some point in the program, then AngelicVerifier should not impose an environment specification that makes this check redundant.

Scalability. This set of benchmarks was several times harder than the SDV benchmarks for our tool chain. This is because of the larger codebase, but also because checking null-ness requires tracking of pointers in the heap, whereas SDV’s type-state properties are mostly control-flow based and require minimal tracking of pointers.
To address the scale, we use two standard tricks. First, we use a cheap alias analysis to prove many of the dereferences safe and only focus AngelicVerifier on the rest. Second, AngelicVerifier explores different entrypoints of the program in parallel. We used the same machine as for the previous experiment, and limited parallelism to 16 threads (one per available core). Further, we optimized ExplainError to avoid looking at assume statements along the trace, i.e., it can only block the failing assertion. This can result in ExplainError returning a stronger-than-necessary condition but improves the convergence time of AngelicVerifier. This is a limitation that we are planning to address in future work.

Table 2 shows the comparison between PREfix and AngelicVerifier. In each case, the number of bug reports is indicated as B and the running time as CPU (in thousands of seconds). We found AngelicVerifier to be more verbose than PREfix, producing a higher number of reports (104 to 68). However, this was mostly because AngelicVerifier reported multiple failures with the same cause. For instance, x = null; if (...) { *x = ... } else { *x = ... } would be flagged as two buggy traces by AngelicVerifier but only one by PREfix. Thus, there is potential for post-processing AngelicVerifier’s output, but this is orthogonal to the goals of this paper.

We report the number of PREfix traces matched by some trace of AngelicVerifier as PM. To save effort, we consider all such traces as true positives. We manually examined the rest of the traces. We classified traces reported by AngelicVerifier but not by PREfix as either false positives of AngelicVerifier (FP) or as false negatives of PREfix (PRE-FN). The columns FN and PRE-FP are the duals, for traces reported by PREfix but not by AngelicVerifier.

PREfix is not a desktop application; one can only invoke it as a background service that runs on a dedicated cluster. Consequently, we do not have the running times of PREfix.
AngelicVerifier takes 11 hours to consume all benchmarks, totaling 321 KLOC, which is very reasonable (for, say, overnight testing on a single machine).

Most importantly, AngelicVerifier is able to find most (80%) of the bugs caught by PREfix, without any environment modeling! We verified that under a demonic environment, the Corral verifier reports 396 traces, most of which are false positives.

AngelicVerifier has 11 false positives; 5 of these are due to missing stubs (e.g., a call to the KeBugCheck routine does not return, but AngelicVerifier, in the absence of its implementation, does not consider this to be a valid specification). All of these 5 were suppressed when we added a model of the missing stubs. The other 6 reports turned out to be caused by a bug in our compiler front-end, which produced the wrong IR for certain features of C. (Thus, they are not issues with AngelicVerifier.) AngelicVerifier has 9 false negatives. Out of these, 1 is due to a missing stub (where it was valid for it to return a null pointer), 4 due to Corral timing out, and 5 due to our front-end issues.

Interestingly, PREfix misses 11 valid defects that AngelicVerifier reports. Out of these, 6 are reported by AngelicVerifier because it finds an inconsistency with an angelic assertion; we believe PREfix does not look for inconsistencies. We are unsure of the reason why PREfix misses the other 5. We have reported these new defects to the product teams and are awaiting a reply. We also found 4 false positives in PREfix’s results (due to infeasible path conditions).

A comparison between default and default-AA reveals that 11 traces were found because of an inconsistency with an angelic assertion. We have already mentioned that 6 of these are valid defects. The other 5 are again due to front-end issues.

In summary, AngelicVerifier matched 80% of PREfix’s reports, found new defects, and reported very few false positives.
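As a quick cross-check, the headline numbers of this subsection can be recomputed from the totals row of Table 2 (54 matched traces against 68 PREfix reports, 39.9 Ks of CPU time, and 104 versus 93 reports for default and default-AA). The symbols below are shorthand introduced only for this calculation.

```latex
\[
\frac{\mathrm{PM}}{B_{\mathrm{PREfix}}} = \frac{54}{68} \approx 0.794 \approx 80\%,
\qquad
\frac{39.9 \times 10^{3}\ \mathrm{s}}{3600\ \mathrm{s/h}} \approx 11.1\ \mathrm{h},
\qquad
B_{\mathrm{default}} - B_{\mathrm{default\text{-}AA}} = 104 - 93 = 11 .
\]
```

The last difference is the number of traces attributed to angelic assertions in the comparison between default and default-AA.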
6 Related Work

Our work is closely related to previous work on abductive reasoning [7,10,11,20] in program verification. Dillig et al. [20] perform abductive reasoning based on quantifier elimination of variables in wlp that do not appear in the minimum satisfying assignment of ¬wlp. The method requires quantifier elimination, which is difficult in the presence of richer theories such as quantifiers and uninterpreted functions. Our method ProjectAtoms can be seen as a (lightweight) method for performing Boolean quantifier elimination (without interpreting the theory predicates) that we have found to be effective in practice. It can be shown that the specifications obtained by the two methods can be incomparable, even for arithmetic programs. Calcagno et al. use bi-abductive reasoning to perform bottom-up shape analysis [10] of programs, but only in the context of intraprocedural reasoning. In comparison to this work, we provide configurability by being able to control parts of the vocabulary and the check for permissiveness using Â. The work on almost-correct specifications [7] provides a method for minimally weakening the wlp over a set of predicates to construct specifications that disallow dead code. However, the method is expensive and can only be applied intraprocedurally.

Several program verification techniques have been proposed to detect semantic inconsistency bugs [21] in recent years [19,23,36]. Our work can be instantiated to detect this class of bugs (even interprocedurally); however, it may not be the most scalable approach to perform the checks. The work on angelic nondeterminism [8] allows for checking whether the non-deterministic operations can be replaced with deterministic code that makes the assertions succeed. Although similar in principle, our end goal is bug finding with high confidence, as opposed to program synthesis.
The work on angelic debugging [12] and BugAssist [26] similarly look for relevant expressions to relax to fix a failing test case. The difference is that the focus is more on debugging failing test cases and repairing a program.

The work on ranking static analysis warnings using statistical measures is orthogonal and perhaps complementary to our technique [28]. Since these techniques do not exploit program semantics, such techniques can only be used as a post-processing step (thus offering little control to users of a tool). Finally, work on differential static analysis [2] can be leveraged to suppress a class of warnings with respect to another program that can serve as a specification [29,32]. Our work does not require any additional program as a specification and therefore can be more readily applied to standard verification tasks. The work on CBUGS [27] leverages sequential interleavings as a specification while checking concurrent programs.

7 Conclusions

We presented the angelic verification framework that constrains a verifier to search for warnings that cannot be precluded with acceptable specifications over unknowns from the environment. Our framework is parameterized to allow a user to choose different instantiations to fit the precision-recall tradeoff. Preliminary experiments indicate that such a tool can indeed be competitive with industrial tools, even without any modeling effort. With subsequent modeling (e.g. adding a harness), the same tool can find more interesting warnings.

References

1. Arlt, S., Schäf, M.: Joogie: infeasible code detection for Java. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 767–773. Springer, Heidelberg (2012)
2. Lahiri, S.K., Vaswani, K., Hoare, C.A.R.: Differential static analysis: opportunities, applications, and challenges. In: Proceedings of the Workshop on Future of Software Engineering Research, FoSER 2010, at the 18th ACM SIGSOFT, International Symposium on Foundations of Software Engineering, November 7-11, 2010, pp. 201–2014, Santa Fe, NM, USA (2010)
3. Ball, T., Levin, V., Rajamani, S.K.: A decade of software model checking with SLAM. Commun. ACM 54(7), 68–76 (2011)
4. Barnett, M., Leino, K.R.M.: Weakest-precondition of unstructured programs. In: Program Analysis For Software Tools and Engineering (PASTE 2005), pp. 82–87 (2005)
5. Barnett, M., Qadeer, S.: BCT: a translator from MSIL to Boogie. In: Seventh Workshop on Bytecode Semantics, Verification, Analysis and Transformation (2012)
6. Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Henri-Gros, C., Kamsky, A., McPeak, S., Engler, D.: A few billion lines of code later: using static analysis to find bugs in the real world. Commun. ACM 53(2), 66–75 (2010)
7. Blackshear, S., Lahiri, S.K.: Almost-correct specifications: a modular semantic framework for assigning confidence to warnings. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, pp. 209–218, Seattle, WA, USA, 16–19 Jun 2013
8. Bodík, R., Chandra, S., Galenson, J., Kimelman, D., Tung, N., Barman, S., Rodarmor, C.: Programming with angelic nondeterminism. In: Principles of Programming Languages (POPL 2010), pp. 339–352 (2010)
9. Bush, W.R., Pincus, J.D., Sielaff, D.J.: A static analyzer for finding dynamic programming errors. Softw. Pract. Exper. 30(7), 775–802 (2000)
10. Calcagno, C., Distefano, D., O’Hearn, P.W., Yang, H.: Compositional shape analysis by means of bi-abduction. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, pp. 289–300, Savannah, GA, USA, 21–23 Jan 2009
11. Chandra, S., Fink, S.J., Sridharan, M.: Snugglebug: a powerful approach to weakest preconditions. In: Programming Language Design and Implementation (PLDI 2009), pp. 363–374 (2009)
12. Chandra, S., Torlak, E., Barman, S., Bodik, R.: Angelic debugging. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, pp. 121–130. ACM, New York, NY, USA (2011)
13. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (2000)
14. Clarke, E.M., Kroening, D., Yorav, K.: Behavioral consistency of C and Verilog programs using bounded model checking. In: Proceedings of the 40th Design Automation Conference, DAC 2003, pp. 368–371, Anaheim, CA, USA, 2–6 Jun 2003
15. Condit, J., Hackett, B., Lahiri, S.K., Qadeer, S.: Unifying type checking and property checking for low-level code. In: Principles of Programming Languages (POPL 2009), pp. 302–314 (2009)
16. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for the static analysis of programs by construction or approximation of fixpoints. In: Symposium on Principles of Programming Languages (POPL 1977), ACM Press (1977)
17. Detlefs, D., Nelson, G., Saxe, J.B.: Simplify: a theorem prover for program checking. J. ACM 52(3), 365–473 (2005)
18. Dijkstra, E.W.: Guarded commands, nondeterminacy and formal derivation of programs. Commun. ACM 18(8), 453–457 (1975)
19. Dillig, I., Dillig, T., Aiken, A.: Static error detection using semantic inconsistency inference. In: Programming Language Design and Implementation (PLDI 2007), pp. 435–445 (2007)
20. Dillig, I., Dillig, T., Aiken, A.: Automated error diagnosis using abductive inference. In: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2012, pp. 181–192. ACM, New York, NY, USA (2012)
21. Engler, D.R., Chen, D.Y., Chou, A.: Bugs as inconsistent behavior: a general approach to inferring errors in systems code. In: Symposium on Operating Systems Principles (SOSP 2001), pp. 57–72 (2001)
22. Graf, S., Saïdi, H.: Construction of abstract state graphs with PVS. In: Grumberg, O. (ed.) CAV 1997. LNCS, vol. 1254, pp. 72–83. Springer, Heidelberg (1997)
23. Hoenicke, J., Leino, K.R.M., Podelski, A., Schäf, M., Wies, T.: Doomed program points. Form. Meth. Syst. Des. 37(2–3), 171–199 (2010)
24. Ivančić, F., Yang, Z., Ganai, M.K., Gupta, A., Shlyakhter, I., Ashar, P.: F-Soft: software verification platform. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp. 301–306. Springer, Heidelberg (2005)
25. Jhala, R., Majumdar, R.: Path slicing. In: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pp. 38–47, Chicago, IL, USA, 12–15 Jun 2005
26. Jose, M., Majumdar, R.: Cause clue clauses: error localization using maximum satisfiability. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, pp. 437–446, San Jose, CA, USA, 4–8 Jun 2011
27. Joshi, S., Lahiri, S.K., Lal, A.: Underspecified harnesses and interleaved bugs. In: Principles of Programming Languages (POPL 2012), pp. 19–30, ACM (2012)
28. Kremenek, T., Engler, D.R.: Z-ranking: using statistical analysis to counter the impact of static analysis approximations. In: Cousot, R. (ed.) SAS 2003. LNCS, vol. 2694, pp. 295–315. Springer, Heidelberg (2003)
29. Lahiri, S.K., McMillan, K.L., Sharma, R., Hawblitzel, C.: Differential assertion checking. In: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013, pp. 345–355, Saint Petersburg, Russian Federation, 18–26 Aug 2013
30. Lahiri, S.K., Qadeer, S., Galeotti, J.P., Voung, J.W., Wies, T.: Intra-module inference. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 493–508. Springer, Heidelberg (2009)
31. Lal, A., Qadeer, S., Lahiri, S.K.: A solver for reachability modulo theories. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 427–443. Springer, Heidelberg (2012)
32. Logozzo, F., Lahiri, S.K., Fähndrich, M., Blackshear, S.: Verification modulo versions: towards usable verification. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, p. 32, Edinburgh, United Kingdom, 09–11 Jun 2014
33. McMillan, K.L.: An interpolating theorem prover. In: Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp. 16–30. Springer, Heidelberg (2004)
34. Satisfiability modulo theories library (SMT-LIB). http://goedel.cs.uiowa.edu/smtlib/
35. Stump, A., Barrett, C.W., Dill, D.L., Levitt, J.R.: A decision procedure for an extensional theory of arrays. In: IEEE Symposium of Logic in Computer Science (LICS 2001) (2001)
36. Tomb, A., Flanagan, C.: Detecting inconsistencies via universal reachability analysis. In: International Symposium on Software Testing and Analysis (ISSTA 2012) (2012)