Verifying Properties of Well-Founded Linked Lists
Shuvendu K. Lahiri
Microsoft Research
shuvendu@microsoft.com
Shaz Qadeer
Microsoft Research
qadeer@microsoft.com
Abstract
We describe a novel method for verifying programs that manipulate
linked lists, based on two new predicates that characterize reachability
of heap cells. These predicates allow reasoning about both
acyclic and cyclic lists uniformly with equal ease. The crucial insight
behind our approach is that a circular list invariably contains
a distinguished head cell that provides a handle on the list. This observation
suggests a programming methodology that requires the
heap of the program at each step to be well-founded, i.e., for any
field f in the program, every sequence u.f, u.f.f, . . . contains at
least one head cell. We believe that our methodology captures the
most common idiom of programming with linked data structures.
We enforce our methodology by automatically instrumenting the
program with updates to two auxiliary variables representing these
predicates and adding assertions in terms of these auxiliary vari-
ables.
To prove program properties and the instrumented assertions,
we provide a first-order axiomatization of our two predicates. We
also introduce a novel induction principle made possible by the
well-foundedness of the heap. We use our induction principle to
derive from two basic axioms a small set of additional first-order
axioms that are useful for proving the correctness of several pro-
grams.
We have implemented our method in a tool and used it to verify
the correctness of a variety of nontrivial programs manipulating
both acyclic and cyclic singly-linked lists and doubly-linked lists.
We also demonstrate the use of indexed predicate abstraction to
automatically synthesize loop invariants for these examples.
Categories and Subject Descriptors: D.2.4: Software/program
verification
General Terms: Reliability, security, languages, verification
Keywords: First-order axiomatization, well-founded linked lists,
decision procedure, automated theorem proving, heap abstraction
1. Introduction
Program verification has made enormous progress in the last few
decades. A significant contribution to this progress has been made
by reasoning techniques based on first-order logic. For a number
of programs, the verification condition whose validity establishes
the correctness of the program is expressible in a combination of
first order theories such as arithmetic, uninterpreted functions, and
propositional logic. The pioneering work of Nelson and Oppen [30]
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
POPL'06 January 11­13, 2006, Charleston, South Carolina, USA.
Copyright c 2006 ACM 1-59593-027-2/06/0001. . . $5.00.
described a method for combining decision procedures for individual
theories to yield a decision procedure for the combined theory.
Based on their results, a number of powerful automated theorem
provers [9, 4] have been developed for program verification. Many
successful tools [12, 3, 17] have used these provers for proving program
properties.
However, first-order reasoning typically breaks down when
we want to prove properties of programs that manipulate heapallocated
linked data structures. The main problem with reasoning
about a data structure, such as a linked list, is that it is impossible
to express an invariant about all members of a list in first-order
logic. To achieve such a specification in general requires the use of
the reachability predicate which cannot be expressed in first-order
logic. Consequently, researchers have investigated richer logics
such as combination of first order logic with transitive closure [18]
and monadic second-order logic [28]. These approaches are typically
unable to harness the advances made in automated theorem
proving based on first-order logic.
In this paper, we develop a method for verifying linked data
structures based entirely on first-order logic. The idea behind our
approach is to provide a first-order approximation of the reachability
predicate. Nelson [32] and Lev-Ami et al. [24] have also
proposed first-order axiomatizations for linked lists. Our work improves
upon these previous approaches in several significant ways.
Most notably, our work makes the analysis of cyclic linked lists as
uniform as acyclic lists. While theoretically incomplete, we believe
that our approach is complete enough for most realistic programs.
We have validated this belief by building a tool to mechanize our
method. We have used our tool to verify a number of programs
manipulating singly-linked and doubly-linked lists.
In the next section, we illustrate the benefits of our analysis for
reasoning uniformly about acyclic and cyclic lists.
1.1 Motivation
The novelty of our technique is best illustrated by an example. Consider
the following function acyclic simple that iterates over an
acyclic list pointed to by the variable head and sets the data field
of each list element to 0. This example is representative of a variety
of programs that use linked lists as sets and iterate over them.
//@ requires head != null
void acyclic simple() {
iter = head;
while (iter != null) {
iter.data = 0;
iter = iter.next;
}
}
To write the loop invariant for this program, we require the
reachability function Rnext, which maps each cell u to the set containing
u and all cells reachable from u by following the next field
115
and excluding null. The cornerstone of all previous approaches to
first-order reasoning about linked lists is the following axiom about
Rnext:
v  Rnext(u)  (v = u  (u.next = null  v  Rnext(u.next)))
This axiom and the rest of this section assumes, for simplicity but
without loss of generality, that null.next = null. We have also
added the precondition head != null to make this example similar
in as many ways as the example of a cyclic list traversal described
later in the section.
Using Rnext, the loop invariant for the program can be expressed
as follows:
v  Rnext(head) : (iter = nullv  Rnext(iter))v.data = 0
This loop invariant proves the following desired postcondition of
the function.
v  Rnext(head) : v.data = 0
The verification condition for this program is easily constructed using
the weakest-precondition transformer [10], and the verification
condition can be proved from the axiom using purely first-order
reasoning based on uninterpreted functions and quantifier instantiation.
We encourage the reader to perform this simple but interesting
calculation.
Unfortunately, this simple approach breaks down with cyclic
lists which are quite common, and perhaps the most common linked
data structure used in operating system kernels. To illustrate the
problem, consider the following variant cyclic simple of the first
function that iterates over a cyclic list pointed to by head and sets
the data fields of the list elements to 0. The complication, in this
case, is that the next field of the last element of the list points to
head rather than null.
//@ requires head != null
void cyclic simple() {
head.data = 0;
iter = head.next;
while(iter != head) {
iter.data = 0;
iter = iter.next;
}
}
The circularity of the list results in a breakdown of the reasoning
described earlier. The value of Rnext(head) and Rnext(iter)
remains constant and equal to the set of all elements in the list during
the entire execution of the loop. As a result, we are unable to
write a simple loop invariant using Rnext. Even proving the absence
of null dereferences in cylic simple is nontrivial compared to
acyclic simple where it is trivial because of the loop entry condition.
In the remainder of this section, we show how our approach
makes verification of programs manipulating cyclic lists as simple
as those manipulating acyclic lists.
The crucial insight behind our approach is that a circular list
invariably contains a distinguished head cell. This head cell provides
a handle on the list, usually marks the beginning of the list,
and the last element of the list points to it. In the program above,
this cell is pointed to by the program variable head. Moreover,
it is the presence of this cell that ensures the termination of iterations
over circular linked lists. Let Hnext denote the set of head
cells of a program. Then, the above program has the precondition
{head, null}  Hnext. We think of null as a special head and consequently
require that Hnext always contain null. The usefulness
of constraining Hnext in this fashion will become clear later in this
section.
We introduce a new axiomatization of linked lists using functions
Rnext and Bnext. In contrast to its definition earlier, the new
definition of function Rnext maps each cell u to the set of cells
containing u and all cells reachable from u by following the next
field until a cell v  Hnext is reached. The cell v is not included in
Rnext(u) unless u = v. Finally, we define Bnext(u) to be v. The cell
v acts as a block to the traversal of the next field from u and its
identity is captured as Bnext(u). We call Bnext(u) the blocking cell
for u. The axioms for Rnext and Bnext are as follows:
v  Rnext(u)  (u = v  (u.next  Hnext  v  Rnext(u.next)))
Bnext(u) = (u.next  Hnext) ? u.next : Bnext(u.next)
The first axiom is similar in spirit to the axiom for Rnext described
earlier. The second axiom says that Bnext(u) is u.next if u.next 
Hnext and Bnext(u.next) otherwise.
Using the definition of Bnext, we can specify that head points to
a circular list by adding the following precondition to the function
cyclic simple.
//@ requires Bnext(head) = head
From the definition of Rnext and Bnext, it is clear that if Bnext(u) =
u, then u  Hnext and Rnext(u) is the set of cells in a cycle in which
no cell other than u is in Hnext. Therefore, the set of cells in the
circular list pointed to by head is given by Rnext(head). Using the
definition of Rnext and Bnext, we can now write the loop invariant
for the loop in cyclic simple as follows.
v  Rnext(head) :
(iter = head  v  Rnext(iter))  v.data = 0
 Bnext(iter) = head
Observe that the first conjunct of the loop invariant is similar in
structure to the loop invariant for the function acyclic simple.
The two new axioms allow us to prove the correctness of the
loop invariant, the absence of null dereferences, and the following
desired postcondition for cyclic simple:
v  Rnext(head) : v.data = 0
As mentioned earlier, proving the absence of null dereferences in
cyclic simple is nontrivial and requires the use of the axiom
about Bnext.
Our new axiomatization is a generalization of the first axiomatization
that worked only for acyclic lists. If Hnext = {null}, the new
axiom about Rnext reduces to the first axiom about Rnext. Indeed, we
can verify the first program using the new axiomatization by adding
the precondition Hnext = {null} to the function acyclic simple.
Thus, our new axiom system can verify programs manipulating
acyclic as well as cyclic lists with equal ease.
1.2 Contributions
The main technical contribution of this paper is a novel method for
verifying linked lists based on two new predicates that characterize
reachability of heap cells. These predicates allow reasoning about
both acyclic and cyclic lists uniformly with equal ease. The crucial
insight behind our approach is that a circular list invariably contains
a distinguished head cell that provides a handle on the list. This
observation suggests a programming methodology that requires the
heap of the program at each step to be well-founded, i.e., for any
field f in the program, every sequence u.f, u.f.f, . . . contains at
least one head cell. The set of head cells is identified by a new
variable added to the program by the programmer.
We believe that our methodology captures the most common
idiom of programming with linked data structures. We enforce our
methodology by automatically instrumenting the program with updates
to two auxiliary variables representing these predicates and
adding assertions in terms of these auxiliary variables. Our instrumentation
captures well-foundedness precisely -- the instrumented
program fails one of these assertions if and only if the original program
reaches a state containing a heap that is not well-founded.
116
//@ requires a != null && b != null
//@ requires {null, a, b}  Hnext
//@ requires Bnext(a) == a && Bnext(b) == b
//@ requires Rnext(a)  Rnext(b) == {}
//@ requires u  Rnext(b) : u.r == b.r
//@ ensures Bnext(a) == b && Bnext(b) == a
//@ ensures u  (Rnext(a)  Rnext(b)) : u.r == b.r
//@ ensures
(Rnext(a)  Rnext(b)) == (old(Rnext)(a)  old(Rnext)(b))
//@ ensures u  old(Rnext)(a) : u.r == old(u.r)
void union(Cell a, Cell b) {
a.r = b.r;
Cell curr = a.next;
while (curr != a) {
curr.r = b.r;
curr = curr.next;
}
Cell tmp = a.next;
a.next = b.next;
b.next = tmp;
}
Figure 1. Performing the union of two circular lists
To prove program properties and the instrumented assertions,
we provide a first-order axiomatization of our two predicates. Our
axiomatization consists of the two basic first-order axioms introduced
in Section 1.1. We also introduce a novel induction principle
made possible by the well-foundedness of the heap. We use our induction
principle to derive from the two basic axioms a small set of
additional first-order axioms that are useful for proving the correctness
of several programs. All of these derived axioms are intuitive
and natural and we state them precisely in Section 5.
We also explore an alternative verification approach based on
a decision procedure rather than a first-order axiomatization. In
Section 6, we show that the quantifier-free fragment of the theory
of Rnext and Bnext is NP-complete and propose a decision procedure
for this fragment based on a small-model encoding.
We have implemented our method in a tool and used it to verify
the correctness of a variety of nontrivial programs manipulating
both acyclic and cyclic singly-linked lists and doubly-linked lists.
We also demonstrate the use of indexed predicate abstraction [13,
21] for automatically synthesizing the loop invariants for these
examples. We describe our preliminary experience with this tool
in Section 7. For lack of space, we have omitted the proofs of the
various theorems and lemmas in the paper; they can be found in the
full version of the paper [23].
2. Examples
In Section 1.1, we illustrated our method by verifying the functions
acyclic simple and cyclic simple. These functions were interesting
and illustrative but comparatively simple. In this section,
we illustrate our method on three more complex examples. The
first example in Section 2.1 modifies the link structure of the heap;
the second example in Section 2.2 uses arithmetic in addition to
heap manipulation; the third example in Section 2.3 manipulates
a doubly-linked list. We note that the examples in Section 1.1 and
all examples of this section have been verified by the tool whose
details we provide in Section 7.
2.1 Set-union
Our first example is the function union shown in Figure 1. This
example is taken from a paper by Nelson [32]. The function union
takes two circular linked lists, a and b, as arguments. Each list
represents a set; the field r in a cell contains the identifier of the
unique set to which it belongs.
The function union has a number of preconditions also stated
in Figure 1. The second precondition says that both a and b are
pointing to head cells. The third precondition says that a and b are
pointing to circular lists. The fourth precondition says that the lists
pointed to by a and b are disjoint.
The objective of union is to merge the list pointed to by a into
the list pointed to by b. The function union first sets the r field of
each cell in a to b.r. Finally, the contents of a.next and b.next
are swapped to merge the two lists.
Our tool automatically instruments union with updates to the
functions Bnext and Rnext. This instrumentation is introduced only
for the last two statements, which are the only statements that update
the next field. For each of these statements, the instrumentation
also checks, by inserting an assertion, that the update of
next leaves the heap well-founded. For example, the assertion introduced
just before the last statement of the function is as follows:
assert(b  Rnext(tmp) => tmp  Hnext)
This assertion fails precisely if there is a chain of cells connected
by next from tmp to b in which no cell is in Hnext. Such a chain
causes an ill-founded cycle of cells upon execution of the statement
b.next = tmp. Furthermore, this statement modifies Bnext(b) and
Rnext(b) as follows:
Bnext(b) = (tmp  Hnext) ? tmp : Bnext(tmp)
Rnext(b) = (tmp  Hnext) ? {b} : {b}  Rnext(tmp)
A precise and complete description of both Bnext and Rnext is given
in Section 4.
For union, we prove the absence of null dereferences, the assertions
described above to check that the heap remains well-founded,
and four postconditions. The first postcondition is particularly interesting.
It states that the blocking cell for a is b and the blocking
cell for b is a. These two facts together mean that cells a and b are
the only two head cells in a cycle, which indicates indirectly that
indeed the union of the two lists has been created. In the third postcondition,
old(Rnext) refers to the value of Rnext at the beginning
of the function. The final postcondition says that the r field remains
unchanged for all those cells which do not belong to Rnext(a) ini-
tially.
So far, the set of head cells has been a constant. The first
postcondition also motivates the need to modify the set of head
cells. It would be intuitively more satisfactory if the programmer
can remove the cell a from Hnext once the list a has been merged
into b. Therefore, we allow the programmer to remove a cell from
Hnext by using the Remove operation. For example, we could add
the statement
Hnext.Remove(a);
at the very end of union. With this modification, our tool proves
the following more pleasing postconditions.
//@ ensures Bnext(b) == b && Bnext(a) == b
//@ ensures u  Rnext(b) : u.r == b.r
//@ ensures Rnext(b) == (old(Rnext)(a)  old(Rnext)(b))
Since the values of the variables Bnext and Rnext depend on Hnext,
the instrumentation for this statement is nontrivial. Since Hnext
becomes smaller, the heap might not remain well-founded. The
instrumentation not only checks via an assertion that the heap
remains well-founded, but also updates the values of Bnext and Rnext
appropriately. We give a precise description of the instrumentation
117
//@ requires {null}  Hnext
//@ requires Bnext(l) == null
//@ requires l != null && p != null
//@ requires p  Hnext
//@ requires p.data > l.data
//@ requires sorted(l)
//@ requires p  Rnext(l)
//@ ensures sorted(l)
//@ ensures p  Rnext(l)
//@ ensures
u : u  Rnext(l) <=> (u  old(Rnext)(l) || u == p)
void insert(Cell l, Cell p) {
Cell curr = l;
Cell succ = l.next;
while (succ != null) {
if (p.data > succ.data) {
curr = succ;
succ = curr.next;
}
else
break;
}
p.next = succ;
curr.next = p;
}
Figure 2. Inserting an element into a sorted acyclic list
for this statement as well as for Hnext.Add(x), the converse of this
statement, in Section 4.
2.2 Insert
Our second example is the function insert shown in Figure 2. The
function insert takes an acyclic list l and a cell p as arguments.
The cells in the list l are in sorted order based on the values of
the field data of the cells. The predicate sorted(l) is defined as
follows:
sorted(l) =
u  Rnext(l) : u.next == null|| u.data <= u.next.data
The objective of insert is to insert the cell p in the appropriate
place in l so that l remains sorted. The fourth precondition requires
that the cell p is not in Hnext. If p  Hnext, then on return, Rnext(l)
contains all the cells from l up to but not including p, and thus
violates the the second postcondition p  Rnext(l). The fifth
precondition of insert is also worth explaining. To simplify the
coding of insert, it is expected that the first element of l is a
dummy whose data field is guaranteed to be less than any value
that might be inserted in the list.
This example illustrates an important advantage of axiomatizing
linked lists in first-order logic. The specification of insert uses a
combination of facts about reachability via the next field, uninterpreted
functions, arithmetic, and propositional logic. The axiomatization
of linked lists in first-order logic immediately allows us
to use any one of a number of theorem provers that deal with a
combination of first-order theories. For our implementation, we are
using the UCLID theorem prover [7]. Although this example does
not use any arrays, adding them is a simple matter because they can
be modeled easily with uninterpreted functions and the well-known
select-update axioms.
//@ requires wf dlist head(hd)
//@ requires p != null
//@ requires p  Rnext(hd)
//@ requires p != hd
//@ requires Bnext(p) == hd
//@ requires p  Hnext && p  Hprev
//@ ensures p  Rnext(hd)
//@ ensures wf dlist head(hd)
//@ ensures
u : (u  Rnext(hd) || u == p) <=> u  old(Rnext)(hd)
void dlist remove(Cell hd, Cell p) {
Cell tp = p.prev;
Cell tn = p.next;
tp.next = tn;
tn.prev = tp;
}
Figure 3. Removing an element from a doubly-linked list
2.3 Remove
Our third example is the function dlist remove given in Figure 3.
This function removes a cell p from a cyclic doubly-linked list
with head hd. This example illustrates that our technique handles
doubly-linked lists just as cleanly as singly-linked lists.
Our instrumentation adds variables Rnext and Bnext for the linking
field next, and Rprev and Bprev for the linking field prev.
The instrumentation happens just as before for each linking field
as if they are independent. However since the linking field pair
(next, prev) forms a doubly linked list, we need to define additional
preconditions to relate the two fields and their auxiliary vari-
ables.
We define a predicate wf dlist head(hd) to denote that hd
points to a well formed doubly linked list. This predicate is a
conjunction of the following predicates:
1. hd != null
2. hd  Hnext && hd  Hprev
3. u  Rnext(hd) : u.next.prev == u && u.prev.next == u
4. Rnext(hd) == Rprev(hd)
5. Bnext(hd) == hd && Bprev(hd) == hd
The last conjunct indicates that hd is the unique head cell in Hnext
and Hprev present in the cyclic list. The predicate wf dlist head(hd)
is particularly useful when the routine dlist remove is invoked in
a loop to, for example, remove all the cells from the list that satisfy
some property. Then this predicate will be a conjunct in the loop
invariant.
The first precondition of the routine requires that the list is well
formed. The next two preconditions state that p points to a cell in
the list. The next precondition states that p is different from hd and
is required otherwise the list becomes ill-founded after the remove
operation. The postconditions assert that p is removed from the list
and the list remains well-formed.
3. Programs
The set Var is a set of program variables. A variable in Var may
have one of three types: Boolean, Integer, or Cell. The set Cell
contains the addresses of heap objects, each of which may have
fields from the set Field. A field may also have one of the three
aforementioned types. A field of type Boolean is a map from Cell
118
Boolean = {false, true}
Integer = {. . . , -2, -1, 0, 1, 2, . . .}
u, v, w  Cell = {null, . . .}
x, y, z  Var
f, g, h  Field
c  Const = Boolean  Integer  {null}
Op = { + , - , == , != , < , && , || }
s  Stmt ::= x = c | x = y | x = y Op z |
assume(x) | assert(x) |
x = new | x = y.f | x.f = y
Figure 4. Program syntax
to Boolean, a field of type Integer is a map from Cell to Integer,
and a field of type Cell is a map from Cell to Cell. We refer to the
last category of fields as reference fields. Finally, Const is the set
of constants that may appear in the program. The constant null is
a special constant of type Cell.
A program is a control flow graph (PC, E, pci , pcf , L) with
five components. PC is a finite set of program locations, E 
PC × PC is the set of control-flow edges, pci  PC is the initial
location where program execution begins, pcf  PC is the final
location where program execution terminates, and L is a function
that maps each edge in E to a statement in Stmt.
The restricted syntax of the statements in Stmt does not result
in any loss of generality, because more complex statements can
reduced to simple statements by the introduction of temporary
variables. The statement assume(x) together with nondeterminism
in the control flow graph can be used to encode if-then-else and
while statements. Moreover, other operations can be encoded with
the operations in Op. For example, boolean negation of x can be
encoded as x == false. We also assume that the program is free of
typing errors.
3.1 Programming with heads
We allow programmers to specify a subset of the reference fields
as linking fields. These fields are expected to provide the links in
data structures such as singly-linked and doubly-linked lists. For
each linking field f, the programmer is allowed access to a variable
Hf of type Set(Cell). The set Hf contains a subset of the set of
allocated cells that together make all lists linked by f well-founded.
The set Hf is required to always contain the cell null. We provide
two operations to update Hf.
1. Hf.Add(x): The variable x is required to be of type Cell. This
statement adds the cell obtained by evaluating x to the set
Hf. This statement provides the fundamental mechanism for
making circular lists linked by the field f well-founded. A
programmer creates such a list by first creating the head cell
and then adding it to the set Hf.
2. Hf.Remove(e): Again, the variable x is required to be of type
Cell. This operation requires that x be different from null. This
statement removes the cell obtained by evaluating x from the
set Hf. The precondition ensures that we never remove null
from Hf. We have already illustrated the utility of this operation
in Section 2.1. This operation is used to remove the head cells
of those circular linked lists that are merged into other lists.
In addition to the usual statements, we also allow the edges of
the control flow graph to be labelled with the above operations on
Hf for each linking field f. In Section 4, we describe how we ensure
well-foundedness of the heap by automatically instrumenting the
program with assertions at every occurrence of an update to the
linking fields and every occurrence of the two operations described
here.
3.2 Semantics
The state  of the program contains a program counter in PC and a
valuation of the variables in Var, the fields in Field, and the head
variables Hf for each linking field f. Variables of type Boolean are
initialized to false, variables of type Integer are initialized to 0,
and variables of type Cell are initialized to null.
The state also contains a special variable Alloc of type Set(Cell)
to model memory allocation via new. The variable Alloc is initialized
to {}. The statement x = new removes a nondeterministically
chosen cell that is not equal to null from Cell \Alloc, assigns it to
x, and adds it to Alloc. Thus, the statement x = new is desugared
to the following statements:
x = choose(Cell \ {null}\Alloc);
Alloc = Alloc  {x};
When the program executes, its state changes in accordance
with the standard operational semantics of its control flow graph.
To model the program misbehaving by performing a dereference of
null or by failing an assertion, we add a special state wrong with
no transition out of it. For example, if the value of the program
counter in  is l and the edge (l, l )  E is labelled with x.f = y,
then one of the following may happen.
1. (x) is null and the program moves to the state wrong.
2. (x) is not null and the program makes a transition to a state
 in which the program counter is l , the map for f is modified
only at cell (x) to (y), and otherwise the state remains
unchanged.
The operational semantics of the other statements can be defined
similarly.
4. Program instrumentation
In this section, we show how to automatically instrument a program
to ensure that the linked lists in the heap always remain wellfounded.
The instrumentation is performed with respect to a subset
of reference fields called linking fields that act as the links in a
linked structure. We automatically instrument the program with
two auxiliary variables for every linking field f. These variables,
called Rf and Bf, record information about the shape of the heap
graph and are essential to our verification method. The variable Rf
is a map from Cell to Set(Cell). The variable Bf is a map from
Cell to Cell. Intuitively, for any u  Cell, Rf(u) is the set of
heap cells containing u and every cell obtained by one or more
applications of f to u until a cell v  Hf is reached and Bf(u) = v.
Note that the final cell v is not a member of Rf(u). We will often
write Rf(u, v) to denote v  Rf(u).
Although the variables Rf and Bf can be defined as a mathematical
function of the program state, this definition uses transitive closure
and therefore cannot be expressed in first-order logic. An important
insight of our work is that even though these variables cannot
be defined without using transitive closure, the updates to them
as the program state changes can be defined entirely in first-order
logic. We automatically instrument the program to record these updates
and are thus able to state the assertions which ensure that the
heap remains well-founded.
Let P be a program with control flow graph (PC, E, pci , pcf , L).
The instrumented program P#
is obtained by instrumenting each
individual statement L(e) for each edge e  E. We now define
the instrumentation for a given statement s. In many cases shown
below, the instrumented statement contains control flow. Such a
complex statement is used only for clarity of presentation, and can
easily be translated into a simple statement without control flow.
119
1. If the statement s is of the form x = c, x = y, x = y Op
z, assume(x), assert(x), x = new, or x = y.f, then the
instrumented statement is identical to s. The definition of Rf
and Bf depends only on the value of map f and the set Hf. Since
neither of those are changed by the statement, we do not need
to update the instrumentation variables.
2. If the statement is of the form x.f = y and f is a linking field,
then the instrumented statement is
assert(x != null);
assert(x  Rf(y) => y  Hf);
if (y  Hf) {
Bf = i.
y if x  Rf(i)
Bf(i) otherwise
Rf = i.
(Rf(i) \ Rf(x))  {x} if x  Rf(i)
Rf(i) otherwise
}
else {
Bf = i.
Bf(y) if x  Rf(i)
Bf(i) otherwise
Rf = i.
(Rf(i) \ Rf(x))  {x}  Rf(y) if x  Rf(i)
Rf(i) otherwise
}
x.f = y;
This statement is the only one that updates the link structure
of the heap. First, the instrumentation checks via the assertion
assert(x  Rf(y) => y  Hf) that the heap remains wellfounded,
that is, no cycle of cells unbroken by a member of Hf
is created as a result of updating the f field of x to y. If such
a cycle is created, then there must be a path from from y to x
in which no cell, including y and x, is in Hf, which results in a
violation of this assertion.
Second, the two instrumentation variables Bf and Rf are updated.
The values of these functions is updated at a particular
cell i only if x  Rf(i), that is, there is a path from i to x not
broken by any member of Hf. If x  Rf(i), the update is split
into two cases. If y  Hf, then Bf(i) becomes y and we remove
from Rf(i) everything that is reachable from x without
hitting a member of Hf but then add x itself. If y  Hf, then
Bf(i) becomes Bf(y) and we remove from Rf(i) everything that
is reachable from x but then add x itself and everything that is
reachable from y.
3. If the statement is of the form Hf.Add(x), then the instrumented
statement is
if (x  Hf) {
Bf = i.
x if x  Rf(i) && x!= i
Bf(i) otherwise
Rf = i.
Rf(i) \ Rf(x) if x  Rf(i) && x!= i
Rf(i) otherwise
Hf = Hf  {x};
}
This statement does not update the linking structure of the
heap; it only changes the value of Hf. Since the values of
the instrumentation variables Bf and Rf depend on Hf, these
variables must be updated if Hf changes in case x  Hf. The
values of Bf and Rf are updated at a particular cell i only if
x  Rf(i)  x = i, that is, x becomes new blocking cell for
i. In this case, Bf(i) becomes x and we remove from Rf(i)
everything that is reachable from x without hitting a member
of Hf.
4. If the statement is of the form Hf.Remove(x), then the instrumented
statement is
if (x  Hf) {
assert(Bf(x) != x);
Bf = i.
Bf(x) if Bf(i)== x
Bf(i) otherwise
Rf = i.
Rf(i)  Rf(x) if Bf(i)== x
Rf(i) otherwise
Hf = Hf \ {x};
}
This statement, just like Hf.Add(x), does not update the linking
structure of the heap; it only changes the value of Hf. The values
of the instrumentation variables Bf and Rf must be updated if
Hf changes in case x  Hf. First, the instrumentation checks
via an assertion that the removal of x from the set of head
cells does not result in a heap that is not well-founded. A bad
heap can result if x is the only head cell in a cycle, a condition
captured by Bf(x) = x and the assertion checks for precisely
the negation of this condition. Note that since Bf(null) = null
by definition, this assertion also checks the precondition of this
operation that x is nonnull.
Second, the two instrumentation variables Bf and Rf are updated.
The values of Bf and Rf are updated at a particular cell
i only if Bf(i) = x, that is, x is the blocking cell for i. In this
case, Bf(i) becomes Bf(x) and we add to Rf(i) everything that
is reachable from x without hitting a member of Hf.
4.1 Correctness
In this section, we formalize the correctness of our instrumentation.
The instrumented state  of the program is a valuation of the
variables in Rf and Bf for each linking field f. While the original
program makes transitions of the form    , the instrumented
program makes transitions of the form (, )  ( ,  ). To state
our correctness theorem, we first need to define a well-formed state.
DEFINITION 1 (Well-founded function). A function f : Cell 
Cell is well-founded with respect to a set of cells H, if for any cell
u  Cell, there exists n > 0, such that fn
(u)  H.
If f is well-founded with respect to h, we can define the functions
Bf
H : Cell  Cell and Rf
H : Cell  Set(Cell) as follows.
Rf
H (u) = {v  Cell | v = u 
n : 0 < n : (v = fn
(u)
m : 0 < m  n : fm
(u)  H)}
Bf
H (u) = fn
(u), where n = min{m | 0 < m  fm
(u)  H}
DEFINITION 2 (Well-founded state). A state  is well-founded if
for every linking field f, the function (f) is well-founded with
respect to (Hf).
DEFINITION 3 (Well-formed state). A state  is well-formed if
the following conditions are satisfied:
1.  is well-founded.
2. null  Alloc.
120
3. For every variable x of type Cell, (x)  (Alloc) or (x) =
null.
4. For every cell u  (Alloc) and reference field f  Field,
(f)(u)  (Alloc) or (f)(u) = null.
5. For every cell u  Cell \ (Alloc) and reference field f 
Field, (f)(u) = null.
6. For every linking field f, null  (Hf).
A state is ill-formed if it is not well-formed.
We define an instrumentation function I on well-formed states.
For every program state , the application I() yields an instrumented
state  such that (Bf) = Bf
H and (Rf) = Rf
H , where
f = (f) and H = (Hf). Now, we can state a theorem that characterizes
the precision of our instrumentation.
THEOREM 1. Suppose  and  are well-formed states of a program
P and P#
is the instrumented version of P. Then the following
statements are true.
1. P can make a transition from  to  iff P #
can make a
transition from (, I()) to ( , I( )).
2. P makes a transition from  to either an ill-formed state or
wrong iff P #
makes a transition from (, I()) to wrong.
The proof of this theorem is performed by a case analysis over the
various types of statements and is given in the full version of the
paper [23].
5. First-order axiomatization of well-foundedness
Our checker verifies a program by checking the existence of a wellformed
state satisfying a verification condition. The definition of
a well-formed state (from Section 4.1) has six constraints. It is
easy to write simple first-order axioms to model constraint 2­6.
But the first constraint, which requires the state to be well-founded,
involves a nontrivial relationship between the variables f, Rf, Bf
and Hf for any linking field f. In fact, well-foundedness cannot be
precisely expressed in first-order logic.
In this section, we first provide two first-order axioms (called
base axioms) to capture the relationships between Rf, Bf and f.
We then provide an induction principle (IND-WF) that enables us
to derive additional first-order axioms from the base axioms. For
convenience, we will use the notation f(u) to denote u.f in this
section.
5.1 Base axioms
The following two fundamental axioms characterizes the relationship
between the predicates Rf, Bf, Hf and f in any state  of the
program. In all these axioms, the cells u  Cell, v  Cell are
implicitly universally quantified out.
1. The first axiom specifies that Rf(u, v) is true if and only if either
(i) there is a zero-length path from u to v (when u = v), or
(ii) there is a path of length one or more from u to v without
encountering any cells from Hf.
Rf(u, v)  (u = v  (f(u)  Hf  Rf(f(u), v))) (AX1)
2. The second axiom relates Bf(u) with Bf(f(u)) when f(u) 
Hf. If f(u)  H, then Bf(u) = f(u).
Bf(u) = (f(u)  Hf) ? f(u) : Bf(f(u)) (AX2)
The above axioms follow from the definition of the two predicates
from Section 4.1.
We define a state  to be finite if the set Alloc is finite in .
Also, recall (from Section 4.1) that for any function f : Cell 
Cell and a set H  Cell, Rf
H (u) defines the set of all cells
u, f(u), . . . until the first cell from H is encountered, and Bf
H(u)
denotes the identify of the first cell in H encountered.
The base axioms intuitively capture the definition of the fields
Rf and Bf. The following theorem, which is a natural generalization
of the theorem for acyclic case [24], serves to make this intuition
precise.
THEOREM 2. For any finite and well-formed state  satisfying
axioms AX1 and AX2, (Rf) = Rf
H, and (Bf) = Bf
H, where
f = (f) and H = (Hf).
5.2 Induction principle
Even though the axioms AX1 and AX2 capture fundamental
properties of Rf
H and Bf
H, they are not complete. As noted by
Lev-Ami et al. [24], at least one of the limitation of the above
axioms (and first-order logic in general) is that there is no complete
axiomatization of "finiteness".
A consequence of this limitation is that above axioms might
imply other derived axioms in any finite state, but these additional
axioms can't be derived from the two axioms above solely by
first order reasoning. These derived axioms are required to further
constrain the set of states to give a meaningful assignments to Rf,
Bf and f.
In this section, we present an induction principle (IND-WF) that
can be used to derive other theorems from the base axioms AX1
and AX2.
DEFINITION 4 (IND-WF). Consider a well-formed state  where
f = (f) and H = (Hf), for the linking field f. To show that
a property P(u) holds for all u  Cell, it is sufficient to establish
two cases:
1. Base Case: Establishes the property for all cells u such that
f(u)  H:
f(u)  H  P(u) (1)
2. Induction Step: If P holds for a cell v and v  H, then establish
that P holds for all the cells u such that f(u) = v:
(P(v)  f(u) = v  v  H)  P(u) (2)
We show that if the above induction principle establishes that a
predicate P is true for all the cell u  Cell in a state , then 
satisfies u : P(u).
THEOREM 3. The induction principle IND-WF is sound.
5.3 Derived axioms
In this section, we present a small set of useful first-order axioms
that have been required in the various proof efforts we have undertaken.
These axioms can be derived from the base axioms using the
induction principle described above. It is a challenge to identify a
"core" subset of axioms that is not only useful in practice, but also
fairly intuitive to understand. We have identified a small set of such
axioms that seem to be sufficient for the set of examples handled
in this paper. Of course, we can't make any claims whether this set
will suffice for other programs too, since our experience is limited
to the set of programs we have handled. But we believe that this
set captures the interesting cases in dealing with most linked list
programs.
1. Transitivity: The relation Rf enjoys the transitivity property:
(Rf(u, v)  Rf(v, w))  Rf(u, w) ( TR)
2. Antisymmetry: This property is key to breaking the symmetry
in a cycle.
(Rf(u, v)  Rf(v, u))  u = v ( AS)
121
3. Bounded distinctness: For any heap cell u, if none of the
cells f(u), f2
(u), . . . , fk
(u) intersect with the set Hf, then all
the cells in this sequence are distinct from each other. Let
Fk
.
= {f(u), . . . , fk
(u)}, and DISTINCT(S) denotes that
all the members in a set S are pairwise distinct. We can derive
a parameterized system of theorems DTk for different finite
values of k:
Fk  H = {}  DISTINCT({u}  Fk) ( DTk)
We have found that the case for k = 1 to be useful in proving
properties of many linked list programs (e.g. reverse of an
acyclic linked list):
(f(u)  H)  u = f(u) ( DT1)
These axioms can be proved easily from the base axioms AX1
and AX2. and the induction principle IND-WF. We illustrate
this by working through the proof of DT1:
Proof: We will substitute the expression (f(u)  H)  u =
f(u), in place of P(u) in Equation 1 and Equation 2. Then we
derive contradiction from the negation of the formula.
* Base case: Since f(u)  H, the formula for the base case
is unsatisfiable.
* Induction step: Substituting the definition of P and rewriting
v with f(u) in Equation 2, we get:
(f(f(u))  H  f(u) = f(f(u))) 
f(f(u))  H  f(u)  H  u = f(u)
Rewriting the formula after replacing f(u) with u (since
u = f(u)), and removing f(u) = f(u), we get
(f(u)  H  f(u) = f(u))  f(u)  H
which is a contradiction.

5.4 Optimization
In addition to the instrumentation discussed in Section 4, we also
introduce an assume statement immediately after each statement
of the form x = y.f:
assume(x  Hf => Rf(x) = Rf(y) \ {y});
The constraint described by the assume statement is an instance
of the following theorem (that can again be derived from AX1 and
the induction principle):
f(u)  Hf  Rf(f(u)) = Rf(u) \ {u} (T1)
The addition of the constraint eagerly introduces an appropriate
instance of Theorem T1 during the instrumentation phase. Since
this axiom is not added explicitly to the set of derived axioms used
in the proofs, the theorem prover is later prevented from searching
for instances of this axiom.
6. Decision procedure for well-foundedness
Many automated theorem provers including Simplify [9] and
UCLID [7] use a combination of heuristic quantifier instantiation
and decision procedures for ground theories to reason about quantified
formulas over certain theories. To take advantage of this approach,
we explore decision procedures for quantifier-free formulas
containing Rf and Bf. A decision procedure provides an alternative
to the methodology presented in Section 5, where we provide a set
of (incomplete) first-order axioms about transitive closure to reason
about formulas containing Rf and Bf. We show that the complexity
of the decision problem for the quantifier-free theory with Rf and
l  Literal ::= x = y | f(x) = y | x = null |
Rf(x, y) | Rf(x, y) | Bf(x) = y |
x  Hf | x  Hf |
x  Alloc | x  Alloc
  Formula ::= l | 1  2 | 1  2
Figure 5. Quantifier-free logic with Rf and Bf.
Bf is NP-complete. We also provide a na¨ive decision procedure for
the fragment based on a small-model encoding.
Consider a quantifier-free formula   Formula in the logic
described in Figure 5. It is easy to see that one can reduce a formula
containing nested applications of f or Bf (such as f(f(x))) to a
formula in this logic by introducing additional skolem constants.
For simplicity of presentation, let us take the liberty to include
these additional constants in the set of program variables Var.
Accordingly, a program state  includes an evaluation for these
additional symbols. For any state  and a formula   Formula,
we can define the meaning of  under  ( |= ) inductively over
the structure of  in the usual way.
Suppose L  Literal is a finite set of literals. We write  |=
L if  |=
Î
l for all l  L. The set L is satisfiable if there
exists a well-formed state  such that  |= L. The decision
problem WFSAT is the following: Is a given finite set L of literals
satisfiable?
LEMMA 1. The problem WFSAT is NP-hard.
The proof of this lemma is by reduction from the 3-COLOR problem.
In fact, the decision problem WFSAT remains NP-hard even
when we restrict ourselves to literals of the form x = y, f(x) = y,
x = null and Rf(x, y).
A state  is finite if (Alloc) is a finite subset of Cell. If
 is finite, we define || = |(Alloc)|. We show that the logic
described in Figure 5 above enjoys a small-model property. Given
a finite set L of literals, let NVar(L) be the number of distinct
variables that occur in the literals in L. Then we can prove the
following:
LEMMA 2. If a finite set of literals L is satisfiable, then there
is a finite well-formed state  such that  |= L and || 
NVar(L) + 1.
The following theorem follows easily from Lemmas 1 and 2.
THEOREM 4. The problem WFSAT is NP-complete.
The small-model property of Lemma 2 suggests a straightforward
decision procedure for the logic. Given a formula  in this
logic, we can replace each occurrence of Rf and Bf with their definitions.
For example, if k
.
= NVar(L) + 1, we can replace Bf(x)
with the formula:
f(x)  Hf ? f(x) :
 
f2
(x)  Hf ? f2
(x) : . . . k times . . .

The resultant formula can be solved using a (combined) decision
procedure for equality with uninterpreted functions (EUF) and
linear arithmetic. In Section 7, we briefly describe our initial experience
with the decision procedure.
7. Experiments
We have implemented a prototype tool to mechanically verify properties
of programs containing linked lists. In this section, we briefly
describe the components of the prototype and some preliminary results
on a set of programs manipulating singly or doubly linked
lists.
The source program is written in the Zing [1] programming
language, an imperative Java-like language but without inheritance.
122
In addition to the programming language discussed in the paper, it
can also support arrays of Boolean, Integer and Cell.
The user specifies a subset of fields in the program as linking
fields. For each field f, the tool automatically instruments the Zing
program with the auxiliary variables Rf, Bf, and adds the updates to
these variables. The instrumented program also contains the asserts
for null-dereference and the asserts to ensure that the fields are
well-founded. The instrumented program is then translated to a
guarded transition system and is fed to the UCLID verification
system [7]. Let us briefly describe the UCLID system and the
features that are used in this paper.
7.1 UCLID
The input language of UCLID supports variables of type Booleans,
integers, and functions (and predicates) over integers. The functions
can have any finite arity. Each variable of type Cell in the
source program is mapped to an integer variable. Each field f is
mapped to a function variable from integers to integers or Booleans.
Set-valued variables such as Rf (respectively Hf) are mapped to a
predicate variable of arity two (respectively one). Similarly, Bf is
mapped to a function variable of arity one.
The updates to function and predicate variables are modeled using
a  notation used in Section 4 of this paper. The  notation generalizes
the theory of arrays, and allows us to modify an arbitrary
number of entries in an array in a single step. This is convenient for
expressing the updates of Rf and Bf variables.
The user can add the necessary preconditions, loop invariants
and the postconditions to check in the UCLID file. Recall that some
of the asserts are automatically generated by our translator and are
part of the list of properties to check in the UCLID file. The file
also includes the set of base axioms and derived axioms described
in Section 5.
UCLID supports specification of first-order axioms and properties
of the form X : (X), where  is a quantifier-free expression
over the state variables in the program. This is sufficient to express
the axioms, preconditions, loop invariants, and the postconditions
for all the examples that we have encountered in this work. Set operations
are modeled using quantified expressions (e.g. S = S1 S2
gets translated to x : S(x)  (S1(x)  S2(x)).
The tool can be used in two different ways:
* Proving verification conditions (VC): Given the preconditions,
loop invariants and the postconditions, the tool generates
a VC that is checked using the theorem prover in UCLID.
The theorem prover uses quantifier instantiation to eliminate
quantifiers in the formula and then uses a Boolean Satisfiability
(SAT) based decision procedure for the theories of uninterpreted
functions and linear arithmetic. Failed VCs result in a
concrete counterexample that is immensely useful for strengthening
the loop invariants, adding more preconditions or (in rare
cases) adding new axioms.
* Synthesizing loop invariants using indexed predicate abstraction:
The tool can automatically construct universally
quantified invariants using simple indexed predicates from a
set P. Each predicate p  P is a Boolean expression over
the state variables and a set of index symbols X of type integers
(recall Cell are identified as integers in UCLID). The
tool then constructs the strongest loop invariant of the form
X : (X ), where (X ) is a Boolean combination of the
predicates in P [21, 13].
For instance, given the set of predicates P = {u = next(v), v =
prev(u), Rnext(hd, u), Rnext(hd, v)}, with X = {u, v}, the
tool can compose the predicates to construct an invariant (say)
u, v : (Rnext(hd, u)  Rnext(hd, v)) 
(v = prev(u)  u = next(v))
Example VC Size # Instants Time Taken
in UCLID (sec)
cyclic simple 991 (45) 0 2.1
reverse a 1896 (72) 0 9.5
setunion S1 1134 (49) 0 3.9
setunion 2960 (104) 1 11.4
insert 5950(110) 2 37.5
dlist remove 3362 (115) 1 30.1
Figure 6. Results of verifying linked list programs. "VC Size"
denotes the number of nodes in the VC formula after instantiation;
the numbers in the parenthesis denote the number of integer-valued
terms in the formula. The column "# Instants" denotes the number
of manual instantiations that had to be done.
The ability to generate complex quantified invariants from simple
predicates often relieves the user from writing down wellformed
loop invariants. Various heuristics are also provided to
often infer most predicates [22].
In the next two subsections, we describe our initial experience
with using the two features of UCLID to verify a set of programs
manipulating linked lists.
7.2 Benchmarks
We have currently handled the following set of examples using the
two options of UCLID mentioned above:
* cyclic simple: This the example from Section 1 where all
the elements of a cyclic list are initialized to 0.
* reverse a: This example performs an in-place destructive update
of an acyclic linked list pointed to by a variable l.
* setunion, setunion S1: This is the example described in
Figure 1. The example setunion S1 denotes the portion of the
example before the destructive updates to combine the two lists
into one. We have included both setunion and setunion S1
to demonstrate the increase in complexity in the presence of
destructive updates.
* insert: This is the example of insertion into a sorted list
described in Figure 2 in Section 2.
* dlist remove: This example of a doubly linked list is also
described in Section 2.
7.3 Proving verification conditions
In this section, we describe the set of examples for which we
provided the loop invariant manually and used the theorem prover
to prove the verification conditions.
Figure 6 describe the results of verifying these examples on
a 2 GHz machine running Linux and 1GB memory. ZChaff [29]
was used as the SAT solver inside UCLID. We have measured
the complexity of the VC generated (VC Size) and the total time
taken by the theorem prover to prove the formula. We have also
included the number of cases where we had to manually instantiate
an axiom.
Observe the difference between the verification of setunion S1
and setunion. In the latter case, adding the destructive updates requires
one manual instantiation as well as increased CPU time.
Also, the complexity of verifying dlist remove comes from two
sources even though the program is just four lines long: first, the
precondition of the method is complex, and secondly, the set of
axioms double and number terms to instantiate increases because
of the presence of two fields (next, prev) in the program.
123
Example # Predicates # Iterations Time Taken
(sec)
cyclic simple 15(1) 4 11.4
reverse a 16(1) 6 85.45
set union 24 (1) 5 79.79
insert 21(2) 9 1404.07
dlist remove - - 30.1
Figure 7. Results of verifying linked list programs using indexed
predicate abstraction. "# Predicates" denotes the number of predicates
requires; the number in the parenthesis denotes the number
of index variables in the predicates. "# Iterations" is the number of
iterations of abstract reachability computation to compute the invariant.
"Time taken" is the time taken to construct the invariant
and prove the postconditions. The example dlist remove did not
require any loop invariant.
The need for manual instantiation of quantifiers in the axioms
have come mainly from two cases so far:
* The quantifier instantiation heuristics fail to infer relevant terms
to instantiate. It only happened once for the example insert,
where we had to instantiate AX2 with a concrete term.
* Since the axiom TR relates three bound variables u, v and
w, and the number of combinations to instantiate grows exponentially
with the number bound variables resulting in a very
large VC, we always use the axiom after instantiating the first
argument u with one of the program variables. This was the
source of one instantiation for each of setunion, insert and
dlist remove examples.
Both the above problems can be mitigated by better quantifier
instantiation strategies, and we are working towards improving the
heuristics in UCLID. We do not use Simplify to discharge the VCs
because the concrete counterexample facility in UCLID offsets the
value of more advanced instantiation heuristics present in Simplify.
We also implemented a simple decision procedure for the
quantifier-free fragment (described in Section 6), that is based on
the small-model property (Lemma 2) of the logic. However, this
simple scheme did not yield an efficient decision procedure for this
fragment. For the set of examples in the paper (where NVar()
ranges between 10 and 30), expanding Rf and Bf resulted in a huge
blowup. Apart from the simplest example cyclic simple, the decision
procedure did not terminate on the other examples within a
1000 seconds time limit.
7.4 Invariant synthesis
We have also leveraged the predicate abstraction engine in UCLID
to infer loop invariants for some of the examples. Our initial experience
has been encouraging, and we report some preliminary results
in this section. At present, we have managed to construct the
loop invariants for the cyclic simple, reverse a, set union
and insert examples, given a set of indexed predicates.
Figure 7 illustrates the result of synthesizing loop invariants using
indexed predicate abstraction for a subset of the examples. Currently
the tool suffers from two bottlenecks that results in significant
time to construct the invariants. This also explains the time
taken to prove the verification conditions in the last section.
* UCLID does not maintain the control flow graph explicitly and
encodes the program counter as a variable -- the entire program
is encoded as a single transition relation. This is because the
tool was primarily built for analyzing distributed protocols and
systems. This results in a large blowup in the formulas that the
theorem prover or the predicate abstraction engine gets, and
//@ requires null  Hnext
//@ requires Bnext(l) == null
//@ ensures Rnext(res) == old(Rnext)(l)
Cell reverse (Cell l) {
Cell curr = l;
Cell res = null;
while (curr != null) {
Cell tmp = curr.next;
curr.next = res;
res = curr;
curr = tmp;
}
return res;
}
Set of Predicates
X = {u}
P = {u = null, u = curr, u = res, u = old(l),
Rnext(curr, u), Rnext(res, u), Hnext(u),
old(Rnext)(old(l), u), l = old(l), Rnext(old(l), u),
curr = null, Bnext(u) = null}
Figure 8. Reversing an acyclic list. The source program along with
the set of predicates. We use old(x) to denote the value of x at the
method entry.
consequently slows the analysis. We are currently working on
incorporating explicit control flow into the tool.
* The quantifier instantiation engine generates a large number of
(often redundant) terms to instantiate. Moreover, the number
of combinations to instantiate grows exponentially with the
number of index variables in X . For instance, with two index
variables, the number of combinations to instantiate went up
to 81 for the insert example. In some cases, SAT often saw
formulas with more than 300K clauses in them.
To mitigate these problems, we are exploring alternate quantifier
instantiation and predicate abstraction strategies. Besides, one can
often trade off the precision of the predicate abstraction to construct
less precise loop invariants more efficiently. We plan to investigate
if such loop invariants suffice to prove the properties of interest for
these candidate programs.
Figure 8 describe the reverse a example along with the set
of predicates required to prove the postcondition. The example
required a single index symbol (denoting a heap cell) u to construct
the loop invariant. The tool requires 85.45 seconds to construct
the loop invariant and prove the property. The set of predicates for
this example was supplied manually. We are currently working on
automating the process of predicate discovery.
8. Related work
There is a rich body of work in reasoning about programs that
perform destructive updates of heap allocated data structures. Work
in this area can be divided into the following often overlapping
categories.
First-order axiomatization of reachability. The work in this
category is closest to our work. Nelson [32] proposed the ternary
reachability predicate u
f
-
x
v to define that u can reach v through
applications of f without encountering x, and provided a set of
124
first-order axioms to capture this predicate. Our axiomatization is
based on two binary predicates. We believe that our predicates are
more intuitive to a programmer and our axiomatization yields simpler
correctness proofs. For example, Nelson's proof of the setunion
example from Section 2.1 required eight axioms whereas our
proof requires only the transitivity axiom in addition to the two
base axioms. Lev-Ami et al. [24] proposed another set of axioms
for characterizing the reachability predicate. Their approach works
only for acyclic lists and will not allow them to express loop invariants
for programs manipulating cyclic lists. McPeak [27] proposes
a methodology for writing specifications of data structures in firstorder
logic. Since he does not attempt to axiomatize reachability,
his method, by itself, is not sufficient to express reachability or disjointness
properties of linked lists. To express such facts, the programmer
has to manually introduce and update ghost variables in a
program specific way. His work, however, provides new heuristics
for quantifier instantiation targeted towards reasoning for linked
data structures. This work should complement our work which depends
crucially on first-order theorem provers. Finally, our instrumentation
for updating Rf and Bf is similar to the work by Dong and
Su [11] on incrementally updating the transitive closure relation of
a graph due to unit changes in it.
Shape analysis. Work in this category attempts to reason about
the shape properties of the heap, such as acyclicity and sharing, in
the presence of destructive updates to the heap. Ghiya and Hendren
[14] propose the use of two Boolean matrices to record if two
pointers are reachable from each other and if they share any heap
cell. They provide conservative updates to these matrices for various
statements in the program. Sagiv et al. [38, 39] use a 3-valued
logic to represent abstractions of the heap graph. For improved precision,
their approach requires instrumentation predicates, which
usually use a reachability predicate as a building block. The updates
for these predicates are either supplied by the user or constructed
conservatively in some cases using finite-differencing [36]. The
idea is implemented in the TVLA [25] system and has been used to
infer loop invariants for various linked data structures. In contrast
to this work, our approach depends on purely first-order reasoning.
Recently, Hackett and Rugina [16] have proposed a method that
uses points-to information to construct an abstraction of the heap.
Their approach cannot encode relationships (e.g. equality) between
program expressions and cannot describe doubly-linked lists. One
advantage of our approach compared to all of the above is the ease
with which we can combine reasoning about linked lists with reasoning
about arithmetic and arrays.
Decidable logics. Various decidable logic fragments have been
also proposed to express properties of linked data structures. Nelson
and Yao [31] provide a polynomial decision procedure for the
quantifier-free fragment with Rf(x, y) and Rf(x, y). Ranise and
Zarba [35] provide a decidable logic with NP-complete decision
problem for reasoning about linked lists. Our contribution in this
paper has been to show the minimal extension to the logic in [31]
to obtain NP-hardness. PALE [28] uses monadic second-order logic
to express properties of lists, trees, and graphs. The work on graph
types by Klarlund et al. [20] allows common shapes, such doublylinked
lists and threaded trees, to be concisely expressed. They also
provide a decidable monadic second-order logic to check the wellformedness
of graph type specifications. The logic Lr [5] was proposed
to reason about reachability. However, the decidability results
quickly break down in the presence of complex shapes and
scalar values in the program.
Predicate abstraction for discovering invariants. Recently,
predicate abstraction [15] has been extended to construct invariants
for linked-list programs. Dams and Namjoshi [8] proposed the
use of reachability predicate to construct an abstraction of the concrete
system. They provide a heuristic for predicate discovery based
on constructing the weakest precondition of the reachability predicate.
Manevich et al. [26] observe that the number of shared nodes
in the heap consisting of singly-linked lists is statically bounded,
and propose a family of predicates to exploit this observation. The
idea has been implemented in TVLA. However, the method targets
shape properties of lists and can't verify properties regarding the
contents of a list as we do for the cyclic simple and insert
examples. Balaban et al. [2] provide a decision procedure for restricted
formulae involving the reachability predicate and provide
a method to compute the abstraction. The restrictions on the reachability
predicate does not allow them, for example, to express the
loop invariant for the cyclic simple program from Section 1.1.
In comparison to all of the above, our ability to harness invariant
inference methods for first-order logic (e.g. indexed predicate abstraction)
provides us with appreciable automation for programs
that manipulate linked structures, arrays, and other scalar values.
Separation logic. Separation logic [19, 37] is a promising idea
for local reasoning of heap-manipulating programs. Separation
logic naturally extends traditional Hoare-style reasoning to bear
upon such programs. However, most of the work on separation
logic focused on proving programs manually [33, 34]. Recently,
Berdine et al. [6] have developed decision procedures for fragments
of separation logic. However, we are not aware of any automated
tools for program verification based on these decision procedures.
Proofs of linked-list programs using separation logic use
the reachability predicate; our axiomatization of reachability could
potentially help with mechanizing such proofs. Alternatively, our
work can benefit from separation logic specifications in future as
we extend our work to the inter-procedural setting.
9. Conclusions
Programs such as kernels of operating systems and device drivers
manipulate a variety of data structures, such as arrays, singly and
doubly linked lists, and hashtables. Current verification tools focus
on control-dominated properties and are consequently unable to
handle such programs in which the control flow interacts with the
data in subtle ways. This paper is our first step towards the goal
of building a scalable checker for verifying low-level data-rich
systems software.
In this paper, we presented a novel method for verifying linked
data structures based on a first-order axiomatization of reachability
with respect to a set of head cells. This axiomatization is based
on the idea of a well-founded heap and has the advantage that
acyclic and cyclic lists are handled uniformly with equal ease. We
have implemented our method in a tool and used it to verify the
correctness of a variety of nontrivial programs manipulating both
acyclic and cyclic singly-linked lists and doubly-linked lists.
There are several immediate directions for future work. We
would like to verify more examples that use doubly-linked lists and
arrays to evaluate the overhead of using our methodology and the
adequacy of our list of derived first-order axioms. We have access
to several programs from the Windows kernel that use rich data
structures; we intend to evaluate our method on these programs.
We would like to automate the inference of invariants more by developing
heuristics for predicate discovery targeted towards linked
lists. Finally, we would like to extend our work to deal with procedure
calls and develop techniques to summarize procedures.
Acknowledgements
We are grateful to Venkatesan Guruswami for his help in proving
Lemma 1 and to Thomas Ball for his insightful comments on this
paper.
125
References
[1] T. Andrews, S. Qadeer, S. K. Rajamani, J. Rehof, and Y. Xie. Zing:
Exploiting program structure for model checking concurrent software.
In CONCUR 04: 15th International Conference on Concurrency
Theory, volume 3170 of LNCS, pages 1­15. Springer-Verlag, 2004.
[2] I. Balaban, A. Pnueli, and L. D. Zuck. Shape analysis by predicate
abstraction. In VMCAI 05: Verification, Model checking, and Abstract
Interpretation, volume 3385 of LNCS, pages 164­180. SpringerVerlag,
2005.
[3] T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani. Automatic
predicate abstraction of C programs. In PLDI 01: Programming
Language Design and Implementation, pages 203­213, 2001.
[4] C. Barrett and S. Berezin. CVC Lite: A new implementation of the
cooperating validity checker. In Computer Aided Verification (CAV
'04), volume 3114 of LNCS, pages 515­518. Springer-Verlag, 2004.
[5] M. Benedikt, Thomas W. Reps, and S. Sagiv. A decidable logic for
describing linked data structures. In ESOP 99: European Symposium
on Programming, volume 1576 of LNCS, pages 2­19. SpringerVerlag,
1999.
[6] J. Berdine, C. Calcagno, and P. W. O'Hearn. A decidable fragment of
separation logic. In FSTTCS 04: Foundations of Software Technology
and Theoretical Computer Science, volume 3328 of LNCS, pages
97­109. Springer-Verlag, 2004.
[7] R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and verifying
systems using a logic of counter arithmetic with lambda expressions
and uninterpreted functions. In Computer-Aided Verification (CAV
'02), volume 2404 of LNCS, pages 78­92, July 2002.
[8] D. Dams and K. S. Namjoshi. Shape analysis through predicate
abstraction and model checking. In VMCAI 03: Verification, Model
checking, and Abstract Interpretation, volume 2575 of LNCS, pages
310­324. Springer-Verlag, 2003.
[9] D. L. Detlefs, G. Nelson, and J. B. Saxe. Simplify: A theorem prover
for program checking. Technical report, HPL-2003-148, 2003.
[10] E.W. Dijkstra. A Discipline of Programming. Prentice-Hall, 1976.
[11] G. Dong and J. Su. Incremental and decremental evaluation
of transitive closure by first-order queries. Information and
Computation, 120(1):101­106, 1995.
[12] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe,
and R. Stata. Extended static checking for Java. In PLDI 02:
Programming Language Design and Implementation, pages 234­
245. ACM Press, 2002.
[13] C. Flanagan and S. Qadeer. Predicate abstraction for software
verification. In POPL 02: Principles of Programming Languages,
pages 191­202. ACM Press, 2002.
[14] R. Ghiya and L. J. Hendren. Is it a tree, a DAG, or a cyclic graph?
A shape analysis for heap-directed pointers in C. In POPL 96:
Principles of Programming Languages, pages 1­15. ACM Press,
1996.
[15] S. Graf and H. Sa¨idi. Construction of abstract state graphs with PVS.
In Computer-Aided Verification (CAV '97), volume 1254 of LNCS,
pages 72­83. Springer-Verlag, June 1997.
[16] B. Hackett and R. Rugina. Region-based shape analysis with tracked
locations. In POPL 05: Principles of Programming Languages, pages
310­323. ACM Press, 2005.
[17] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy
abstraction. In POPL 02: Principles of Programming Languages,
pages 58­70. ACM Press, 2002.
[18] N. Immerman, A. M. Rabinovich, T. W. Reps, S. Sagiv, and G. Yorsh.
The boundary between decidability and undecidability for transitiveclosure
logics. In CSL 04: Computer Science Logic, volume 3210 of
LNCS, pages 160­174. Springer-Verlag, 2004.
[19] S. S. Ishtiaq and P. W. O'Hearn. BI as an assertion language for
mutable data structures. In POPL 01: Principles of Programming
Languages, pages 14­26. ACM Press, 2001.
[20] N. Klarlund and M. I. Schwartzbach. Graph types. In POPL 93:
Principles of Programming Languages, pages 196­205. ACM Press,
1993.
[21] S. K. Lahiri and R. E. Bryant. Constructing quantified invariants via
predicate abstraction. In VMCAI 04: Verification, Model Checking
and Abstract Interpretation, volume 2937 of LNCS, pages 267­281.
Springer-Verlag, 2004.
[22] S. K. Lahiri and R. E. Bryant. Indexed predicate discovery for
unbounded system verification. In Computer Aided Verification (CAV
'04), volume 3114 of LNCS, pages 135­147. Springer-Verlag, 2004.
[23] S. K. Lahiri and S. Qadeer. Verifying properties of well-founded
linked lists. Technical Report MSR-TR-2005-97, Microsoft Research,
2005.
[24] T. Lev-Ami, N. Immerman, T. W. Reps, S. Sagiv, S. Srivastava,
and G. Yorsh. Simulating reachability using first-order logic with
applications to verification of linked data structures. In CADE 05:
Conference on Automated Deduction, volume 3632 of LNCS, pages
99­115. Springer-Verlag, 2005.
[25] T. Lev-Ami and S. Sagiv. TVLA: A system for implementing static
analyses. In SAS 00: Static Analysis Symposium, volume 1824 of
LNCS, pages 280­301. Springer-Verlag, 2000.
[26] R. Manevich, E. Yahav, G. Ramalingam, and M. Sagiv. Predicate
abstraction and canonical abstraction for singly-linked lists. In
VMCAI 05: Verification, Model Checking and Abstract Interpretation,
volume 3148 of LNCS, pages 181­198. Springer-Verlag, 2005.
[27] S. McPeak and G. C. Necula. Data structure specifications via local
equality axioms. In Computer-Aided Verification (CAV '05), volume
3576 of LNCS, pages 476­490. Springer-Verlag, 2005.
[28] A. Mller and M. I. Schwartzbach. The pointer assertion logic engine.
In PLDI 01: Programming Language Design and Implementation,
pages 221­231, 2001.
[29] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff:
Engineering an efficient SAT solver. In 38th Design Automation
Conference (DAC '01), 2001.
[30] G. Nelson and D. C. Oppen. Simplification by cooperating decision
procedures. ACM Transactions on Programming Languages and
Systems (TOPLAS), 2(1):245­257, 1979.
[31] G. Nelson and F. F. Yao. Solving reachability constraints for linear
lists, 1982. Unpublished manuscript.
[32] G. Nelson. Verifying reachability invariants of linked structures.
In POPL 83: Principles of Programming Languages, pages 38­47.
ACM Press, 1983.
[33] P. W. O'Hearn, J. C. Reynolds, and H. Yang. Local reasoning about
programs that alter data structures. In CSL 01: 15th International
Workshop on Computer Science Logic, volume 2142 of LNCS, pages
1­19. Springer-Verlag, 2001.
[34] P. W. O'Hearn, H. Yang, and J. C. Reynolds. Separation and
information hiding. In POPL 04: Principles of Programming
Languages, pages 268­280. ACM Press, 2004.
[35] S. Ranise and C. Zarba. A decidable logic for pointer programs
manipulating linked lists, 2004. Unpublished manuscript.
[36] T. Reps, M. Sagiv, and A. Loginov. Finite differencing of logical
formulas for static analysis. In ESOP 03: European Symposium
on Programming, volume 2618 of LNCS, pages 380­398. SpringerVerlag,
2003.
[37] J. C. Reynolds. Separation logic: A logic for shared mutable data
structures. In LICS 02: Logic in Computer Science, pages 55­74.
IEEE Computer Society Press, 2002.
[38] R. Wilhelm S. Sagiv, T. W. Reps. Solving shape-analysis problems
in languages with destructive updating. ACM Transactions on
Programming Languages and Systems (TOPLAS), 20(1):1­50, 1998.
[39] M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via
3-valued logic. In POPL 99: Principles of Programming Languages,
pages 105­118. ACM Press, 1999.
126