Information and Computation 204 (2006) 1526­1574 www.elsevier.com/locate/ic Decision procedures for term algebras with integer constraints Ting Zhang *, Henny B. Sipma, Zohar Manna Computer Science Department, Stanford University, USA Received 1 January 2005; revised 30 May 2005 Available online 25 July 2006 Abstract Term algebras can model recursive data structures which are widely used in programming languages. To verify programs we must be able to reason about these structures. However, as programming languages often involve multiple data domains, in program verification decision procedures for a single theory are usually not applicable. An important class of mixed constraints consists of combinations of data structures with integer constraints on the size of data structures. Such constraints can express memory safety properties such as ab- sence of memory overflow and out-of-bound array access, which are crucial for program correctness. In this paper we extend the theory of term algebras with the length function which maps a term to its size, resulting in a combined theory of term algebras and Presburger arithmetic. This arithmetic extension provides a natural but tight coupling between the two theories, and hence the general purpose combination methods like Nelson-Op- pen combination are not applicable. We present decision procedures for quantifier-free theories in structures with an infinite constant domain and with a finite constant domain. We also present a quantifier elimination procedure for the extended first-order theory that can remove a block of existential quantifiers in one step. 2006 Elsevier Inc. All rights reserved Keywords: Decision procedures; Recursive data structures; Term algebras; Presburger arithmetic; Quantifier elimination; Combination of satisfiability procedures This research was supported in part by NSF Grants CCR-01-21403, CCR-02-20134, CCR-02-09237, CNS-0411363, and CCF-0430102, by ARO Grant DAAD19-01-1-0723, and by NAVY/ONR contract N00014-03-1-0939. Corresponding author. E-mail addresses: tingz@cs.stanford.edu (T. Zhang), sipma@cs.stanford.edu (H.B. Sipma), zm@cs.stanford.edu (Z. Manna). 0890-5401/$ - see front matter 2006 Elsevier Inc. All rights reserved doi:10.1016/j.ic.2006.03.004 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1527 1. Introduction Recursively defined data structures are essential constructs in programming languages. Intuitively, a data structure is recursively defined if it is partially composed of smaller or simpler instances of the same structure. Examples include lists, stacks, counters, trees, records and queues. To verify programs containing recursively defined data structures we must be able to reason about these data structures. Decision procedures for several data structures exist. However, in program verification decision procedures for a single theory are usually not applicable as programming lan- guages often involve multiple data domains, resulting in verification conditions that span multiple theories. Common examples of such mixed constraints are combinations of data structures with integer constraints on the size of those structures. Such constraints can express memory safety properties such as absence of memory overflow and out-of-bound array access, which are crucial to program correctness. In this paper we consider the integration of Presburger arithmetic with term algebras which can represent an important class of recursively defined data structures known as recursive data struc- tures. This class of structures satisfies the following two properties of term algebras: (i) the data domain is the set of data objects generated exclusively by applying constructors, and (ii) each data object is uniquely generated. Examples of such structures include lists, stacks, counters, trees and records; queues do not belong to this class as they are not uniquely generated: they can grow at both ends. Our language of the integrated theory has two sorts; the integer sort and the term sort . The language is the set-theoretic union of the language of term algebras and the language of Presburger arithmetic plus the additional length function | | : . Formulas are formed from term literals and integer literals in the usual way. Term literals are exactly the literals in the theory of term alge- bras. Integer literals are those that can be built up from primitive integer terms (the length function applied to -terms), addition and the other usual arithmetic functions and relations. We present decision procedures for the quantifier-free and the first-order theory of term algebras with length function and integer constraints, for structures with both finite and infinite constant domain. In the rest of the paper we will use the following notation for these theories. Th (TA) and Th (TA ) denote the quantifier-free theory of, respectively, pure term algebras and term algebras with a length function and Presburger arithmetic constraints. Similarly, Th(TA) and Th(TA ) de- note the full first-order theory of pure term algebras and term algebras with a length function and Presburger arithmetic constraints. When we separately consider decision procedures for structures with infinite constant domain, we add an superscript, for example, Th (TA ). The decision procedures for Th (TA ) are based on Oppen's algorithm for acyclic recursive data structures with infinite data domain (which, essentially, is Th (TA )) [26]. To decide satisfiability of a term constraint , Oppen's procedure constructs a DAG for , extracts from this DAG all implied equalities between terms, and then checks for inconsistencies with disequalities in . We extend this procedure to Th (TA ) by extracting an implied length constraint from the term constraint. We show that for structures with infinite constant domain such a length constraint, which is satisfiable if and only if the term constraint is satisfiable, can be effectively computed and is expressible by a quantifier-free Presburger formula linear in the size of . For structures with finite constant domain we introduce an additional counting constraint to account for the fact that with finitely many constants the number of distinct terms of a particular length is bounded. We show that also 1528 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 this counting constraint is expressible by a quantifier-free Presburger formula. The latter decision procedure directly extends Oppen's decision procedure for infinite data domains to finite domains. For the first-order theory, we first present a new quantifier elimination procedure for Th(TA) and then extend it to an elimination procedure for Th(TA ). Our elimination procedure for Th(TA) is based on the elimination procedure in [13], but can eliminate blocks of quantifiers of the same kind in one step. We extend it to a decision procedure for Th(TA ) by, again, extracting integer constraints from term constraints combined with a reduction of quantifiers on term variables to quantifiers on integer variables. The decision procedures for Th (TA ) and Th(TA ) were first published, without proofs, in [38]. The improved version that allows elimination of blocks of quantifiers was published in [39]. In that paper we showed that the complexity of our decision procedures was 2k-fold exponential for k quantifier alternations for Th(TA ). This paper provides an extended presentation of the results in both papers, improves the complexity of the decision procedure for Th(TA ) to k-fold exponential for k quantifier alternations, and includes all the proofs. Related work and comparison. Our component theories are both decidable. Presburger arithmetic was first shown to be decidable in 1929 by quantifier elimination [10]. A more efficient algorithm was later discovered by Cooper [7] and further improved by Reddy and Loveland [28]. It is well-known that recursive data structures can be modeled as term algebras which were shown to be decidable by Mal'cev using quantifier elimination [23]. This result was proved again several times in different settings [21,6,13,5,2,30,18,19,38]. Quantifier elimination has been used to obtain decidability results for various extensions of term algebras. Maher showed the decidability of the theory of infinite and rational trees [21]. Comon and Delor presented an elimination procedure for term algebras with membership predicate in the regular tree language [5]. Backofen presented an elimination procedure for structures of feature trees with arity constraints [2]. Rybina and Voronkov showed the decidability of term algebras with queues [30]. Kuncak and Rinard showed the decidability of term powers, which are term algebras augmented with coordinate-wise defined predicates [18]. Decision procedures for the quantifier-free theory of recursive data structures were discovered by Nelson, Oppen, Downey, Sethi and Tarjan [24,26,9]. Oppen gave a linear algorithm for acyclic structures [26] and (with Nelson) a quadratic algorithm for cyclic structures [24]. If the values of the selector functions on constants are specified, then the problem is NP-complete [26]. A general combination method for decision procedures for quantifier-free theories was developed by Nelson and Oppen in 1979 [25]. The method requires that component theories be loosely cou- pled, that is, have disjoint signatures, and are stably infinite1 [32]. Tinelli and Ringeissen presented a general theoretical framework for combining satisfiability procedures of theories with non-disjoint signatures [33]. Tinelli and Zarba generalized Nelson­Oppen's method to theories in multisorted languages [34]. Armando et al. presented a uniform framework using superposition for deriving decision procedures for certain combined theories [1]. Ghilardi presented a set of model-theoretical conditions for the existence of Nelson­Oppen combination schema on theories having non-disjoint signatures [12]. However, none of these general purpose combination methods are applicable to the 1 A theory is stably infinite if a quantifier-free formula in the theory is satisfiable if and only if it is satisfiable in an infinite model. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1529 combination of our component theories, which is a multisorted theory with a function mapping elements in one sort to another. Zarba constructed decision procedures for a combined theory of sets and integers [36] and a theory of multisets and integers [37]. The integration of Presburger arithmetic with recursive data structures was discussed by Bjrner [4] and an incomplete procedure was implemented in STeP (Stanford Temporal Prover) [3]. Integer constraints not only arise in the combination of decision procedures, but they are also useful as an auxiliary extension to encode properties on data structures. This line of investigation goes back to Skolem who showed the decidability of the first-order theory of Boolean algebras by reducing constraints on sets to constraints on the cardinality of sets [31]. It readily follows from the reduction technique that the first order theory of sets with cardinality constraints in Presburg- er arithmetic is decidable [11]. Recently, Revesz [29], and Kuncak and Rinard [17] independently presented decision procedures for this theory. A combination of Presburger arithmetic and term algebras was used by Korovin and Voronkov to show that the quantifier-free theory of term alge- bras with Knuth-Bendix order is NP-complete [15,16]. Along this line of investigation we proved the decidability of the first-order theory of Knuth-Bendix orders [40] using quantifier elimination. The elimination procedure makes extensive use of Presburger arithmetic in the reduction of quantifiers on term variables to quantifiers on integer variables. Paper organization. Section 2 presents the notation and terminology. Section 3 introduces the lan- guage and structure of term algebras. Section 4 describes Oppen's algorithm for recursive data structures. Section 5 presents our decision procedures for the quantifier-free theory of term alge- bras augmented with a length function and Presburger arithmetic. In Section 5.1 we first describe the theory and then in Section 5.2 we outline our approach for constructing the decision proce- dures by introducing the concepts of implied length constraints and presenting a generic decision procedure. In Section 5.3 we specialize this procedure for structures with infinite constant domain, and in Section 5.4 we refine it further for structures with finite constant domain. Section 5.5 out- lines the approach to obtain decision procedures for structures whose constant domain contains relations besides equality. Section 5.6 discusses the complexity of the decision procedures for the quantifier-free theories. Section 6 presents a new decision procedure for the first-order theory of term algebras. Section 7 presents the decision procedures for the first-order theory of term algebras with integer constraints. We first introduce the technical machinery for the construction of a quantifier elimination proce- dure and then, in Section 7.2, we extend the elimination procedure for term algebras presented in Section 6 to term algebras with integers. Section 7.3 further generalizes the result to structures whose constant domain has an internal structure and admits quantifier elimination. In Section 7.4 we dis- cuss how this procedure can be adapted for theories with infinite languages. Section 8 concludes with some ideas for future work. Most proofs are provided in the Appendix. 2. Preliminaries We assume the first-order syntactic notions of variables, parameters and quantifiers, and semantic notions of structures, satisfiability and validity as in [10]. 1530 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 A signature is a set of function symbols and predicate symbols each of which is associated with an arity. The function symbols with arity 0 are also called constants. The set of -terms T ( , X) is recursively defined by: (i) every constant c or variable x X is a term, and (ii) if f is an n-place function symbol and t1, . . . , tn are terms, then f(t1, . . . , tn) is a term. Equality = is always included as a binary predicate symbol. If is a formula, we use T () to denote the set of terms occurring in , V() to denote the set of variables in . An atomic formula is a formula of the form P(t1, . . . , tn) where P is an n-place predicate symbol and t1, . . . , tn are terms. A literal is an atomic formula or its negation. A variable occurs free in a formula if it is not in the scope of a quantifier. A formula without quantifiers is called quantifier-free. A ground formula is a formula with no variables. A sentence is a formula in which no variable occurs free. Every quantifier-free formula can be put into disjunctive normal form, that is, a disjunction of conjunctions of literals. A formula ( x) can be put into prenex form Q1y1, . . . , Qnyn ( x, y1, . . . , yn), where Qi's are either or and ( x, y1, . . . , yn) is quantifier-free, called the matrix of . A -structure A is a tuple A, I where A is a non-empty domain and I is a function that associ- ates each n-place function symbol f (respectively, predicate symbol P) with an n-place function f A (respectively, relation PA ) on A. We use Gothic letters (like A) for structures and Roman letters (like A) for the underlying domain. We usually denote A by A; which is called the signature of A. A variable assignment (in A) is a function that assigns each variable an element of A. We use [[x]] to denote the assigned value of x under and [[]] for the truth value of under . A |= [[]] means is true under . A formula is satisfiable (in A), denoted by A |= , if A |= [[]] for some ; is unsatisfiable (in A), denoted by A |= , if A |= [[]] for no ; is valid (in A), denoted by A |= , if A |= [[]] for any . A formula is valid if and only if is unsatisfiable. A is a model of a set T of sentences if every sentence in T is true in A. A sentence is (logically) implied by T (or T -valid), written T |= , if is true in every model of T . Similarly, we say that is T -satisfiable if is true in some model of T and it is T -unsatisfiable otherwise. The notions of (T -)validity and (T -)satisfiability naturally extend to a set of formulas. A theory T is a set of sentences that is closed under logical implication, that is, if T |= , then T . The theory of A, written Th(A), is the set of all true sentences in A. By a quantifier-free theory of A, written Th (A), we mean the quantifier-free subclass of Th(A). We use x to denote a sequence of variables, say, x1, . . . , xn, and x (respectively, x) as an abbrevi- ation of x1, . . . , xn (respectively, x1, . . . , xn). We write ( x) to indicate that all variables occurring freely in are among x. ( x)( x) and ( x)( x) are called existential closure and universal closure of ( x), respectively. If ( x) is quantifier-free, then ( x)( x) is called 1 and ( x)( x) is called 1. The quantifier-free (respectively, 1, 1) fragment of a language is the subclass of quantifier-free (respectively, 1, 1) sentences in the language. By satisfiability and validity of a quantifier-free for- mula, we actually mean the validity of the corresponding 1 and 1 formulas, respectively. Similarly, the satisfiability problem and the validity problem of a quantifier-free theory (or more precisely, the quantifier-free fragment), respectively, refer to the validity problem of the corresponding 1 fragment and 1 fragment. A theory T is said to admit quantifier elimination if any formula can be equivalently (modulo T ) and effectively transformed into a quantifier-free formula. If a theory admits quantifier elimination, then the truth value of any sentence is reducible to the truth value of a ground formula. All above notions naturally generalize to multi-sorted logics. In a multi-sorted logic, we have a non-empty set S of sorts. Variables, constants, equality symbols and quantifiers are indexed by T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1531 s S. A n-ary function symbol f is associated with a n + 1-tuple s1, . . . , sn+1 , called the type of f . Similarly, the type of a n-ary predicate symbol p is a n-tuple s1, . . . , sn . For 1 i n, we say that the ith-place of f (or p) has sort si. We require the type of the equality symbol of sort s be s, s . A term t is of sort s if (i) t is a variable or a constant of sort s, or (ii) t is of the form f(t1, . . . , tn) such that the type of f is s1, . . . , sn, s , and t1, . . . , tn are of sorts s1, . . . , sn, respectively. A formula is well-formed if in addition it is well-typed in the sense that (i) a term of sort s only occurs in a place of sort s in a function or predicate symbol, and (ii) variables of sort s are only quantified by s and s. A multi-sorted structure A is a tuple {A}S, S, I where S is the set of sorts, {A}S are mutually disjoint sets (domains) indexed by S, and I is an interpretation such that (i) each n-ary function symbol f with type s1, . . . , sn+1 is assigned a function F : As1 × × Asn Asn+1 ; (ii) each n-ary predicate symbol p with type s1, . . . , sn is assigned a relation P As1 × × Asn . A variable as- signment assigns a variable of sort s an element in As. Satisfiability, unsatisfiability and validity are defined as above with s (respectively, s) being interpreted as "for all (respectively, some) elements in the domain As". Presburger arithmetic is the first-order theory of addition in the arithmetic of integers. The corre- sponding structure is denoted by PA = ; 0, +, < . We use L to denote the formal language of PA. We use for syntactic equality. We use interval notation for integer sets. For example, the closed interval [m, n] means {i | m i n}. Similarly for open intervals (m..n) and half-open intervals [m..n) and (m..n]. 3. Term algebras We present a general language and structure of term algebras. For simplicity, we do not dis- tinguish syntactic terms in the language from semantic terms in the corresponding structure. The meaning should be clear from the context. Definition 1. A term algebra TA : ; C, A, S, T consists of (1) : The term domain, which exclusively consists of terms recursively built up from constants by applying non-nullary constructors. Objects in are called TA-terms. The type of a term t, denoted by type(t), is the outermost constructor symbol of t. We say that t is -typed (or is an -term) if type(t) = . (2) C: A set of constructors: , , , . . . The arity of is denoted by ar( ). (3) A: A set of constants: a, b, c, . . . We require A = and A C. For a A, ar(a) = 0 and type(a) = a. (4) S: A set of selectors. For a constructor with arity k > 0, there are k selectors s1 , . . . , sk in S. We call si (1 i k) the ith -selector. For a term x, si (x) returns the ith component of x if x is an -term and x itself otherwise. (5) T : A set of testers. For each constructor there is a corresponding tester Is . For a term x, Is (x) is true if and only if x is an -term. For a constant a, Isa(x) is just x = a. In addition there is a special tester IsA such that IsA(x) is true if and only if x is a constant. We use L to denote the language of term algebras. 1532 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 A term algebra has two essential properties: (i) the domain is exclusively generated by recur- sively applying constructors; (ii) each object in is uniquely constructed. Note that the term domain is the (ground) Herbrand domain in the language consisting of only constructors. Selectors and testers are introduced into the formal language for our study. A more general definition of term algebras would include a variable base X. For our purpose of modeling recursive data structures, however, it suffices to assume the term domain only consists of ground terms; i.e., the base X = . Nevertheless, our method can be modified to deal with a non-ground term domain by treating variables as special constants. For simplicity, in the rest of this paper we assume that L is finite except in Section 5.3 where we present an algorithm for structures with an infinite constant domain as the basis for algorithms for structures with a finite constant domain. The main techniques used for finite languages, however, can be easily generalized to handle the case of infinite languages. Actually, the decision problems become considerably easier if we allow L to have infinitely many constants. We defer the detailed discussion to Section 7.4. Example 2. Consider the LISP list structure List = list; {cons, nil}, {nil}, {car, cdr}, {Iscons, Isnil, IsA} where list denotes the domain, nil denotes the empty list, cons is a binary constructor (pairing func- tion) and car and cdr are the corresponding left and right selectors (projectors), respectively. It is not difficult to verify that List is an instance of term algebras. The theory of term algebras is axiomatizable. Let z denote z1, . . . , zar( ). The following formula schemes, in which variables are implicitly universally quantified over , axiomatize Th(TA). A1. t(x) = x, if t is built solely by constructors and t properly contains x. A2. (x1, . . . , xar( )) = (y1, . . . , yar()), if , C and . A3. (x1, . . . , xar( )) = (y1, . . . , yar( )) 1 i ar( ) xi = yi. A4. Is (x) z (z ) = x for C. In particular, Isa(x) x = a for a A. A5. IsA(x) aA Isa(x). A6. si (x) = y z ( (z ) = x y = zi)) (z ( (z ) = x) x = y) . This set of axioms is a variant of the axiomatization given in [13]. In general, selectors and testers can be defined by constructors and vice versa. One direction has been shown by (A4), (A5) and (A6), which are purely definitional axioms. The other direction follows from the equivalence of k i=1 si (x) = xi Is (x) and x = (x1, . . . , xk). We write = (s1 , . . . , sk ) to indicate that is a non-nullary constructor with ar( ) = k and s1 , . . . , sk are the corresponding selectors of . A term t is called a constructor term if t is a variable or the outermost function symbol of t is a constructor. A constructor term not containing selectors is called pure. For example, constants are pure constructor terms. A term t is called a selector term if either t is a variable or the outermost function symbol of t is a selector. Note that variables are both constructor terms and selector terms. We assume that no constructors appear immediately inside selectors as simplification can always be done. For example, si ( (x1, . . . , xk)) simplifies to xi (1 i k) and s j ( (x1, . . . , xk)) simplifies to (x1, . . . , xk) for . As a consequence, a selector T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1533 term has the form s1(. . . (sn(x) . . .)) for n 0. We use L, F , G, H, . . . to denote (possibly empty) se- lector sequences. So s1(. . . (sn(x) . . .)) can be abbreviated as Lx for L = s1, . . . , sn. The depth of x in Lx is |L|, the length of L. The depth of x in a formula is the maximum depth of x in the selector terms in , denoted by depth(x). We use Is (t1, . . . , tn) as an abbreviation for n i=1 Is (t). We say a selector term si (t) is proper in a formula if Is (t) is a conjunct of . We can make selector terms proper with type information. Definition 3 (Type completeness). A conjunction of literals is type complete if for any selector term t occurring in , exactly one type of tester predicate Is (t) ( C \ A {A}) is a conjunct of . For a type complete containing a term t, we write type(t) = to indicate that Is (t) is a conjunct of . A type complete can be simplified so that any selector term is proper. Example 4. Let us consider in List the type complete constraint y = cons(x, car(x)) IsA(x, car(x)) Iscons(y). (1) Thanks to the type information, it can be simplified to y = cons(x, x) IsA(x) Iscons(y). (2) We could have defined the notion of type completeness only for terms that occur inside selectors. In this way, a type complete formula may contain terms of unspecified types; they are either variables or selector terms that are not embedded inside selectors. This will lead to more efficient algorithms in practice. We choose the above definition, however, because it simplifies descriptions of algorithms presented in the following sections, and in addition, it does not affect the worst-case complexity for those algorithms. Given a quantifier-free formula , we can obtain a type complete from by adding exactly one tester predicate Is (t) ( C) for each term t occurring in . We call so obtained a type completion of . Example 5.LetusrevisitExample4.Itiseasilyseenthat(1)isatypecompletionofy = cons(x, car(x)). Example 6. Let , C, and = (s1 ). A possible type completion for y = s1 (x) is y = s1 (x) Is(x, s1 (x), y), (3) which, by Axioms (A4) and (A6), simplifies to y = x Is(x, y). Another type completion is y = s1 (x) Is (x) Is(s1 (x), y), (4) in which the selector term s1 (x) is proper and no simplification is possible. As a third possibility, we can have the type completion y = s1 (x) Is (x, s1 (x)) Is(y), (5) which simplifies to false because of type conflicts. As shown by the above example, a type completion of a satisfiable formula may be contradictory due to type conflicts. A type completion of is compatible with if the satisfiability of implies 1534 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 the satisfiability of . Obviously, is satisfiable if and only if it has a satisfiable compatible type completion. In the following sections, we present nondeterministic algorithms that rely on the successful guess of a satisfiable compatible type completion. Unless stated otherwise, we assume that any quantifi- er-free formula is type complete, and all occurring selector terms are simplified to proper ones. We assume that equalities (respectively, disequalities) between terms with conflicting types are simpli- fied to false (respectively, true). For example, in List the appearance of car(x) = y should be read as car(x) = y Iscons(x). We also assume that formulas do not have conflicting type literals. For example, we never encounter formulas containing subformulas of the form x = (t1, . . . , tar( )) x = (t1, . . . , tar()) for , because at least one conjunct would have been simplified to true. But to save notation, we omit test literals and treat them as implicit side conditions. 4. Decision procedures for Th(TA) This section and the next section present decision procedures for quantifier-free theories. All our decision procedures for quantifier-free theories are refutation-based; to determine the validity of a formula , a procedure determines the unsatisfiability of , which further reduces to determining the unsatisfiability of each disjunct in the disjunctive normal form of . Henceforth, in discussions related to quantifier-free theories, a quantifier-free formula (or an input formula) always refers to a conjunction of literals. To emphasize this, we use capital symbols , , . . . to denote conjunctions of literals. We identify a conjunction of literals with the set of all its conjuncts. By A we mean A is a conjunct of and by A (or more formally {A}) we mean A. We present algorithms in a nondeterministic manner; by saying "guess" , we mean to split on a disjunction that is valid in the context and contains as one of the disjuncts, and we work on each resultant constraint "simultaneously". In [26] Oppen presented a decision procedure for the quantifier-free theory of acyclic recursive data structures which is essentially Th (TA). The basic idea of the decision procedure is to generate all equalities implied by the input formula and check for inconsistencies with disequalities given in the input. The decision procedure relies on the fact that Th (TA) is convex in a language without selectors. In fact, the convexity was shown implicitly in the correctness proof of Oppen's algorithm [26]. Definition 7 (Convexity). A theory is convex if whenever a conjunction of literals implies a disjunction of atomic formulas, it also implies one of the disjuncts. Let be a conjunction of equalities and a disjunction of equalities. The convexity of Th (TA) can be rephrased as follows: is satisfiable if and only if for each of conjuncts s = t , s = t is satisfiable, i.e., |= s = t. The idea of Oppen's algorithm is to discover all logically implied equalities (between terms in and ) by constructing a directed acyclic graph (DAG) with terms as vertexes and computing an equivalence relation on the nodes based on equality of children and ancestor nodes (bidirectional closure), formally described below. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1535 Fig. 1. The DAG of (6). Definition 8 (DAG representation of terms). A term t can be naturally represented by an ordered tree Tt such that (i) if t is a constant, then Tt is a leaf vertex labeled by t, and (ii) if t is in the form (t1, . . . , tk), then Tt is the tree with root labeled by and Tt1 , . . . , Ttk as its subtrees. A directed acyclic graph (DAG) Gt of t is obtained from Tt by merging all identical subgraphs of Tt. The DAG of a formula is the DAG representing all terms in the formula. For example, Fig. 1 shows the DAG for cons(y, z) = cons(x, z) cons(x, y) = z. (6) We assume DAGs are sibling complete in the sense that a node and all of its siblings coexist. For example, car(x) appears if and only if cdr(x) does. A sibling completion can be easily obtained by adding trivial equality literals like t = t to the original formula. For a vertex u, let (u) denote the outgoing degree and u[i] (1 i (u)) the ith successor of u. Let R be a binary relation on the vertexes of a DAG and let u, v be any two vertexes. Definition 9 (Unification closure). We say that R is the unification closure of R (denoted by R) if R is the smallest equivalence relation extending R such that (u, v) R and type(u) = type(v) implies (u[i], v[i]) R , for every i [1.. (u)]. Definition 10 (Congruence closure). We say that R is the congruence closure of R (denoted by R) if R is the smallest equivalence relation extending R such that (u[i], v[i]) R (for every i [1.. (u)]) and type(u) = type(v) implies (u, v) R . If R is both unification and congruence closed (with respect to R), we call it the bidirectional closure, denoted by R . Let R be the set of all pairs asserted equal in . It was shown that R represents all equalities logically implied by [26]. Therefore, is unsatisfiable if and only if there exist t and s such that t = s and (t, s) R . Algorithm 1 (Oppen's decision procedure for Th (TA) [26]). Input: : q1 = r1 . . . qk = rk s1 = t1 . . . sl = tl, where qi, ri, si and ti are pure constructor terms. (1) Construct the DAG G of . (2) Compute on G the bidirectional closure R of R = {(qi, ri) | 1 i k}. (3) Return FAIL if i(si, ti) R , or ((u, v) R )[type(u) = type(v)]. Return SUCCESS otherwise. 1536 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Oppen's original algorithm is given for the theory of LISP lists, which only has one non-nullary constructor (cons). It is straightforward, however, to generalize it to term algebras with an arbitrary number of non-nullary constructors; in Algorithm 1, we added type checking which is not present in Oppen's original algorithm. In our setting, the language contains selectors and values of -selectors on non -terms are spec- ified, e.g., si (x) = x if x is not an -term. It was shown that for such structures the decision problem is NP-complete [24]. The complication is that it is not known a priori whether s(x) is a proper subterm of x and hence it is not possible to use the DAG representation directly. A solution to this problem is to guess the type information of terms occurring inside selectors before applying Algorithm 1. Algorithm 2 (Decision procedure for Th (TA) with selectors). Input: , a conjunction of equalities and disequalities. (1) Guess a type completion c of and simplify selector terms accordingly. (2) Call Algorithm 1 on c. Example 11. Fig. 2 shows the DAG representation of the LISP list formula cons(y, z) = cons(cdr(x), z) cons(car(x), y) = x (7) under the guess that x is not a constant. Initially R = {(v3, v4)} as v3 and v4 are asserted equal in (7). (For simplicity reflexive pairs are not listed.) By the unification algorithm (v6, v7) are merged, which gives R = {(v3, v4), (v6, v7)}. Then by the congruence algorithm (v1, v2) are merged, resulting in R = { (v1, v2), (v3, v4), (v6, v7) }. (Here we used the property that R = (R ) .) Obviously this branch fails as v1 = v2 is asserted by (7). The remaining branch (with presence of IsA(x)) simplifies to IsA(x) x = y which is clearly satisfiable, and therefore so is (7). Note that the correctness of both Algorithm 1 and 2 relies on the (implicit) assumption that the constant domain is infinite, since otherwise the theory is not convex. As a counter-example, for the structure TA with domain A = {a, b}, we have IsA(x) |= x = a x = b, but neither IsA(x) |= x = a nor IsA(x) |= x = b. We shall see (in Section 5.4) that our algorithm extends Oppen's algorithm to structures with finite constant domain. Fig. 2. The DAG of (7) under the assumption Iscons(x). T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1537 5. Decision procedures for quantifier-free theories In this section we present decision procedures for Th (TA ), the quantifier-free theory of term algebras augmented with a length function and Presburger arithmetic. First, in Section 5.1 we de- scribe the theory, and then, in Section 5.2, outline our generic approach for constructing decision procedures for this theory. This approach is refined in Section 5.3 into a decision procedure for structures with an infinite constant domain, Th (TA ) and then further refined for structures with finite constant domain, Th (TA ) in Section 5.4. Section 5.5 outlines the approach to obtain deci- sion procedures for structures whose constant domain contains relations besides equality. Section 5.6 discusses the complexity of the decision procedures for the quantifier-free theories. 5.1. Term algebras with integers Definition 12. The structure of term algebras with integers is TA = TA; PA; | | : where TA is a term algebra, PA is Presburger arithmetic, and | | denotes the length function defined recursively by: (i) for any constant a, |a| = 1, and (ii) for a term (t1, . . . , tk), | (t1, . . . , tk)| = k i=1 |ti| + 1. The extended language is denoted by L . The length function given in Definition 12 was chosen for ease of presentation. Generalizing it into a weight function that assigns an arbitrary nonnegative integer to each symbol, or a height function that gives the length of the maximum path does not require any essential changes to our techniques. We use subscripts , (or prefixes TA-, PA-) to denote notions related to term sort and inte- ger sort, respectively. We also use "term" for "TA" when there is no confusion. For example, denotes a formula in L , the language of pure term algebras, denotes a formula in L , the language of Presburger arithmetic, V denotes the collection of variables in L and V denotes the collection of variables in L . Although L contains two equality predicates: term equality = in L and integer equality = in L , we use = in both cases unless there is a chance of confusion. A TA-term can occur inside the length function. Such occurrence is called an integer occurrence to be distinguished from the normal term occurrence. From now on, we freely use integer terms |t| to form Presburger formulas. For example, car(x) is a TA-term and |car(x)| is a PA-term. The first occurrence of car(x) is a term occurrence and the second one is an integer occurrence. To save space, we use an integer term |t( x)| in two ways; one as the length function | | applied to t( x) (when t( x) is in discussion), and the other as a special integer variable (called pseudo-integer variable). In the latter case, if x V , then | x| = z denotes the formula of the form i |ti( x)| = zi where z V and ti(x) enumerates all TA-terms containing x in the context. Example 13. Consider the formula ( x) : x1 = cons(x2, x3). In the context of this formula | x| = z denotes |x1| = z1 |x2| = z2 |x3| = z3 |cons(x2, x3)| = z4. 1538 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Example 14. In the context of the formula ( x) : |x1| = |car(cdr(x2))|, | x| = z denotes |x1| = z1 |x2| = z2 |cdr(x2)| = z3 |car(cdr(x2))| = z4. Suppose ( x, y) is in a context in which all occurrences of x V are integer occurrences. Then by (z, y) we mean the formula obtained by substituting a true integer variable z (z z) for each pseudo-integer variable |t( x)|. (Here we actually overload the symbol , but the risk of confusion is minimal.) We use | x| z to denote such a substitution. For example, in Example 14 (z) denotes z1 = z4. If is an assignment for V , then | | denotes the corresponding assignment for pseudo-in- teger variables. For example, if is x1 = cons(x2, x3) and is {x1 := nil, x2 := cons(nil, nil), x3 := nil}, then | | is {|x1| := 1, |x2| := 3, |x3| := 1}. It is easily seen that the general purpose combination method in [25] is not directly applicable to TA due to the presence of the length function. Example 15. The constraints list : x = cons(car(y), y), : |x| < 2|car(x)| are clearly satisfiable, respectively, in List and PA. However, since list implies that car(x) = car(y), x contains two copies of car(y) and so its length should be at least two times the length of car(x). Therefore, list is unsatisfiable in List . A simple but crucial observation is that induces an implicit length constraint , in addi- tion to the explicit constraint given in the input. The unsatisfiability is due to the fact that contradicts . In Example 15, list in fact implies |x| 2|car(x)|, resulting in a contradiction to . Implicit length constraints are induced not only by the structures of objects, but also by the size of the constant domain. Example 16. Consider the constraint list : x = cons(cons(nil, nil), nil) x = cons(nil, cons(nil, nil)), : |x| = 5 in List and in PA, respectively. Clearly, both are satisfiable in the respective structures. In the com- bined structure List , however, there are exactly two term trees with length 5 and list states that x is not equal to either of them. As a consequence, list implies : |x| = 5, contradicting |x| = 5. Intuitively, if we can extract from the implicit that exactly characterizes the solution set of , then the satisfiability of reduces to the satisfiability of . As a consequence, we can derive decision procedures for the combined theory by utilizing the decision procedures for PA and TA. 5.2. A generic decision procedure for Th (TA ) Given a term constraint our objective is to construct a Presburger formula , called a Length Constraint Completion, that is satisfiable if and only if is satisfiable. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1539 Definition 17 (Length constraint completion (LCC)). An L -formula ( x) is a length constraint completion (LCC) for ( x) if the following formulas are valid: ( x : ) ( x) (z : ) (z) | x| = z , (8) (z : ) (z) ( x : ) ( x) | x| = z . (9) Let be an LCC for . Condition (8) says that for any satisfying assignment of , | | is a satisfying assignment for . In other words, | | maps a satisfying assignment for in TA to a satisfying assignment for in PA. We say that satisfying (8) is sound with respect to . On the other hand, condition (9) says that for any satisfying assignment of there exists a satisfying assignment of such that | | = . In other words, any satisfying assignment in PA is the image under | | of a satisfying assignment in TA. We say that satisfying (9) is realizable by . In particular, if is unsatisfiable, then so is . Let be a formula satisfying both (8) and (9). Let + and -, respectively, be any two formulas satisfying (8) and (9) (when in place of ). If we identify these constraints with their corresponding solution sets, we have - +. (10) Thus is the exact projection of from TA to PA, while +, - are over and under approxi- mations of respectively. Let and both be LLCs for . By (10) we have and hence = (with respect to the corresponding solution sets). Therefore, for a term con- straint there exists a unique LCC up to equivalence. Example 18. Consider in List the formulas list : cons(x, y) = z. The constraint + : |z| > |x| |z| > |y| |x| > 0 |y| > 0 2 |x| 2 |y| is sound but it is not realizable for list, as the integer assignment : { |x| := 3, |y| := 3, |z| := 4 } cannot be realized. On the other hand, the constraint - : |x| + |y| + 1 = |z| |x| > 5 |y| > 0 2 |x| 2 |y| is realizable for , but it is not sound because it is not satisfied by the data assignment : { x := nil, y := nil, z := cons(nil, nil) }. Finally, the constraint : |x| + |y| + 1 = |z| |x| > 0 |y| > 0 2 |x| 2 |y| is both sound and realizable, and hence is the induced length constraint of list. We have a decision procedure for Th (TA ) if can be effectively computed from . 1540 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Theorem 19 ([38]). Let be an LCC for . Then TA |= if and only if PA |= . Proof. Conditions (8) and (9) give the "" and "" directions, respectively. By this theorem the decision problem for quantifier-free theories reduces to computing the LCC in Presburger arithmetic. To obtain an LCC, we need a normalization process to transform to an equivalent disjunction in which each disjunct is of the form  . We call such a transformation a partitioning and each disjunct a partition. (We do not require partitions to be mutually exclusive.) In the subsequent sections, we shall show in detail each of the normalization procedures. First, we extend Definition 17 to deal with newly generated integer constraints in the normalization. Definition 20 (Relativized LCC (RLCC)). A formula ( x) is a length constraint completion for ( x) relativized to  ( x), (in short, ( x) is an RLCC for ( x)/ ( x)), if the following formulas are valid: ( x : ) ( x)  ( x) (z : ) (z) | x| = z , (11) (z : ) (z) ( x : ) ( x)  ( x) | x| = z . (12) Example 21. Consider List. Let x : {x1, x2, x3}, ( x) : cons(x1, x2) = x3,  ( x) : |x1| < |x2|. Consider the following formulas: div : |x1| > 0 |x2| > 0 2 |x1| 2 |x2|, : |x1| + |x2| + 1 = |x3| |x1| < |x2| div, + : |x1| < |x3| |x2| < |x3| |x1| < |x2| div, - : |x1| + |x2| + 1 = |x3| |x1| 3 |x2| > 3 div. It is not hard to prove that is an RLCC for ( x)/ ( x). However, neither + nor - is such an RLCC. Although + satisfies (11), it does not satisfy (12), as the assignment { |x1| := 2, |x2| := 3, |x3| = 4 } cannot be realized by any assignment for x. On the other hand, - satisfies (12), but not (11), as the assignment { x1 := nil, x2 := cons(nil, nil), x3 := cons(nil, cons(nil, nil)) } falsifies -. Comparing (8) and (9) with (11) and (12), we see that an LCC is an RLCC with  true. Like LCCs, up to equivalence, there exists a unique RLCC with respect to ( x)/ ( x). In addition, RLCCs have the following easily proved "additive" property. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1541 Proposition 22. If is an RLCC for / , then for any  ,  is an RLCC for /(  ). Inparticular,byletting := and := true,weseethatif isanLCCfor ,then is an RLCC for / . So Theorem 19 is in fact a special case of the following theorem. Theorem 23. Let be an RLCC for / . Then TA |=  if and only if PA |= . Proof. Conditions (11) and (12) give the "" and "" directions, respectively. This theorem motivates the strategy of our decision procedures. In the normalization process, with introduction of auxiliary integer constraints, we partition the original search space for such that i (i)  (i) , until we easily compute the RLCC (i) for each (i) / (i) . By Proposition 22, (i) is an RLCC for (i) /( (i) ). Then TA |= if and only if for some i, TA |= (i)  (i) ,which,byTheorem23(set := (i) , := (i) , :=  (i) ), reduces to determining whether PA |= (i) . Note that is not involved in computing an RLCC. Therefore, we can assume that  (i) includes constraints relevant to the corresponding par- titioning and other constraints generated during the normalization procedure have been merged into . Algorithm 3 (Generic decision procedure). Input: . (1) Return FAIL if TA |= . (2) For each partition (i)  (i) of : (a) Compute an RLCC (i) for (i) / (i) . (b) Return SUCCESS if PA |= (i) . (3) Return FAIL. 5.3. A decision procedure for Th (TA ) The easiest arithmetic extension of term algebras is Th (TA ), the quantifier-free theory of term algebras with integers and with an infinite constant domain. In TA , an LCC can be derived directly from the DAG for the formula. (We do not need the notion of RLCC in this case.) Be- fore we present the algorithm we define the following integer predicates on lengths of terms in a DAG: Tree(x) : x1, . . . , xn 0 x = 1 + n i=1 dixi , (13) Node (x, x ) : x = 1 + ar( ) i=1 xi, (14) Tree (x) : x Node (x, x ) ar( ) i=1 Tree(xi) , (15) 1542 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 where x denotes x1, . . . , xar( ) and d1, . . . , dn are the distinct arities of non-nullary construc- tors. The predicate Tree(x) is true if and only if x is the length of a well-formed tree, since whenever a leaf expands one level with outgoing degree d, the length of the tree increases by d. The second predicate expresses that the length of an -typed node with known children is the sum of its children's lengths plus 1. The last predicate states the length constraint for an -typed tree. With these predicates the construction of an LCC is given by the following algorithm. Algorithm 4 (Computation of LCC in TA ). Let be a type-complete term constraint, G the DAG of and R the bidirectional closure obtained by Algorithm 1. Initially set = . For each term t add the following to . ˇ |t| = 1, if t is a constant or asserted to be a constant (i.e., IsA(t) is in ); ˇ |t| = |s|, if (t, s) R ; ˇ Node (|t|, |t1|, . . . , |tar( )|), if t is an -typed vertex with children t1, . . . , tar( ); ˇ Tree (|t|), if t is an -typed leaf vertex. Proposition 24. obtained by Algorithm 4 is expressible in a quantifier-free Presburger formula linear in the size of . Theorem 25. obtained by Algorithm 4 is an LCC for . Algorithm 5 (Decision procedure for Th (TA )). Input: where is type-complete. (1) Call Algorithm 1 on ; return FAIL if TA |= . (2) Construct from the DAG G using Algorithm 4. ˇ Return SUCCESS if PA |= . ˇ Return FAIL otherwise. The correctness of the algorithm follows from Theorem 19 and Theorem 25. Example 26. Fig. 3 shows the DAG of x = cons(car(y), y) |cons(car(y), y)| < 2|car(x)|, (16) assuming that y is not a constant. The computed R is { (v1, v2), (v3, v5), (v4, v6) }. Fig. 3. The DAG of (16) under the assumption Iscons(y). T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1543 By Algorithm 4 is |x| = |cons(car(y), y)| |car(x)| = |car(y)| |cdr(x)| = |y| |x| = |car(x)| + |cdr(x)| + 1 |y| = |car(y)| + |cdr(y)| + 1 |cons(car(y), y)| = |car(y)| + |y| + 1 2 |car(x)| 2 |cdr(x)| 2 |car(y)| 2 |cdr(y)| |car(x)| 1 |cdr(x)| 1 |car(y)| 1 |cdr(y)| 1 (17) which implies |cons(car(y), y)| 2|car(x)|. (18) contradicting |cons(car(y), y)| < 2|car(x)|. If y is a constant, v3, v6, v7 are merged. In this case also implies (18), and therefore (16) is unsatisfiable. Note that the last two lines of (17) are the result of simplification of constraints of the form Tree(); according to our definition of the length function, the length of any term (tree) in List is a positive odd number. 5.4. A decision procedure for Th (TA ) Algorithm 4 can produce an incorrect LCC in Th (TA ), the quantifier-free theory of term alge- bras with integers and with a finite constant domain, as illustrated by the following example. Example 27. Consider List with A = {nil}. The constraint |x| = 5 IsA(y) x = cons(cons(y, y), y) x = cons(y, cons(y, y)) (19) is unsatisfiable while obtained by Algorithm 4 is |y| = 1 |cons(y, y)| = 3 |cons(cons(y, y), y)| = 5 |cons(y, cons(y, y)| = 5 which is obviously satisfiable together with |x| = 5. The reason is that if A is finite, then there are only finitely many terms of length n for any n > 0. If a term t is forced to be distinct from all of them, then t cannot have length n. Therefore needs to include constraints that count the number of distinct terms of a certain length. Definition 28 (Counting constraint). A counting constraint is a predicate CNTk,n(x) (k > 0, n 0) that is true if and only if there are at least n+1 different -terms of length x in TA with |A| = k. CNTk,n(x) is similarly defined with -terms replaced by TA-terms. Example 29. For List with A = {nil}, CNTcons 1,n (x) is x 2m - 1 2 m where m is the least number such that the m-th Catalan number 1544 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Cm = 1 m 2m - 2 m - 1 is greater than n. This is not surprising as Cm gives the number of binary trees with m leaves (that tree has 2m - 1 nodes). The following two monotonicity properties are easily proven: for any l k > 0 and m n > 0, CNTk,n(x) CNTl,n(x), CNTk,m(x) CNTk,n(x). In general we have the following result. Proposition 30. CNTk,n(x) and CNTk,n(x) are expressible by quantifier-free Presburger formulas that can be computed in time O(n). In order to construct counting constraints, we need equality information between terms. Definition 31 (Equality completeness).  is equality complete if for any two terms u and v in ˇ either u = v or u = v (but not both) is in , and ˇ either |u| = |v| or |u| = |v| (but not both) is in  . Equality completeness is a syntactical notion similar to a variable partition in the Nelson­ Oppen combination method. We can make a quantifier-free formula  (which does not contain contradictory literals) equality complete by adding exactly one of u = v and u = v to , and exactly one of |u| = |v| and |u| = |v| to  . Let us call the resulting formula an equality comple- tion of  . Similarly, we can define equality completion for sets of terms. Like type completion,  is a compatible equality completion of  if the satisfiability of  implies the satisfiability of  . Example 32. Let list be y = cons(x, z) Iscons(x, y, z). A possible equality completion of list ( = ) is |y| = |cons(x, z)| |x| = |z| |y| = |x| Iscons(x, y, z) t,t S;tt t = t , (20) where S = {x, y, z, cons(x, z)}. Strictly speaking, the formula (20) is not an equality completion of ; to save space, we omitted equalities and disequalities that follow from equality substitution. In general, equality completion could add O(n2) literals. However, it can be more succinctly represented as assertions of the form eq(t1, . . . , tn) or neq(t1, . . . , tn) that state that a set of terms are all equal or all pairwise distinct, respectively. We partition the search space for by computation of equality completion. To save notation, and  always refer to the updated version for one of the partitions. By CLSn(x0, x1, . . . , xn) we denote the conjunction of literals expressing that x0, . . . , xn are -typed terms having the same length but are pairwise distinct. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1545 Algorithm 6 (Computation of an RLCC in TA ). Input:  (type and equality complete); k, the cardinality of A. (1) Call Algorithm 4 to obtain . (2) Set :=  . (3) Add to the constraints CNTk,n(|t|) if CLSn(t, t1, . . . , tn) is induced by  for some t1, . . . , tn. Note that CNTk,n(|t|) CNTk,m(|t|) for n m. In step (3), therefore, it suffices to add CNTk,n(|t|) only if the set {t, t1, . . . , tn} is maximal in the sense that CLSn(t, t1, . . . , tn) occurs in  , but for no t1, . . . , tn+1 does CLSn+1(t, t1, . . . , tn+1) also occur in  . In addition, due to symmetry, there is no need to add CNTk,m(|t1|), . . ., CNTk,m(|tn|). Proposition 33. obtained by Algorithm 6 is expressible in a quantifier-free Presburger formula of size linear in the size of  . Theorem 34. obtained by Algorithm 6 is an RLCC for / . Algorithm 7 (Decision procedure for Th (TA )). Input : . (1) Guess a type and equality completion of , denoted by  . (2) Call Algorithm 1 on . Return FAIL if TA |= . (3) Construct from using Algorithm 6. ˇ Return SUCCESS if PA |= . ˇ Return FAIL otherwise. The correctness of Algorithm 7 follows from Theorems 23 and 34. Notice that, when is empty, the algorithm can be viewed as an extension of Oppen's original algorithm for structures with a finite constant domain. Example 35. Let us return to Example 27. Constraint (19) has exactly one compatible completion, namely CLScons 2 (x, cons(cons(y, y), y), cons(y, cons(y, y))). This results in the counting constraint CNTcons 1,2 (|x|) : |x| > 5 2 |x|, contradicting |x| = 5 in (19). 5.5. Richer theories on constant domain Up to now we assumed that the constant domain is purely equational, i.e., we can only express equality and disequality between constants. It is fairly easy, however, to relax this assumption and allow a constant domain with richer constructs provided the enriched structure is decidable and the signature of this structure is disjoint with the signature of TA except for constants. (In fact, disjointness is trivially satisfied because in L we only have the equality predicate on constants.) We outline the approach below. 1546 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Let Ac denote the new constant structure and Lc the corresponding language. Let TA+ denote the extended structure of term algebras with integers and + a constraint in TA+ . Without loss of generality, we assume that both + is equality and type complete with respect to TA-terms. There is a standard way to purify + to c Abs where is a constraint in LTA, c is a constraint in Lc, and Abs is the set of equalities of the form {vi = ti}i where vi are fresh variables and ti are TA-terms in which are of constant type but are not constants directly. Both and c contains the same equality completion (up to the isomorphic mapping {vi ti}i) for terms of constant type. It follows from Nelson­Oppen combination method that TA+ |= + if and only if TA |= and Ac |= c. 5.6. Complexity The complexity of the decision problems for the quantifier-free theories is NP-complete. Let n be the input size. First it is not hard to see that decision problems for both Th (TA ) and Th (TA ) are NP-hard as they are super-theories of Th (TA) and Th (PA), both of which are NP-complete [26], [14, pp. 336­340]. Second, Algorithm 4 computes in O(n) (see Proposition 24) and so does Algorithm 6. Third, the size of any type and equality completion of is bounded by O(n2) as there are at most n2 pairs of terms. By the nondeterministic nature of our algorithms, we see that each branch of computation (in Algorithms 5 and 7, respectively) is in P. Therefore both Th (TA ) and Th(TA ) are NP-complete. 6. A new quantifier elimination procedure for Th(TA) In this section we present a new quantifier elimination algorithm for Th(TA), the first-order theory of term algebras, and show that the algorithm only needs exponential time to eliminate a block of quantifiers of the same kind. The algorithm works mainly in the constructor language while using selectors as auxiliary tools. It is the basis for the elimination procedure for the extended theory presented in the next section. In this section we assume a finite constant domain. We drop the subscript in this section. Normal form. It is well-known that eliminating arbitrary quantifiers reduces to eliminating exis- tential quantifiers from formulas in the form ( x) A1( x, y) An( x, y) , (21) where Ai( x, y) (1 i n) are literals [13]. We can assume that the literals Ai are not of the form x = t when x does not appear inside selectors. For x(x = t (x, y)) simplifies to (t, y) if x does not occur in t, to x (x, y) if t x, and to false by Axiom (A1) if t is a term properly containing x. In all algorithms we assume these simplifications are performed in each step to restore the normal form. Nondeterminism. In the rest of this paper all transformations are done on formulas of the form (21). Again as in the presentation for quantifier-free theories, whenever we say "guess ," we mean T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1547 to add a valid (with respect to the context) disjunction i i (where is one of the disjuncts) to the matrix of (21). When we replace by i i or directly introduce i i, it should be understood that an implicit disjunctive splitting is carried out and we work on each resultant disjunct of the form (21) "simultaneously." Simplification. For simplicity, in the description of algorithms, we omit tester literals unless they are needed for the correctness proof. We may also assume that the matrix of (21) is type complete and basic simplifications are carried out whenever applicable: for a nonempty selector sequence L, we replace Lx = x by true and Lx = x by false; if t(x) is a term properly containing x and x does not appear in selector terms, we replace t(x) = x by true and t(x) = x by false. Notation. In the algorithm we use the following notation: x denote the set of existentially quantified variables; y denote the set of parameters (implicitly universally quantified variables); s, t, u denote TA-terms; L, F , G, H denote (possibly empty) selector sequences; f , g, h, p, q denote index functions with ranges clear from the context; i, j, k, l denote indexes; numerical superscripts are parenthesized. Index functions are used to differentiate multiple occurrences of the same variables. In each step of the transformations the algorithm manipulates the formula ( x) ( x, y) to pro- duce a version of the same form (or multiple versions of the same form in case disjunctions are introduced), and thus in each step ( x) ( x, y) refers to the updated version rather than to the original input formula. Outline. The elimination is performed as follows. A sequence of equivalence-preserving transforma- tions will bring the input formula into a disjunction of formulas in solved form which have solutions under any instantiation of parameters. Therefore, the whole block of existential quantifiers x can be eliminated by removing all literals containing x in the matrix. Definition 36 (Solved form). We say ( x, y) is solved in x, if x do not appear in equalities, are not asserted to be constants and are not inside selector terms. We say ( x) ( x, y) is in solved form if ( x, y) is solved in x. A solved form can be obtained by the following normalization procedure. The normalization can be viewed as an explicit syntactical procedure comparable with the DAG construction and computation of the bidirectional closure. Algorithm 8 (Normalization in TA). Input: ( x) ( x, y). (1) Type Completion. Guess a type completion of ( x, y) and simplify every selector term to a proper one. (2) Selector Elimination. Replace all selector terms containing x by the corresponding equivalent constructor terms according to Axiom (A6). (3) Decomposition. Call Algorithm 9 to decompose equalities and disequalities between construc- tor terms and equalities containing x. (4) Constant Elimination. If some x x is asserted to be a constant (i.e., IsA(x) appears), we instantiate x to each constant to eliminate x since A is finite. 1548 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 After step (1) of Algorithm 8 terms can have three forms: (i) Lx, (ii) Ly, (iii) (t1( x, y), . . . , tar( )( x, y)), where L may be empty. Terms of the form (iii) are constructor terms built recursively from terms in (i)-(ii) using non-nullary constructors with ti( x, y) of the form (i), (ii) or (iii). These three forms give rise to the following six types of equality literals: Lx = L x , (22) Lx = L y, (23) Lx = (t1( x, y), . . . , tar( )( x, y)), (24) Ly = L y , (25) Ly = (t1( x, y), . . . , tar( )( x, y)), (26) (t1( x, y), . . . , tar( )( x, y)) = (t1( x, y), . . . , tar( )( x, y)). (27) Similarly, we have six types of disequalities, the negations of (22)­(27). Step (2) transforms equalities of the forms (22­24), and similarly for disequalities of the same form. Thus after application of this step, we can assume that x does not appear inside selector terms, that is, equality literals have the forms Ly = L y , (28) Ly = (t1( x, y), . . . , tar( )( x, y)), (29) (t1( x, y), . . . , tar( )( x, y)) = (t1( x, y), . . . , tar( )( x, y)), (30) and disequality literals are in the forms of the negations of (28)­(30) and in the forms of x = x , x = Ly, x = (t1( x, y), . . . , tar( )( x, y)). Step (2) may generate literals like x = t( x, y). The reinstatement of normal form, however, does not put any of x inside selector terms. Step (2) may also linearly increase the size of the matrix. In general, elimination of selectors adds more existential quantifiers of sort term. The newly added quantifiers, however, will be removed in one step together with the original ones. The following example illustrates step (2). Example 37. Step (2), selector elimination, first converts the formula (x) car(x) = y2 cdr(x) = y2 x = y3 (31) into (x1x2) x1 = y2 x2 = y2 cons(x1, x2) = y3 , (32) which, by substitution of x1 for y2, simplifies to (x2) x2 = y2 cons(y2, x2) = y3 . (33) T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1549 Step (3) converts equalities and disequalities of form (30) and equalities of form (29). After step (3) the matrix, if it did not simplify to false, contains only literals of the form x = t( x, y), Ly = t( x, y), Ly = L y . (34) where t is: (i) existentially quantified variables x, (ii) implicitly universally quantified parameters y, (iii) selector terms of parameters in the form Ly (y y), (iv) constants in A, or (v) constructor terms built recursively from terms in (i)­(iv) using non-nullary constructors. Algorithm 9 (Decomposition). (1) Decomposition of Equalities between Constructor Terms. Replace (t1, . . . , tar( )) = (t1, . . . , tar( )) (35) by 1 i ar( ) ti = ti. Repeat until no equality of the form (35) appears. (2) Decomposition of Disequalities between Constructor Terms. Replace (t1, . . . , tar( )) = (t1, . . . , tar( )) (36) by 1 i ar( ) ti = ti. Repeat until no disequality of the form (36) appears. (3) Decomposition of Equalities Containing x. Solve equalities of the form Ly = t( x, y), where t( x, y) is a constructor term containing x, in terms of Ly such that the result is a set of equalities in the selector language. In step (1) of Algorithm 9, recall that literals like x = t( x, y) can always be eliminated together with (x) and hence after this step, we can assume no such literals appear in the matrix. After step (2) we can assume that literals have one of the following forms: Ly = L y , (37) Ly = L y , (38) x = t( x, y), (39) Ly = (t1, . . . , tar( )), (40) Ly = (t1, . . . , tar( )). (41) Step (3) solves equalities of the form (41), and thus we are left only with literals of the forms (37)­(40). Notice that these are the same as those in (34), where Ly = t( x, y) represents both (38) and (40). The following example illustrates how equalities are solved. Example 38. The literal cdr(y) = cons(cons(x1, y1), y2) is converted to the solution set x1 = car(car(cdr(y))), y1 = cdr(car(cdr(y))), y2 = cdr(cdr(y)). 1550 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Given an input formula ( x) ( x, y), the normalization procedure in Algorithm 8 lets us effectively eliminate all quantifiers ( x) in one step. Indeed, after application of Algorithm 8 we can assume, according to (34), that ( x) ( x, y) is in the form ( x) i xp(i) = ti( x, y) j Ljyq(j) = sj( x, y) k Fkyf(k) = uk( y) l Glyg(l) = Hlyh(l). (42) Here ti, sj are: (i) existentially quantified variables x, (ii) implicitly universally quantified parameters y, (iii) selector terms of parameters in the form Ly (y y), (iv) constants in A, or (v) constructor terms built recursively from terms in (i)­(iv) using constructors, where we require that sj contain at least one occurrence of a variable in x (otherwise, the corre- sponding literal should have been moved out of the scope of ( x)). The term uk( y) can be one of (ii)­(v) above, where in (v) recursion is limited to (ii)­(v). We claim that the first part of (42) ( x) i xp(i) = ti( x, y) j Ljyq(j) = sj( x, y) , (43) is valid, and hence (42) is equivalent to k Fkyf(k) = uk( y) l Glyg(l) = Hlyh(l). (44) Thus the algorithm for elimination of quantifiers can be given as Algorithm 10 (Elimination of quantifiers). Input: ( x) ( x, y). (1) Call Algorithm 8 to normalize ( x) ( x, y). (2) Remove ( x) and all literals containing x. Theorem 39. All transformations in Algorithm 10 preserve equivalence. Theorem 40. Algorithm 10 removes a block of quantifiers in time 2O(n). Example 41. Let us look at an example in List. Consider a type complete formula (x) cons(car(x), y1) = y2 y2 = x . (45) T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1551 (Because of the assumption of type completeness, we know Iscons(y2) and Iscons(x) must be present in the side condition. We omit type information of y1, because it is irrelevant here.) Step (2) of Algorithm 8 removes the selectors on x by converting (45) into (x1x2) cons(x1, y1) = y2 y2 = cons(x1, x2) . (46) Step (3) of Algorithm 8 (Algorithm 9) solves the equality cons(x1, y1) = y2, with solution set { x1 = car(y2), y1 = cdr(y2) }, resulting in (x1x2) car(y2) = x1 cdr(y2) = y1 y2 = cons(x1, x2) , (47) which, by standard substitution and quantifier manipulation, reduces to (x2) y2 = cons(car(y2), x2) cdr(y2) = y1. (48) As (x2)[ y2 = cons(car(y2), x2) ] is in solved form, it is valid, and hence (48) reduces to cdr(y2) = y1, or more formally, to cdr(y2) = y1 Iscons(y2). 7. Decision procedures for quantified theories In this section we present decision procedures for the theory Th(TA ) of term algebras with integers and parameters. In Section 7.1 we first refine the notions and techniques from Section 5 for the construction of a quantifier elimination procedure for Th(TA ), and then, in Section 7.2, pres- ent the quantifier elimination procedure itself. Throughout Sections 7.1 and 7.2 we assume a finite language. In Section 7.3 we further generalize the result to structures whose constant domain has an internal structure and admits quantifier elimination. In Section 7.4 we discuss how the procedure can be adapted for infinite languages. 7.1. Term algebras with integers and parameters In this section we refine the notions and techniques from Section 5 for the construction of a quantifier elimination procedure for Th(TA ), which is given in the next section. We first refine the notion of equality completion. As we have seen, to get an RLCC (LCC) we need to express in Presburger arithmetic the set of legitimate lengths such that a certain number of distinct terms of any length in the set can co-exist. This can be supported by equality completion. Equality completion, however, in general introduces more literals, especially disequalities, which may again destroy the completion because it may cause generation of new terms in the subsequent operation (see disequality splitting in Algorithm 13). To avoid compromising convergence, we introduce the notion of clusters which is weaker than equality completion but contains sufficient information to allow extracting counting constraints. Intuitively, it suffices to have the equality information only between terms of the same length and of the same type. 1552 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Definition 42 (Clusters). Let be a conjunction of literals in L . Term equality literals ap- pearing in induce equivalence classes on TA-terms occurring in . Let [t] denote such an equivalence class containing the term t. We say that C = { [t0], . . . , [tn] } is an -cluster (of ) if CLSn (t0, . . . , tn) is induced by , i.e., expresses that t0, . . . , tn are pairwise distinct -terms of the same length. The notion of clusters is syntactic modulo = and = (equality substitution). For example, t = s s = u u = v |t| = |v| Is (t, s, u, v), (49) induces the -cluster {[t], [v]} (where [t] = {t, s, u}, [v] = {v}). Formally speaking, a cluster C is induced by the closure of under equality substitution. We chose this definition to limit the form of disequalities generated in transformations of Algorithm 13. Below we will drop the subscript if is clear from the context. For clarity, we view a cluster as a set consisting of terms that are chosen representatives of their corresponding equivalence classes. The choice of representatives is arbitrary unless stated otherwise. For example, { [t0], . . . , [tn] } will also be written simply as { t0, . . . , tn }. A cluster is maximal if no superset of it is a cluster. A cluster C is closed if C is maximal and it is disjoint with any other maximal clusters. Two clusters C and C are called connected if there exists t C and t C (or more formally [t] C and [t ] C ) such that either (i) t = t occurs in the defining formula (i.e., C and C intersect), or (ii) t = t occurs in the defining formula while |t| = |t | does not. Two clusters are called mutually independent if they are not connected. The size of a cluster is the number of equivalence classes it contains. The rank of a cluster C, written rk(C), is |t| for an arbitrarily chosen term t occurring in C. Clusters are partially ordered by their ranks; for two clusters C, C we write C < C if rk(C) = t, rk(C ) = t and |t| < |t | is logically implied by . Example 43. Consider again the formula from Example 32, |y| = |cons(x, z)| |x| = |z| |y| = |x| Iscons(x, y, z) t,t S;tt t = t , (50) where S = {x, y, z, cons(x, z)}. This formula induces two mutually independent cons-clusters, C1 : { x, z }, C2 : { y, cons(x, z) } with rk(C1) < rk(C2). A conjunction of literals is cluster complete if all maximal clusters of are mutually indepen- dent. A conjunction of literals is a cluster completion of if and is cluster complete. is compatible with if satisfiability of implies satisfiability of . Like equality completion, we are only interested in compatible cluster completions. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1553 Example 44. The List formula : x = y x = z x = w |x| = |y| |x| = |z| Iscons(x, y, z, w) (51) gives three maximal cons-clusters C1 : { x, y }, C2 : { x, z }, C3 : { w }. C3 is closed, but neither C1 nor C2 is. If we conjoin (51) with y = z, C1 and C2 are merged, making the resulting cluster { x, y, z } maximal. If we conjoin (51) with y = z, then C1 and C2 become identical. Neither strengthening, however, results in a cluster completion of (51), because C3 is unaffected by either changes, and in both cases is still connected to other maximal clusters. The first case can be made cluster complete by conjoining it with |w| = |x| or |w| = |x| w = y w = z. Similarly for the second case. As demonstrated in Example 43, equality completeness implies cluster completeness. The con- verse, however, does not hold: it is not necessary to have complete equality and type information on all terms to induce a set of mutually independent clusters. For example, n i=1 |xi| = |yi| xi = yi Is (xi, yi) , is not equality complete, but induces n maximal and mutually independent -clusters, namely { {xi, yi} | 1 i n }. This in fact is what we want: a constraint weaker than an equality and type completion. We also need to refine the notion of RLCC to deal with parameters. Definition 45 (RLCC with parameters). Consider ( x : ) ( x, y)  ( x, y) , where y are parameters. Let (2) ( y) be the maximum subset of ( x, y) not containing x and (1) ( x, y) = ( x, y) \ (2) ( y). A formula ( x, y) is an RLCC in x for ( x, y) relativized to  ( x, y), (in short, ( x, y) is an RLCC for ( x, y)/ x/ ( x, y)), if the following formulas are valid: ( x, y : ) ( x, y)  ( x, y) (z : ) (z, y) | x| = z , (52) ( y : )(z : ) (2) ( y) (z, y)( x : ) ( x, y)  ( x, y) | x| = z . (53) In the viewpoint of logical equivalence, we can assume (2) ( y) is empty as it can be moved out of the scope of ( x). But we cannot make such an assumption with respect to the computation of RLCC, as its existence in general affects cluster completeness. For example, (x : ) x = y1 x = y2 Is (x, y1, y2) |x| = |y1| |x| = |y2| is not cluster complete without equality information between y1 and y2. 1554 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 For the construction of an RLCC, we require that ( x, y)  ( x, y) be cluster complete and in strongly solved form. Definition 46 (Strongly solved form). We say that ( x, y)  ( x, y) is strongly solved in x if ( x, y) is solved in x and all literals of the form Ly = t( x, y), where y y and t( x, y) is a constructor term (properly) containing x, are redundant. We say that ( x : )[ ( x, y)  ( x, y)] is in strongly solved form if ( x, y)  ( x, y) is strongly solved in x. Example 47. The formula list  Iscons(y)y = cons(x, z) t,t S;tt t = t |y| = |cons(x, z)| |x| = |z| |y| = |x|, (54) with S = {x, y, z, cons(x, z)} is not in strongly solved form. It can be made into strongly solved form, however, by adding car(y) = x or cdr(y) = z to list, or by changing |cons(x, z)| = |y| to |cons(x, z)| = |y| in  . Either one will make the literal y = cons(x, z) redundant. Recall that CLSn(x0, x1, . . . , xn) states that x0, . . . , xn is a -cluster of n + 1 elements. We claim that Algorithm 6 computes an RLCC (with parameters y being treated as ordinary variables). Theorem 48. If ( x, y)  ( x, y) is strongly solved in x and cluster complete, then ( x, y) computed by Algorithm 6 is an RLCC for ( x, y)/ x/ ( x, y). 7.2. A quantifier elimination procedure for Th(TA ) In this section we expand Algorithm 8 to an elimination procedure for Th(TA ), the first-order theory of term algebras with integers. Since L has two sorts, namely and , we need to show elimination of integer quantifiers as well as term quantifiers. 7.2.1. Elimination of quantifiers on integer variables We assume that formulas with quantifiers on integer variables are in the form (z : ) ( x) ( x, y, z) , (55) where y, z are integer variables and x are term variables. Since ( x) is in L , we can move ( x) out of the scope of z, obtaining ( x) (z : ) ( x, y, z). (56) For this reason, neither ( x) nor ( x, y, z) is required to be quantifier-free. Now we can put (z : ) ( x, y, z) into a quantifier-free form using Cooper's method [7,28]. For the sake of effi- ciency, however, we can defer the actual elimination on (z : ) ( x, y, z) until all term quantifiers have been eliminated. The reason, as we shall see soon, is that the elimination of term quantifiers does not require the integer constraint in (57) to be quantifier-free. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1555 7.2.2. Elimination of quantifiers on term variables We assume that formulas with quantifiers on term variables are in the form ( x : ) ( x, y) ( x, y, z) , (57) where x, y are term variables, z are integer variables, ( x, y) is quantifier-free. The following algorithm is based on Algorithm 8. Algorithm 11 (Normalization with parameters). Input: (57). Apply the following subprocedures suc- cessively. (1) Basic Normalization. Apply Algorithm 8 to ( x, y) and update ( x, y, z) accordingly. (2) Cluster Completion. Normalize (57) to a cluster completion (Algorithm 12). (3) Decomposition of Disequalities. Decompose disequalities of the Form Ly = t( x, y) (Algo- rithm 13). The purpose of steps (1­3) is to transform (57) into a formula which is in strongly solved form and cluster complete, such that we can use Algorithm 6 to construct an RLCC. Having the RLCC allows us to reduce term quantifiers to integer quantifiers. Step (1) gives us a formula of the form (omitting integer literals) (43) (copied below) ( x : ) i xp(i) = ti( x, y) j Ljyq(j) = sj( x, y) . (43) Algorithm 12 produces cluster completions of the input by merging mutually dependent maximal clusters. Algorithm 12 (Cluster completion). Input: (43). Apply the following subprocedures repeatedly until cluster completeness is obtained. (1) Select Clusters. Let C1 = {t1, . . . , tn}, C2 = {s1, . . . , sm} be two connected maximal clusters with witness either ti = sj or ti = sj. (2) Merge Clusters. (a) If ti = sj occurs. Guess an equality completion for C1 C2. (b) If ti = sj occurs, but not |ti| = |sj|. Split on |ti| = |sj| |ti| = |sj|. For the |ti| = |sj| branch, guess an equality completion for C1 C2. (3) Renormalization. Apply Algorithm 9 to ( x, y) and update ( x, y, z) accordingly. 1556 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 The termination of Algorithm 12 can be seen as follows. Step (2) may increase the number of literals and hence the number of term occurrences, but it does not increase the number of equiva- lence classes on TA-terms. Step (3) (Algorithm 9) is called afterwards to restore the normal form. Clearly, steps (1­2) of Algorithm 9 do not introduce new terms. Step (3) of Algorithm 9 solves equalities containing existentially quantified variables x in terms of parameters y. However, it will only introduce new terms into existing equivalence classes. In particular, it removes at least one variable in x by substitution. So there can be at most n rounds of the run of step (2) where n is the number of distinct terms in (43). It is easily seen that the resulting formula is still in the form of (43) (omitting integer literals). Algorithm 13 decomposes disequalities in that are of the form Ly = t( x, y) such that |Ly| = |t( x, y)| is implied by and t( x, y) is a constructor term (properly) containing x. The decomposi- tion consists of a sequence of disjunctive splittings, where in each step the matrix of (57) is updated accordingly. It is assumed that when Algorithm 13 is called, the matrix of (57) is cluster complete. The algorithm preserves this completeness: all resulting branches are cluster complete. Algorithm 13 (Decomposition of disequalities containing x). Input: Set D of disequalities of the form Ly = t( x, y) such that |Ly| = |t( x, y)|, in the context of (57). Repeat until D is empty. (1) Disequality Removal. Select D D such that for some Ly D = { Ly = (t (i) 1 ( x, y), . . . , t (i) k ( x, y)) | 1 i m } (a) Disequality Splitting. Remove D from D and add to ( x, y) m i=1 sp(i)Ly = t (i) p(i)( x, y) 1 j 3 2 |x2|. 1560 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 By step (2) of Algorithm 14, (66) is equivalent to cdr(y2) = y1 (x2 : list) cdr(y2) = x2 (67) |y2| = |car(y2)| + |x2| + 1 |cdr(y2)| = |x2| |x2| > 3 2 |x2| (|car(y2)| + |x2| + 1) , which, by the reduction of term quantifiers to integer quantifiers, transforms to cdr(y2) = y1 (z : ) |y2| = |car(y2)| + z + 1 |cdr(y2)| = z (68) z > 3 2 z (|car(y2)| + z + 1) . Bysubstituting|cdr(y2)|forz wecanremove(z: ).Usingthefactthat|y2| = |car(y2)| + |cdr(y2)| + 1, we finally obtain cdr(y2) = y1 |cdr(y2)| > 3 2 |cdr(y2)| (|car(y2)| + |cdr(y2)| + 1). (69) It is easy to verify that (69) implies (62). The reverse direction, however, does not hold because we only showed one branch of the reduction. 7.3. Richer theories on constant domain Similar to Section 5.5, we can have quantifier elimination for TA+ provided Ac also admits quantifier elimination. It is easily seen that no change is needed for the elimination of quantifiers on integer variables. For the elimination of term quantifiers, we assume formulas are of the form ( x : ) c( x, y) ( x, y) ( x, y, z) , (70) where c( x, y) is a formula in Lc. Thanks to step (2) in Algorithm 8, we can assume that none of x appears inside selectors. Without loss of generality, we also assume x are either all asserted to be non-nullary constructor terms or all asserted to be constants. In the former case, we can assume c( x, y) is just c( y) because x do not properly appear inside selectors and hence they do not have any place in c. So we can simply move c( y) out of x, obtaining a quantified formula in the same form as (57). In the latter case, we can assume that x do not properly occur inside constructor terms either, because we can decompose constructor terms properly containing x in the same way as in Algorithms 9 and 13. So literals containing x in ( x, y) are equalities or disequalities on terms of constant type, and hence we can rewrite ( x, y) as (a) ( x, y) (b) ( y) with (a) ( x, y) being a constraint in Lc. We can then assume that (a) ( x, y) is a part of c( x, y). We also simplify ( x, y, z) to ( y, z) by instantiating all |xi| to 1. Now it is clear that (70) is equivalent to (b) ( y) ( y, z) ( x : ) c( x, y), (71) where the quantifiers can be handled by the quantifier elimination procedure for Th(Ac). T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1561 7.4. Adaptation for infinite languages In this section we describe adaptations to aforementioned algorithms needed to deal with the- ories in a (countably) infinite language. We require the number of distinct arities of the language be finite, which is not a real restriction for practically interesting theories. We distinguish three types of infinite languages according to the cardinalities of the constant domain and non-nullary constructor domain: (1) infinitely many constants, finitely many non-nullary constructors; (2) finitely many constants, infinitely many non-nullary constructors; (3) infinitely many constants, infinitely many non-nullary constructors. A finite language is called type 0. Below we discuss the three aspects of our algorithms that are affected by an infinite signature: type completion, counting constraints, and constant instantiation. 7.4.1. Type completion Type completion is affected only by the cardinality of the domain of non-nullary constructors, since for type completion the identity of constants is not important; only the fact that a term is a constant is relevant, which is provided by IsA. Thus, no adaptation is necessary for languages of type 1. For languages of type 2 and 3, note that a given formula can only contain finitely many non-nullary constructors. For type completion we can collect all non-nullary constructors not occurring in into one pseudo-constructor type U, thus reducing a type 2 (respectively, type 3) language a type 0 (respectively, type 1) language. Below, we show how to define the counting constraints for the new constructor type U. 7.4.2. Counting constraints Both an infinite constant domain and an infinite number of non-nullary constructors allow counting constraints to be relaxed. For languages with an infinite constant domain (type 1 and 3), CNT,n(x) (respectively, CNT,n(x)) is equivalent to Tree (x) (respectively, Tree(x)), as there are infinitely many trees of any legitimate tree length. For languages with only an infinite number of non-nullary constructors (type 2), the situa- tion is slightly more complicated, because, depending on the arities of those non-nullary construc- tors, an infinite subset of those non-nullary constructors may or may not be useful in forming terms of a particular length. First consider the simplest case in which the language has an in- finite number of non-nullary constructors of arity i only. Then the number of distinct terms of length x is infinite if and only if x - i is a legitimate tree length (and thus x is also a legitimate tree length). If x - i is not a legitimate tree length, the counting constraint reduces to that of a finite language. Thus in this case the counting constraint CNTk,n,i(x), where the additional sub- script i denotes the arity with an infinite number of non-nullary constructors, can be defined as CNTk,n,i(x) : Tree(x - i) CNTk,n(x). 1562 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 In general there can be a finite number of arities for which the number of non-nullary constructors is infinite. Let I be the set of these arities. Then the counting constraint CNTk,n,I (x) can be expressed as CNTk,n,I (x) : iI Tree(x - i) CNTk,n(x), and the counting constraint CNTk,n,I (x) can be expressed as CNTk,n,I (x) : iI Tree (x - i) CNTk,n(x). Note that CNTk,n(x) and CNTk,n(x) above are defined with respect to a finite sub-language con- taining all constants and all non-nullary constructors not having arities in I. It does not matter if the sub-language contains any non-nullary constructors having arities in I, because the pres- ence of those constructors does not affect the truth value of CNTk,n(x) (respectively, CNTk,n(x)) when iI Tree(x - i) (respectively, iI Tree (x - i)) is false. Let S denote the signature of such a sub-language. For the pseudo-constructor type U introduced above we have CNTU k,n,I (x) : iI Tree(x - i) S CNTk,n(x), since U must necessarily include all non-nullary constructors of arities in I. 7.4.3. Constant instantiation Having an infinite number of constants prohibits the application of the constant elimination (step (4) of Algorithm 8). In that step all variables x that are asserted to be constants are nondeter- ministically instantiated with constants, thereby allowing the elimination of the corresponding x. With an infinite number of constants, direct instantiation obviously is not possible. These variables, however, can still be eliminated as follows. At step (4) of Algorithm 8 all existentially quantified variables appear in disequalities only (cf. (34)). Let xc x be the set of existentially quantified vari- ables that are asserted to be constants. Then for any given assignment of the parameters y and the remaining variables x\ xc, all disequalities containing variables in xc can be simultaneously satisfied by assigning them distinct constants occurring neither in [[ y]] nor in the formula. Therefore all variables in xc can be removed from the formula (43), as desired. 8. Conclusion We presented decision procedures for term algebras augmented with Presburger arithmetic, for quantifier-free theories and quantified theories. Our technique is based on the extraction of exact integer constraints from term constraints, and in case of quantified theories, combined with the reduction of term quantifiers to integer quantifiers. We have extended our results to queues, a type of non-recursive data structure in which an ob- ject can be constructed in more than one way [41]. We plan to extend this work to reason about T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1563 the combination of data structures with integers in richer languages such as the theory of term algebras with subterm relation [35], the theory of concatenation [22] and the theory of queues with sub-sequence relations (including subqueue, prefix or suffix relation) [4]. We also plan to extend our results to allow more than one integer function on data structures, Such extensions allow us to model a wide range of augmented data structures, such as balanced tree structures like AVL trees and red-black trees [8]. For example, the theory of term algebras with two length functions, which respectively give the length of the maximum path and the length of the minimum path, can express red-black trees [8]. 9. Acknowledgments We thank the anonymous referees for their careful reading and suggestions. The detailed com- ments that pointed out several inaccuracies in an earlier version were extremely valuable. Appendix A In the following proofs we need to express legitimate lengths of trees. To support those ex- pressions we use the following lemma, which is a well-known result following from the Euclidean algorithm for computing greatest common divisors. Lemma 54. Let d1, . . . , dn be positive integers and gcd their greatest common divisor. Then if x1, . . . , xn 0 x = n i=1 xidi then gcd | x. (A.1) In addition, there exists Nd such that for any x Nd the reverse also holds, that is, if gcd | x then x1, . . . , xn 0 x = n i=1 xidi . (A.2) For 1 x < Nd , however, there is no closed formula to decide if x is a non-negative linear combi- nation of d1, . . . , dn. Finding the smallest Nd for gcd = 1 is known as the Frobenius Coin Problem, and has been shown to be NP-hard. Below we present the proofs not included in the main text. For ease of reference we restate the propositions and theorems. Proposition 24. obtained by Algorithm 4 is expressible in a quantifier-free Presburger formula linear in the size of . Proof. Let n be the size of . Then the size of G is O(n). For each node in G we add (on average) at most four integer constraints. For an equivalence class {t1, . . . , tn}, we only need to add n - 1 equalities, namely |t1| = |t2| = = |tn|. So it suffices to show that the integer constraints Tree(|t|) and Tree (|t|) can be expressed in quantifier-free Presburger formulas linear in the size of . 1564 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Let d1, . . . , dn be the distinct arities of the non-nullary constructors in the given language. Since n and di (1 i n) are constant values (in the given language), the predicates Tree(x) and Tree (x) are of constant length (with quantifiers). By Cooper's method, there exist equivalent quantifier-free formulas with length at most triple exponential in the length of Tree(x) (or Tree (x)) [27]. Of course, these equivalent quantifier-free formulas are still of constant length. In fact Tree(x) can be expressed directly by a quantifier-free formula. Indeed, Tree(x) states that x - 1 is a non-negative linear combination of d1, . . . , dn. By Lemma 54 this condition is equivalent to gcd | x - 1 provided x - 1 Nd where gcd and Nd are as stated in the lemma. Therefore Tree(x) is equivalent to sS (x - 1 = s) x - 1 Nd gcd | x - 1 . where S {1, . . . , Nd - 1} can be precomputed for the given language to contain exactly those values 1 s < Nd such that x1, . . . , xn 0 s = n i=1 xidi . Theorem 25. obtained by Algorithm 4 is an LCC for . Proof. To show that computed by Algorithm 4 is an LCC for , we need to show, by Definition 17, the validity of ( x : ) ( x) (z : ) (z) | x| = z , (8) (z : ) (z) ( x : ) ( x) | x| = z . (9) Clearly, from the description of Algorithm 4, for any variable assignment satisfying , | | satisfies and thus (8) holds. To demonstrate the validity of (9), let be a satisfying assignment of . We have to show that there exists a variable assignment such that | | = and satisfies . Let be an arbitrary variable assignment such that | | = . Clearly such an assignment exists; it may not, however, satisfy . We show how can be transformed into an assignment that is guaranteed to satisfy . Let G be the DAG of and R be the bidirectional closure induced by . Let G be the extension of G that represents the variable assignment , that is G is obtained from G by replacing each leaf labeled by a variable v by the ground tree representing (v). Without loss of generality we assume that all leaf vertexes in G are labeled by either constants or variables; this can be achieved by variable abstraction of selector terms, as illustrated in Example 55 below. To obtain from , apply the following two steps: (1) substitute each variable asserted to be a constant in by a fresh constant. This is possible, since TA has infinitely many constants. (2) for each equivalence class {v1, . . . , vk} set (vk) = = (v2) = (v1). We claim that is a satisfying assignment for . Clearly | | = | | = since the transformation does not affect the lengths of the terms. Moreover, respects R , that is, for any terms t and s, T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1565 Fig. A. 1. The DAG of in Example 55. (t, s) R implies (t) = (s) (or Gt = Gs). It remains to show that for any two nonequivalent vertexes s and t in G , Gs = Gt. Let s and t be two vertexes such that s = t in , but suppose, for a proof by contradiction, that Gs = Gt. Let ht(s) denote the height of Gs and similar for ht(t). Without loss of generality, as- sume that among all pairs of vertices t1, t2 such that t1 = t2 in , but Gt1 = Gt2 , h = ht(t1) = ht(t2) is minimal. If h = 1, then both s and t are variables in distinct equivalence classes, and thus they were assigned distinct constants in step (1), a contradiction. If h > 1, then (t) = (s) implies that (t[1]) = (s[1]). But ht(t[1]) = ht(s[1]) = h - 1 < h, contradicting the minimality of h. Example 55. To illustrate the variable abstraction and the construction of a satisfying variable assignment consider the following constraint : Iscons(x) cons(y, z) = cons(cdr(x), z) car(x) = z. (A.3) The DAG of , shown in Fig. A.1, is the same as that of Example 18 and as in that example, R is { (v1, v2), (v3, v4), (v6, v7) }. To eliminate the selector terms labeling the leaf nodes v5 and v6, we introduce two new variables, x1 and x2, and let x1 = car(x) and x2 = cdr(x). Now x can be replaced by cons(x1, x2) and thus is not part of the variable assignment. A satisfying integer assignment for is : { |x1| := 1, |x2| := 5, |y| := 5, |z| := 1 }. A corresponding term assignment such that | | = is : { x1 := a, x2 := cons(cons(a, a), a), y := cons(a, cons(a, a)), z := a }. The first step of the transformation produces { x1 := a1, x2 := cons(cons(a2, b2), c2), y := cons(a3, cons(b3, c3)), z := a4 }. Since x2 and y are in the same equivalence class, the second step of the transformation produces :{ x1 := a1, x2 := cons(cons(a2, b2), c2), y := cons(cons(a2, b2), c2), z := a4 }. which satisfies . 1566 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Proposition 30. CNTk,n(x) and CNTk,n(x) are expressible by quantifier-free Presburger formulas that can be computed in time O(n). Proof. Let the language L have m non-nullary constructors 1, . . . , m with arities d1, . . . , dm. Let N(x) denote the number of distinct term trees of length x. Recall that CNTk,n(x) holds if and only if N(x) > n for a constant domain of size k. Thus it suffices to express N(x) > n as quantifier-free Presburger formula of size O(n). For x > 1, N(x) can be expressed as a recurrence relation: N(x) = m i=1 x1, . . . , xdi | x1 + + xdi = x - 1 x1, . . . , xdi > 0 di j=1 N(xj). (A.4) The relation can be explained as follows: there are m ways to label the root of a tree; for a root with d children with lengths x1, . . . , xd , respectively, there are d j=1 N(xj) combinations. Using dynamic programming we can compute N(1), N(2), . . ., with N(1) = k, until we reach the first xmin such that N(xmin) > n. We first consider the special case that d1 = = dm, where we have for any x1, x2 > 0 the following monotonicity property: Tree(x1) Tree(x2) x1 > x2 N(x1) > N(x2). (A.5) The reason is that in this case term trees always "grow continuously", that is, the next larger tree is always obtained by expanding one of the vertexes of the previous tree. For this special case N(x) > n reduces to Tree(x) x xmin. To show that Tree(x) x xmin can be computed in O(n) time, let d be the maximum arity. There are O(xd-1) different sequences of positive numbers with sum x - 1, and thus N(x) can be obtained by O(xd ) arithmetic operations. As N(x) grows exponentially in x, xmin is at the scale of O(lg n). Moreover, as all integers in the computation are less than n, any arithmetic operation costs time O(lg n). Therefore the search for such xmin can be done in O(n) time. Unfortunately, (A.5) does not hold in general when arities are different. For example, for d1 = 3, d2 = 10 and k = 1, N(10) = 12, while N(11) = 1. The problem is that in this case term trees do not necessarily grow continuously. Indeed, a tree of length 10 must consist of a root with three children, with lengths either 4, 4, and 1, or 7, 1, and 1. A tree of length 11, on the other hand, can only consist of a root with 10 children each of size 1. Consequently, a tree of length 10 cannot "grow" into a tree of length 11, and therefore N(10) and N(11) are completely unrelated. However, there exists Nd such that for any x xmin + Nd , N(x) > N(xmin) iff Tree(x). Let Nd be as in Lemma 54 and let gcd be the greatest common divisor of d1, . . . , dm. The "only if" direction is trivial. For the "if" direction, suppose Tree(x) holds, that is, x - 1 can be expressed as a non-negative linear combination of d1, . . . , dm. By Lemma 54, (A.1), gcd | x - 1. For the same reason, we have gcd | xmin - 1, and since x - xmin = (x - 1) - (xmin - 1) we have gcd | x - xmin. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1567 By assumption, x - xmin Nd , and therefore, by Lemma 54, (A.2), x - xmin can be expressed as a non-negative linear combination of d1, . . . , dm, and thus a tree of length xmin can grow into a tree of length x by replacing one of its vertexes with a tree of size x - xmin + 1. Therefore, N(x) > n, and thus CNTk,n(x) can be expressed by sSn (x = s) x xmin + Nd Tree(x) , where Sn = {xmin s < xmin + Nd | N(s) > n}. Since Nd and the size of Sn are constant, CNTk,n(x) can be computed in O(n) time as desired. The proof for CNTk,n(x) is similar. Proposition 33. obtained by Algorithm 6 is expressible in a quantifier-free Presburger formula of size linear in the size of  . Proof. By Proposition 24, the call to Algorithm 4 to obtain takes time O(n) where n is the size of the input  . Next, for each -cluster of size m, Algorithm 6 adds CNTk,m(x) to . By Proposition 30, CNTk,m(x) can be computed in time O(m). Since the sum of sizes of all clusters is O(n), it follows that can be computed in O(n), and hence the size of is O(n). Theorem 34. obtained by Algorithm 6 is an RLCC for / . Proof. Let ( x) be a type- and equality-complete term constraint and  (| x|) be a Presburger formula. To show that computed by Algorithm 6 is an RLCC for / , we need to show, by Definition 20, the validity of ( x : ) ( x)  ( x) (z : ) (z) | x| = z , (11) (z : ) (z) ( x : ) ( x)  ( x) | x| = z . (12) For (11) consider an arbitrary variable assignment such that  is true. By Algorithm 6, consists of ,4  (CNTk,n)i, where ,4 is the constraint computed by Algorithm 4 and (CNTk,n)i are the counting constraints added in step (3). By Theorem 25, there exists an integer assignment such that = | | such that ,4 is true, and obviously this assignment also makes  true. Finally, the counting constraints impose a restriction on the length of terms. By Algo- rithm 6, for any term t such that CLSn(t, t1, . . . , tn) is implied by  , includes the counting constraint CNTk,n(t, t1, . . . , tn), or equivalently, N(|t|) > n, with N(|t|) as before. Since satisfies  , it must have assigned different terms to t, t1, . . . , tn, and thus their length necessarily satisfies N(|t|) > n. For (12) consider an arbitrary integer assignment , assigning z := d, such that ( d) holds. We have to show that there exists a term assignment , assigning x := t such that |t| = d and (t) and  (t) hold. In contrast with the proof of Theorem 25, it is not immediately obvious that such an assignment exists, because we no longer assume an infinite constant domain. Therefore we incrementally construct , starting with terms of smallest length. 1568 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Let G be the DAG and R the bidirectional closure for as constructed in Algorithm 1. We assume, as in the proof of Theorem 25, that all selectors have been eliminated and thus leaf nodes of G are labeled by either constants or variables. Let G be identical to G except that all nodes corresponding to equivalence classes in R have been merged. Since is equality-complete any two distinct vertices in G must correspond to distinct terms. To construct , we order all term lengths according to the values assigned by : |t (1) 0 | = = |t(1) n1 | block 1 < |t (2) 0 | = = |t(2) n2 | block 2 < < |t (k) 0 | = = |t(k) nk | block k . Note that any term t in appears in and hence |t| is assigned a length by and appears in the above sequence. Let li be the length of the terms in block i. For simplicity, we assume all terms in block i are of type i. Starting with the terms in the first block, that is, t (1) 1 , . . . , t (1) n1 , we incrementally construct a satisfying assignment for such that | | = . Obviously t (1) i (0 i n1) is either a constant or a term variable as its length is the smallest, and we only need to consider t (1) i which are term variables. By Algorithm 7 we know that CNT 1 k,n1 (l1) is in . As satisfies , there are at least n1+1 different 1-terms of length l1. Therefore we can simply assign each t (1) i (if it is a variable) a distinct term. Now we proceed to the (i + 1)th block assuming that the terms in the ith block have been assigned. At this time the values of all non-variable terms in the (i + 1)th block have been fixed because variables (if any) in those terms have length less than li and those variables have been assigned by the ith round. By the same argument as before, due to the presence of CNT i+1 k,ni+1 (li+1) in , we are able to assign each variable in the (i + 1)th block a different tree of length li+1. The assignment in each round will not create any equality between terms in G simply because terms of the same type and the same length are assigned to different values. By induction we can eventually construct a satisfying assignment for such that | | = . Since includes  ,  ( d) holds, and therefore also satisfies  , as required. Theorem 39. All transformations in Algorithm 10 preserve equivalence. Proof. Recall that Algorithm 10 first transforms a formula into one in solved form, that is in the form ( x) i xp(i) = ti( x, y) j Ljyq(j) = sj( x, y) (42) k Fkyf(k) = uk( y) l Glyg(l) = Hlyh(l). using Algorithm 8. It then eliminates the quantifiers by removing all literals containing x. It is straightforward to show that all transformations in Algorithm 8 preserve equivalence, and we omit the proof. It remains to show that (42) is equivalent to k Fkyf(k) = uk( y) l Glyg(l) = Hlyh(l), T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1569 that is, that ( y : )( x : ) i xp(i) = ti( x, y) j Ljyq(j) = sj( x, y) . (43) is valid, or that i xp(i) = ti( x, b) j Ljbq(j) = sj( x, b), (A.6) is satisfiable for an arbitrary sequence b of fixed TA-terms. For simplicity, we assume that all non- nullary constructors have the same arity. The general case can be proved via a minor modification. Let x be x1, . . . , xn. Let be the maximal length of terms not containing x in (A.6) and let > be such that there exist at least n distinct nonconstant terms d = d1, . . . , dn of length . We claim that d satisfies (A.6). Observe that, due to step (4) of Algorithm 8, none of the xi are constrained to be constants, and thus all variables can be assigned nonconstant terms. To show that Ljbq(j) = sj( d, b) holds, we assume that the constructor term sj actually contains x. If not, the literal Ljyq(j) = sj( x, y) can be moved out of the scope of x. Since sj is a constructor term, |sj( d, b)| > . On the other hand, Lj is a selector sequence and thus |Ljbq(j)| . Therefore, for all j, Ljbq(j) = sj( d, b). To show that dp(i) = ti( d, b), consider the following four cases depending on the structure of ti. (1) ti does not contain any variable in x. Then |ti(b)| while |dp(i)| = > , and hence dp(i) = ti(b). (2) ti properly contains a variable in x. Then |dp(i)| = but |ti( d, b)| > , and hence dp(i) = ti(b). (3) ti is a constant ci. Then dp(i) = ci holds since none of terms in d is a constant. (4) ti is a variable in x. Then ti xp(i ) where p(i ) = p(i) (or xp(i) = xp(i) simplifies to false). Since p(i) = p(i ), dp(i) = dp(i ) holds by the selection of d. Theorem 40. Algorithm 10 removes a block of quantifiers in time 2O(n). Proof. First note that we need not be concerned with the increase of the matrix size by the sub- stitution, since we can represent a conjunction of literals efficiently using the DAG representation (Definition 8), in which substitution can simply be done by rearranging edges in the graph. For example, consider the following sequence of formulas x1 = (x2, x2), x2 = (x3, x3), . . . , xn = (xm+1, xm+1). Instead of generating a formula of size 2O(m), the substitution only gives a linear "double-edged" path from x1 to xm+1. For details see [26]. It suffices to analyze each step of Algorithm 8. Let n be the size of the matrix. Step (1) (type completion) generates at most 2O(n) disjuncts, as for a formula containing n selector terms, there are at most 2O(n) combinations of tester literals. Step (2) (selector elimination) can be done in O(n) 1570 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 as a selector term can be transformed to a formula in the constructor language in linear size. Step (3) (decomposition, Algorithm 9) generates at most 2O(n) disjuncts. In addition, it can increase the matrix size to O(n2), due to solving x in terms of y (step (3) of Algorithm 9); the total lengths of all paths in a tree of size n is bound by n2. Note that the size increase comes from the increase of term occurrences and the number of distinct terms is still in O(n). Step (4) (constant elimination) produces at most 2O(n) disjuncts. All put together, steps (1­4) generate 2O(n) disjuncts each of which has size O(n2) and contains O(n) distinct terms. Theorem 48. If ( x, y)  ( x, y) is strongly solved in x and cluster complete, then ( x, y) computed by Algorithm 6 is an RLCC for ( x, y)/ x/ ( x, y). Proof. To show that computed by Algorithm 6 is an RLCC for ( x, y)/ x/ ( x, y), we need to show, by Definition 45, the validity of ( x, y : ) ( x, y)  ( x, y) (z : ) (z, y) | x| = z , (52) ( y : )(z : ) (2) ( y) (z, y) ( x : ) ( x, y)  ( x, y) | x| = z . (53) The validity of (52) can be shown by a similar argument as was given for (11) in the proof of Theorem 34. To prove (53) consider arbitrary assignments and (y) , assigning z := p and y := s respec- tively, such that ( p, s) and (2) (s) hold. We have to show that there exists a term assignment (x) , assigning x := t such that |t| = | p| and (t, s) and  (t, s) hold. As in the proof of Theorem 34, we incrementally construct (x) , starting with terms of smallest length. To construct (x) , we order all term lengths according to the values assigned by : |u (1) 0 | = = |u(1) n1 | block 1 < |u (2) 0 | = = |u(2) n2 | block 2 < < |u (j) 0 | = = |u (j) nj | block j . Let li be the length of terms in the ith block. For any TA-term t occurring in ( x, y), |t| ap- pears in ( x, y) and hence in the above sequence. In general a block can contain more than one cluster for each type. However, by Proposition 50, maximal clusters are mutually independent, therefore, without loss of generality, we can assume each block i consists of only one maximal i-cluster. Beginning with terms in the first block, namely u (1) 1 , . . . , u (1) n1 , we incrementally construct a sat- isfying assignment for  ( x, s). We only need to consider u (j) nj which contains x. Obviously u (1) i (0 i n1) is either a constant or a variable in x as its length is the smallest. By Algorithm 6 we know that CNT 1 k,n1 (l1) is in ( x, y). As p satisfies ( x, s), there are at least n1 + 1 differ- ent terms of length l1. Therefore we can simply assign each u (1) i (if it is a variable) a distinct 1- term. Now suppose that all variables in the ith block have been assigned. Consider the (i+1)th block. At this time values of all non-variable terms in the (i + 1)th block have been determined, because x only appear inside constructor terms, and hence have been assigned values by the ith round. For ex- ample, suppose that t(x) is a constructor term in the (i+1)th block. Since |x| < |t(x)|, x was assigned by the ith round and so is the value of t(x). Due to the presence of CNT i+1 k,ni+1 (li+1) in ( x, y), we T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1571 are able to assign each variable in the (i + 1)th block a different term of length li+1. The variable assignment in each round does not create any equality between terms in a block, i.e., it does not violate any disequalities. Up to now the proof is essentially the same as the one for Theorem 34. The only problematic case is that a cluster may have selector terms containing y as well as constructor terms containing x. For example, suppose that a cluster C in the ith block contains both Ly and i(t1( x, y), . . . , tk( x, y)) for ar( i) = k. [[Ly]] is fixed by (y) and [[t1( x, y)]], . . . , [[tk( x, y)]] have been determined before the ith round and hence [[ i(t1( x, y), . . . , tk( x, y))]] is determined too. By Definition 46, however, the disequality Ly = t( x, y) is redundant, which can only be the case if for some j (1 j k), s i j Ly = tj( x, y) is implied by . Since |tj( x, y)| < | i(tj( x, y), . . . , tk( x, y))|, tj( x, y) has been as- signed before the ith round such that [[s i j Ly]] = [[tj( x, y)]]. As a consequence we have [[Ly]] = [[ i(t1( x, y), . . . , tk( x, y))]]. Since at each step we can build a satisfying partial assignment for x, by induction, we can even- tually construct a satisfying assignment t such that (t, s) and |t| = p. It is clear that  (t, s) also holds as implies  (thanks to step (2) of Algorithm 6). Proposition 50. Algorithm 11 produces a formula which is in strongly solved form and cluster complete. Proof. The production of a strongly solved form is guaranteed by Algorithm 13. It suffices to show that Algorithm 13 preserves cluster completeness. Step (1a) of Algorithm 13 may generate new equalities as well as new disequality literals of the form Ly = t ( x, y), which in general destroys the mutual independence of clusters. Step (2) (Algorithm 12), however, is called after each run of step (1a) to restore cluster completeness. The termination argument is given in the description of the algorithm. Theorem 51. All transformations in Algorithm 14 preserve equivalence. Proof. Clearly, Algorithm 11 preserves equivalence. So, it suffices to show the equivalence between ( x : ) (1) ( x, y) (2) ( y)  ( x, y) ( x, y, z) (59) and (2) ( y) ( u : ) ( u, y) ( u, y, z) . (61) By Proposition 50 ( x, y)  ( x, y) is in strongly solved form and cluster complete. It then fol- lows from Theorem 48 that ( x, y) is an RLCC for ( x, y)/ x/ ( x, y). By Proposition 22, ( x, y) ( x, y, z) is an RLCC for ( x, y)/ x/( ( x, y) ( x, y, z)). Toshowthat(59)implies(61),assumethatforall y andforall z thereexists x suchthat (1) ( x, y) (2) ( y)  ( x, y) ( x, y, z) holds. Then by Definition 45, (52), there exists u = | x| such that ( u, y) ( u, y, z) holds, and hence (61) holds. To show that (61) implies (59), assume that for all y and for all z there exists u such that (2) ( y) ( u, y) ( u, y, z) holds. Then by Definition 45, (53), there exists x such that | x| = u and ( x, y)  ( x, y) ( x, y, z) holds, and hence, noting that ( x, y) (1) ( x, y) (2) ( y), (59) holds, as required. 1572 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 Theorem 52. Algorithm 14 eliminates a block of quantifiers in time 2O(n2 lg n). Proof. We only need to analyze each step of Algorithm 11. As shown in the proof of Theorem 40, step (1) (basic normalization, Algorithm 8) generates 2O(n) disjuncts each of which contains O(n) distinct terms. Step (2) (cluster completion, Algorithm 12) produces 2O(nlg n) cluster completions. An equality completion is a valid product of a partition of terms on syntactic equality and a partition on terms on length equality. For a set of size n the number of distinct partitions is equal to the Bell number, B(n). An asymptotical expression for B(n) is 1 n (n)n+ 1 2 e (n)-n-1, where (n) is implicitly defined by (n) lg (n) = n [20]. Obviously B(n) is bounded by 2O(nlg n), and so is the number of cluster completions. Step (3) (Algorithm 13) is a bit more costly. Let the input to Algorithm 13 be a formula of the form (43) that induces n clusters containing variables in x. Assign to the rank r of each of these clusters a number p(r) in [1..n] such that if r1 < r2, then p(r1) < p(r2). The sum of all rank numbers is bounded by n2. Each run of step (1-2) reduces the sum by at least 1, and generates at most one new term appearing in disequalities. The total number of distinct terms after Algorithm 13 is bounded by O(n2). The number of cluster completions is therefore bounded by 2O(n2 lg n2) = 2O(n2 lg n), including the cost of Algorithm 12 and Algorithm 9, both of which may be called O(n2) times in the run of Algorithm 13. References [1] A. Armando, S. Ranise, M. Rusinowitch, Uniform derivation of decision procedures by superposition, in: Lecture Notes in Computer Science, vol. 2142, 2001, pp. 513­527 . [2] R. Backofen, A complete axiomatization of a theory with feature and arity constraints, Journal of Logical Program- ming 24 (1 and 2) (1995) 37­71. [3] N.S. Bjrner, A. Browne, M. Colón, B. Finkbeiner, Z. Manna, H.B. Sipma, T.E. Uribe, Verifying temporal properties of reactive systems: a STeP tutorial, Formal Methods in System Design 16 (3) (2000) 227­270. [4] N.S. Bjrner, Integrating decision procedures for temporal verification, Ph.D. thesis, Computer Science Department, Stanford University, 1998. [5] H. Comon, C. Delor, Equational formulae with membership constraints, Information and Computation 112 (2) (1994) 167­216. [6] H. Comon, P. Lescanne, Equational problems and disunification, Journal of Symbolic Computation 7 (1989) 371­425. [7] D.C. Cooper, Theorem proving in arithmetic without multiplication, in: Machine Intelligence, vol. 7, American Elsevier, 1972, pp. 91­99. [8] T. Zhang, Arithmetic Integration of Decision Procedures, PhD thesis, Computer Science Department, Stanford University, June 2006. [9] J. Downey, R. Sethi, R.E. Tarjan, Variations of the common subexpression problem, Journal of ACM 27 (1980) 758­771. [10] H.B. Enderton, A Mathematical Introduction to Logic, Academic Press, New York, 2001. [11] S. Feferman, R. Vaught, The first order properties of products of algebraic systems, Fundamenta Mathematicae 47 (1959) 57­103. [12] S. Ghilardi, Model-theoretic methods in combined constraint satisfiability, Journal of Automated Reasoning 33 (3-4) (2005) 221­249. [13] W. Hodges, Model Theory, Cambridge University Press, Cambridge, UK, 1993. T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 1573 [14] J. Hopcroft, J. Ullman, Introduction to Automata Theory, Langages, and Computation, Addison-Wesley Publishing Company, Reading, MA, 1979. [15] K. Korovin, A. Voronkov, A decision procedure for the existential theory of term algebras with the Knuth- Bendix ordering, in: Proceedings of the 15th Annual Symposium on Logic in Computer Science, 2000, pp.291­302. [16] K. Korovin, A. Voronkov, Knuth-Bendix constraint solving is NP-complete, in: Proceedings of 28th International Colloquium on Automata, Languages and Programming (ICALP'01), Lecture Notes in Computer Science, vol. 2076, Springer-Verlag, Berlin, 2001, pp. 979­992. [17] V. Kuncak, M. Rinard, An algorithm for deciding BAPA: Boolean algebra with Presburger arithmetic, in: R. Nieuwenhuis (Ed.), 20th International Conference on Automated Deduction (CADE'05), Lecture Notes in Computer Science, vol. 3632, Springer-Verlag, Berlin, 2005, pp. 260­277. [18] V. Kuncak, M. Rinard, The structural subtyping of non-recursive types is decidable, in: Proceedings of the 18th Annual Symposium on Logic in Computer Science, IEEE Computer Society Press, Sliver Spring, MD, 2003, pp. 96­107. [19] V. Kuncak, M. Rinard, On the theory of structural subtyping, Technical Report MIT-LCS-TR-879, Massachusetts Institute of Technology, 2003. [20] L. Lovász, Combinatorial Problems and Exercises, Elsevier, North-Holland, 1993. [21] M.J. Maher, Complete axiomatizations of the algebras of finite, rational and infinite tree, in: Proceedings of the 3rd Annual Symposium on Logic in Computer Science, IEEE Computer Society Press, Sliver Spring, MD, 1988, pp. 348­357. [22] G.S. Makanin, The problem of solvability of equations in a free semigroup, Math. Sbornik 103 (1977) 147­236. [23] A.I. Mal'cev, Axiomatizable classes of locally free algebras of various types, in: The Metamathematics of Algebraic Systems, Collected Papers, North Holland, 1971, Ch. 23, pp. 262­281. [24] G. Nelson, D.C. Oppen, Fast decision procedures based on congruence closure, Journal of ACM 27 (2) (1980) 356­364. [25] G. Nelson, D.C. Oppen, Simplification by cooperating decision procedures, ACM Transactions on Programming Languages and Systems 1 (2) (1979) 245­257. [26] D.C. Oppen, Reasoning about recursively defined data structures, Journal of ACM 27 (3) (1980) 403­411. [27] D.C. Oppen, Elementary bounds for Presburger arithmetic, in: Proceedings of 5th ACM Symposium on Theory of Computing, 1973, pp. 34­37. [28] C.R. Reddy, D.W. Loveland, Presburger arithmetic with bounded quantifier alternation, in: Proceedings of the 10th Annual Symposium on Theory of Computing, ACM Press, New York, 1978, pp. 320­325. [29] P. Revesz, Quantifier-elimination for the first-order theory of Boolean algebras with linear cardinality constraints, in: Proceedings of Advances in Databases and Information Systems (ADBIS'04), Lecture Notes in Computer Science, vol. 3255, Springer-Verlag, Berlin, 2004, pp. 1­21. [30] T. Rybina, A. Voronkov, A decision procedure for term algebras with queues, ACM Transactions on Computational Logic 2 (2) (2001) 155­181. [31] T. Skolem, Untersuchungen über die axiome des klassenkalküls und über produktations- und summationsprobleme, welche gewisse klassen von aussagen betreffen, in: J.E. Fenstad (Ed.), Selected Works in Logic, Universitetsforlaget, 1970, pp. 67­101. [32] C. Tinelli, M.T. Harandi, A new correctness proof of the Nelson­Oppen combination procedure, in: F. Baader, K.U. Schulz (Eds.), Frontiers of Combining Systems: Proceedings of the 1st International Workshop (Munich, Germany), Applied Logic, Kluwer Academic Publishers, Dordrecht, 1996, pp. 103­120. [33] C. Tinelli, C. Ringeissen, Unions of non-disjoint theories and combinations of satisfiability procedures, Theoretical Computer Science 290 (1) (2003) 291­353. [34] C.Tinelli,C.G.Zarba,Combiningdecisionproceduresforsortedtheories,in:J.J.Alferes,J.A.Leite(Eds.),Proceedings of the 9th European Conference on Logic in Artificial Intelligence (JELIA'04), Lecture Notes in Computer Science, vol. 3229, Springer, Berlin, 2004, pp. 641­653. [35] K.N. Venkataraman, Decidability of the purely existential fragment of the theory of term algebras, Journal of ACM 34 (2) (1987) 492­510. [36] C.G. Zarba, Combining sets with integers, in: A. Armando (Ed.), Frontiers of Combining Systems, Lecture Notes in Artificial Intelligence, vol. 2309, Springer, Berlin, 2002, pp. 103­116. 1574 T. Zhang et al. / Information and Computation 204 (2006) 1526­1574 [37] C.G. Zarba, Combining multisets with integers, in: A. Voronkov (Ed.), Proceedings of the 18th International Confer- ence on Automated Deduction, Lecture Notes in Artificial Intelligence, vol. 2392, Springer, Berlin, 2002, pp. 363­376. [38] T. Zhang, H.B. Sipma, Z. Manna, Decision procedures for recursive data structures with integer constraints, in: 2th International Joint Conference on Automated Reasoning (IJCAR'04), Lecture Notes in Computer Science, vol. 3097, Springer-Verlag, Berlin, 2004, pp. 152­167. [39] T. Zhang, H.B. Sipma, Z. Manna, Term algebras with length function and bounded quantifier alternation, in: The 17th International Conference on Theorem Proving in Higher Order Logics (TPHOLs'04), Lecture Notes in Computer Science, vol. 3223, Springer-Verlag, Berlin, 2004, pp. 321­336. [40] T. Zhang, H.B. Sipma, Z. Manna, The decidability of the first-order theory of term algebras with Knuth-Bendix order, in: R. Nieuwenhuis (Ed.), The 20th International Conference on Automated Deduction (CADE'05), Lecture Notes in Computer Science, vol. 3632, Springer-Verlag, Berlin, 2005, pp. 131­148. [41] T. Zhang, H.B. Sipma, Z. Manna, Decision procedures for queues with integer constraints, in: R. Ramanujam, S. Sen (Eds.), Proceedings of the 25th International Conference on the Foundatations of Software Technology and Theoretical Computer Science (FSTTCS'05), Lecture Notes in Computer Science, vol. 3821, Springer Verlag, Hyderabad, India, 2005, pp. 225­237.