IA159 Formal Methods for Software Analysis Shape Analysis via 3-Valued Logic Jan Strejˇcek Faculty of Informatics Masaryk University Focus and sources focus shape analysis in general 3-valued logic approach the logic and shape graphs algorithm TVLA and (semi)demo other approaches sources M. Sagiv, T. Reps, R. Wilhelm: Parametric Shape Analysis via 3-Valued Logic, ACM Trans. Program. Lang. Syst. 24(3), 2002. B. Jeannet, A. Loginov, T. Reps, M. Sagiv: A Relational Approach to Interprocedural Shape Analysis, SAS 2004. IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 2/47 Goal Shape analysis is a static analysis focused on program properties related to dynamically allocated memory. In particular, it aims to detect or verify the absence of heap-specific errors like null dereference memory leaking dangling pointer – a pointer to a deallocated memory violation of expected properties of dynamic datastructures (e.g., the datastructure is a cyclic list) . . . IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 3/47 Basic idea For each program location, we want to compute all reachable memory configurations. list1 list2 x NULL IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 4/47 Realistic approach The number of reachable memory configurations can be very large or even unbounded. We need to find finite representations of potentially infinite sets of memory configurations. We compute over-approximations of sets of reachable memory configurations (an abstraction). The over-approximations are represented by finite shape graphs. Shape graphs can be represented using logics, graph structures, automata, . . . IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 5/47 Representing concrete memory configurations with 2-valued logical structures Logical representation of concrete configurations configurations are represented by predicate logic formulas over the following core predicates: unary predicate x(v) for each pointer variable x binary predicate n(v1, v2) for each structure field n serving as a pointer binary predicate eq(v1, v2) predicate intended meaning x(v) variable x points to memory cell v n(v1, v2) field n of v1 (i.e., v1.n) points to v2 eq(v1, v2) v1 and v2 denote the same memory cell memory configurations correspond to interpretations allocated memory cells correspond to domain elements IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 7/47 Example typedef struct node { struct node *n; int data; } *List; x y NULL 5 3 11 8 IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 8/47 Example x y NULL 5 3 11 8 logical representation domain {u1, u2, u3, u4} x(u1) = y(u1) = 1 n(u1, u2) = n(u2, u3) = n(u3, u4) = 1 eq(u1, u1) = eq(u2, u2) = eq(u3, u3) = eq(u4, u4) = 1 values of all predicates on other arguments is 0 visualisation of the logical representation u1 u2 u3 u4 x y n n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 9/47 Notes storeless approach – it does not model precise location of allocated cells in the memory it cannot handle pointer arithmetics some interpretations do not represent any memory configuration, e.g., if n(u, v) = n(u, w) = 1 for some v ̸= w these interpretations are eliminated by formulas called integrity constraints, e.g., n(u, v) ∧ n(u, w) =⇒ eq(v, w) the size of a configuration (and its logical representation) can be unbounded −→ we use an abstraction to get a less precise, but bounded representation IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 10/47 3-valued logical structures and shape graphs 3-valued logic 1/2 0 1 uses 3 truth values: 0, 1, 1/2 (indefinite value) new operation the least upper bound ⊔ operations ∧, ∨, ¬ are extended ⊔ 0 1 1/2 0 0 1/2 1/2 1 1/2 1 1/2 1/2 1/2 1/2 1/2 ∧ 0 1 1/2 0 0 0 0 1 0 1 1/2 1/2 0 1/2 1/2 ∨ 0 1 1/2 0 0 1 1/2 1 1 1 1 1/2 1/2 1 1/2 ¬ 0 1 1 0 1/2 1/2 IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 12/47 Abstraction of concrete configurations abstraction we merge cells with identical values of all unary predicates values of unary predicates on merged cells keep unchanged (these are always 0 or 1) values of binary predicates on merged cells are defined as the least upper bound of the values on the original cells Example: if u2, u3, u4 is merged into u′ and u1 is not, then n(u1, u′ ) = n(u1, u2) ⊔ n(u1, u3) ⊔ n(u1, u4) IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 13/47 Example u1 u2 u3 u4 x y n n n let u2, u3, and u4 be merged into u′ u1 u′ x y IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 14/47 Example u1 u2 u3 u4 x y n n n let u2, u3, and u4 be merged into u′ eq(u′, u′) = eq(u2, u2) ⊔ eq(u2, u3) ⊔ . . . ⊔ eq(u4, u4) = 1/2 cells with eq(u, u) = 1/2 are called summary nodes u1 u′ x y IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 15/47 Example u1 u2 u3 u4 x y n n n let u2, u3, and u4 be merged into u′ eq(u′, u′) = eq(u2, u2) ⊔ eq(u2, u3) ⊔ . . . ⊔ eq(u4, u4) = 1/2 cells with eq(u, u) = 1/2 are called summary nodes n(u1, u′) = 1/2 and n(u′, u′) = 1/2 u1 u′ x y n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 16/47 Shape graph interpretation u1 u′ x y n n This shape graph may represent: an acyclic list of 2+ elements pointed by x and y a cyclic list of 2+ elements pointed by x and y, with the first element not lying on the cycle besides of these, u′ can also represent another cyclic or acyclic lists not pointed by anything (i.e., garbage) To refine the abstraction, we add instrumentation predicates. IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 17/47 Instrumentation predicates instrumentation predicates are defined by first-order formulas over core predicates may also use transitive (or reflexive and transitive) closures of binary predicates typical instrumentation predicates for linked lists predicate meaning definition t[n](v1, v2) v2 is reachable from v1 via n-fields n∗(v1, v2) r[n, x](v) v is reachable from variable x via n-fields ∃v1 . x(v1) ∧ t[n](v1, v) c[n](v) v lies on a cycle of n-fields ∃v1 . n(v, v1) ∧ t[n](v1, v) IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 18/47 Example we add instrumentation predicates r[n, x] a c[n] u1 r[n, x] u2 r[n, x] u3 r[n, x] u4 r[n, x] x y n n n there are more unary predicates determining cell merging u1 r[n, x] u′ r[n, x] x y n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 19/47 Example u1 r[n, x] u′ r[n, x] x y n n Now it represents exactly all acyclic lists of 2+ elements: all nodes satisfy r[n, x], hence they are reachable from x (i.e., there is no garbage) c[n] does not hold in any node, hence the list is acyclic The choice of instrumentation predicates is crucial for obtaining some useful output. IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 20/47 Example Compute the shape graph given by core predicates and instrumentation predicates r[n, x], r[n, y]: u1 u2 u3 u4 u5 v1 x y n n n n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 21/47 Example Compute the shape graph given by core predicates and instrumentation predicates r[n, x], r[n, y]: u1 u2 u3 u4 u5 v1 x y n n n n n Decide whether the shape graph represents also the configuration below. u1 u2 u3 u4 u5 u6 v1 x y n n n n n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 22/47 Example Compute the shape graph given by core predicates and instrumentation predicates r[n, x], r[n, y]: u1 u2 u3 u4 u5 v1 x y n n n n n Decide whether the shape graph represents also the configuration below. u1 u2 u3 u4 u5 u6 v1 x y n n n n n n Suggest an instrumentation predicate that would make shape graphs for the two configurations different. IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 23/47 Example Compute the shape graph given by core predicates and instrumentation predicates r[n, x], r[n, y]: u1 u2 u3 u4 u5 v1 x y n n n n n Decide whether the shape graph represents also the configuration below. u1 u2 u3 u4 u5 u6 v1 x y n n n n n n Suggest an instrumentation predicate that would make shape graphs for the two configurations different. Solution: is[n](v) defined by ∃v1, v2 . n(v1, v) ∧ n(v2, v) ∧ v1 ̸= v2 IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 24/47 Algorithm: the first look Algorithm: the first look there are only finitely many different shape graphs for a fixed finite set of core and instrumentation predicates the algorithm is a standard abstract interpretation algorithm input: a program and shape graphs describing possible initial memory configurations 1 assign the input shape graphs to the initial program location 2 repeat 3 foreach program statement do 4 take the shape graphs assigned to the location before the statement 5 and update shape graphs in the locations after the statement 6 until a fixpoint is reached IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 26/47 Step 2 for each core predicate and each program statement, there is a predicate-update formula describing the values of the predicate after the statement using the values of core predicates before the statement using the predicate-update formulae, it is easy to compute the effect of the statement on concrete memory configurations to compute the effect of a statement on shape graphs is harder: values of instrumentation predicates are given by their definition formulas and values of core predicates, but this approach would quickly lead to loss of precision (values 1/2) to get better results, we define also predicate-update formulas for instrumentation predicates, which may use values of both core and instrumentation predicates before the statement IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 27/47 TVLA and (semi)demo TVLA = Three Valued Logic Analysis Engine developed at Tel Aviv University under supervision of Mooly Sagiv written in Java currently in version 3 (extended with heap decomposition) available for academic purposes http://www.cs.tau.ac.il/~tvla/ IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 29/47 Input program has to be specified in four parts 1 declaration of predicates and integrity constraints core predicates are just declared instrumentation predicates have to be defined by formulas 2 operation semantics of all program statements for each statement used in the program, the corresponding predicate-update formulas have to be given each statement can be accompanied by an error detection formula (e.g., null dereference) 3 program flowgraph (including asserts) 4 the list of locations for which we want to get all reachable shape graphs parts 1 and 2 can be used repeatedly and they are available for certain classes of programs (e.g., for programs manipulating linked lists or trees) part 4 is optional IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 30/47 Input initial shape graphs described using a simple text format IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 31/47 Execution and output tvla output file contains picture of the program flowgraph reachable shape graphs for specified locations potential error messages IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 32/47 Example typedef struct node { struct node *n; int data; } *List; List reverse(List x) { List y, t; y = NULL; while (x != NULL) { t = x->n; x->n = y; y = x; x = t; } return y; } IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 33/47 Example typedef struct node { struct node *n; int data; } *List; List reverse(List x) { List y, t; y = NULL; while (x != NULL) { t = x->n; x->n = y; y = x; x = t; } return y; } (SEMI)DEMO IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 34/47 Algorithm: a closer look Computing the effect of a statement on a shape graph 1 operation Focus 2 evaluation of statement guards 3 computing new values of predicates 4 operation Coerce 5 operation Blur We will compute the effect of t = x->n on the shape graph: u1 r[n, x] v r[n, x] x n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 36/47 Operation Focus applied on statements with defined focus formula, which is a formula with exactly one free variable operation Focus takes the shape graph and returns the set of shape graphs representing the same configurations and such that the focus formula is not evaluated to 1/2 on any node of any of the graphs. operation Focus modifies only values of predicates in the focus formula, values of other predicates are not recomputed hence, some resulting graphs may not satisfy integrity constraints IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 37/47 Operation Focus: example focus formula for t = x->n is f(w) = ∃v1 . x(v1) ∧ n(v1, w) formula ensures that after the statement, the predicate t(v) cannot have value 1/2 input output u r[n, x] v r[n, x] n n x u r[n, x] v r[n, x] n x u r[n, x] v r[n, x] n n x u r[n, x] v1 r[n, x] n n x v0 r[n, x] n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 38/47 Evaluation of statement guards for each statement, there can be defined a guard, which is again a formula the statements is not performed on the shape graphs for which the guard evaluates to 0 it is typically used to handle program branching statement t = x->n has no guard IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 39/47 Computing new values of predicates we use predicate-update formulas corresponding to the statement to compute new predicate values predicates with no predicate-update formulas keep their value IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 40/47 Computing new values of predicates: example Predicate-update formulas for t = x->n predicate predicate-update formula t(v) ∃v1 . x(v1) ∧ n(v1, v) r[n, t](v) r[n, x](v) ∧ (c[n](v) ∨ ¬x(v)) output u r[n, x] v r[n, x] n x u r[n, x] v r[n, x] r[n, t] n n x t u r[n, x] v1 r[n, x] r[n, t] n n x t v0 r[n, x] r[n, t] n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 41/47 Operation Coerce removes shape graphs not satisfying integrity constraints makes values of some predicates more precise IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 42/47 Operation Coerce: example shape graph on the left is corrupted as r[n, x](v) cannot hold =⇒ the graph is removed in the shape graph in the middle, v cannot be a summary node as t(v) holds on the right, v1 cannot be a summary node for the same reason, and moreover c[n](v1) does not hold and thus n(v1, v1), n(v0, v1) cannot hold u r[n, x] v r[n, x] r[n, t] n x t u r[n, x] v1 r[n, x] r[n, t] n x t v0 r[n, x] r[n, t] n n IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 43/47 Operation Blur can further merge nodes with same values of unary predicates consequently, some shape graphs can become identical in our example, Blur has no effect IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 44/47 Notes TVLA works automatically, but the user has to provide semantics of program statements select/supply suitable instrumentation predicates process the results and filter out false alarms studied extensions and applications interprocedural shape analysis (can handle also recursive programs) lazy shape analysis shape analysis and CEGAR shape analysis for parallel processes mix of shape analysis and data-related abstract interpretation (can be used e.g., to prove that sorting algorithms output sorted linked lists) can be used also to analyse liveness of java objects and their timely deallocation . . . IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 45/47 Other approaches and tools for shape analysis Other approaches and tools Other approaches to analysis of dynamically allocated memory are based on separation logic and (bi-)abduction (Infer) translation to first-order logic and automated theorem proving (HAVOC) symbolic memory graphs (Predator) tree automata (Forester) . . . IA159 Formal Methods for Software Analysis: Shape Analysis via 3-Valued Logic 47/47