IA159 Formal Methods for Software Analysis Symbolic Execution and Applications Jan Strejˇcek Faculty of Informatics Masaryk University Focus and sources focus symbolic execution automated whitebox fuzz testing sources J. C. King: Symbolic Execution and Program Testing, Communications of ACM, 1976. P. Godefroid, M. Y. Levin, and D. Molnar: Automated whitebox fuzz testing, NDSS 2008. IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 2/79 Motivation 1 int sum(int a, int b, int c) { 2 int x = a + b; 3 int y = b + c; 4 int z = x + y - b; 5 return z; 6 } testing checks that the program behaves correctly on selected inputs sum(1,1,1) returns 3 sum(1,2,3) returns 6 . . . IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 3/79 Motivation 1 int sum(int a, int b, int c) { 2 int x = a + b; 3 int y = b + c; 4 int z = x + y - b; 5 return z; 6 } testing checks that the program behaves correctly on selected inputs sum(1,1,1) returns 3 sum(1,2,3) returns 6 . . . IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 4/79 Motivation 1 int sum(int a, int b, int c) { 2 int x = a + b; 3 int y = b + c; 4 int z = x + y - b; 5 return z; 6 } we can execute the program with symbols α1, α2, α3 representing arbitrary input values sum(α1, α2, α3) returns IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 5/79 Motivation 1 int sum(int a, int b, int c) { 2 int x = a + b; 3 int y = b + c; 4 int z = x + y - b; 5 return z; 6 } we can execute the program with symbols α1, α2, α3 representing arbitrary input values sum(α1, α2, α3) returns α1 + α2 + α3 (if int interpreted as Z) → symbolic execution IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 6/79 Symbolic execution semantics in general each programming language has an execution semantics describing which data objects the program manipulates how statements manipulate data objects how control flows through the statements of a program in symbolic execution semantics real data objects are replaced by symbolic ones, which are typically expressions over symbols α1, α2, . . . representing arbitrary input values the semantics of statements is extended to accept symbolic input and produce symbolic output control flow is handled differently as some branching conditions can be evaluated to both true and false depending on the values of symbols IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 7/79 Symbolic execution assumptions and notation consider a program that handles only integer (Z) variables and it is built from assignments and branching statements a special assignment x = * (or x = input()) corresponds to reading input Vars = the set of variables in the considered program Sym = {α, β, . . . , α1, α2, . . .} = countable set of symbols representing arbirary input values Exp(Sym) = expressions over Sym, integers, and arithmetic operations for example, 2α + β3 − 7 ∈ Exp(Sym) Exp(Sym) are symbolic data objects IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 8/79 Symbolic execution assumptions and notation consider a program that handles only integer (Z) variables and it is built from assignments and branching statements a special assignment x = * (or x = input()) corresponds to reading input Vars = the set of variables in the considered program Sym = {α, β, . . . , α1, α2, . . .} = countable set of symbols representing arbirary input values Exp(Sym) = expressions over Sym, integers, and arithmetic operations for example, 2α + β3 − 7 ∈ Exp(Sym) Exp(Sym) are symbolic data objects symbolic execution computes symbolic states consisting of 1 current program location 2 symbolic memory 3 path condition IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 9/79 Symbolic memory m m: Vars → Exp(Sym) assigns expressions of Exp(Sym) to program variables a symbol α ∈ Sym is called fresh if it was not used before in the considered computation initial symbolic memory m0 assigns to each variable x ∈ Vars a fresh symbol m(x) = α ∈ Sym symbolic memory is modified by assignments for any program expression exp (Boolean or integer), by m(exp) we denote the expression where each program variable x ∈ Vars is replaced by m(x) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 10/79 Symbolic memory m m: Vars → Exp(Sym) assigns expressions of Exp(Sym) to program variables a symbol α ∈ Sym is called fresh if it was not used before in the considered computation initial symbolic memory m0 assigns to each variable x ∈ Vars a fresh symbol m(x) = α ∈ Sym symbolic memory is modified by assignments for any program expression exp (Boolean or integer), by m(exp) we denote the expression where each program variable x ∈ Vars is replaced by m(x) example let m(x) = α + 4 and m(y) = 2α + β m(5x − y + 8) = 5(α + 4) − (2α + β) + 8 = 3α − β + 28 m(x ! = y) = (α + 4 ̸= 2α + β) = (α + β ̸= 4) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 11/79 Symbolic execution of assignments a b x = exp; symbolic memory m IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 12/79 Symbolic execution of assignments a b x = exp; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = m(exp) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 13/79 Symbolic execution of assignments a b x = exp; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = m(exp) a b x = *; symbolic memory m IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 14/79 Symbolic execution of assignments a b x = exp; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = m(exp) a b x = *; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = γ where γ is fresh IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 15/79 Symbolic execution of assignments a b x = exp; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = m(exp) a b x = *; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = γ where γ is fresh example a b c y = 5x-y+8; x = *; m(x) = α + 4 m(y) = 2α + β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 16/79 Symbolic execution of assignments a b x = exp; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = m(exp) a b x = *; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = γ where γ is fresh example a b c y = 5x-y+8; x = *; m(x) = α + 4 m(y) = 2α + β m′ (x) = m(x) = α + 4 m′ (y) = m(5x − y + 8) = 3α − β + 28 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 17/79 Symbolic execution of assignments a b x = exp; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = m(exp) a b x = *; symbolic memory m symbolic memory m′ identical to m except for m′ (x) = γ where γ is fresh example a b c y = 5x-y+8; x = *; m(x) = α + 4 m(y) = 2α + β m′ (x) = m(x) = α + 4 m′ (y) = m(5x − y + 8) = 3α − β + 28 m′′ (x) = γ m′′ (y) = 3α − β + 28 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 18/79 Path condition pc a quantifier-free predicate formula over Sym corresponding to a program path pc is the necessary and sufficient condition on input values to navige the program execution along the current path if pc is not satisfiable the corresponding path is unfeasible pc is initially set to true pc is modified by evaluation of branching statements IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 19/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp) !(bexp) symbolic memory m path condition pc IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 20/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp) !(bexp) symbolic memory m path condition pc 1 check feasability of the true branch: if pc ∧ m(bexp) is not satisfiable, continue to the false branch IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 21/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp)X !(bexp) symbolic memory m path condition pc pc ∧ m(bexp) is UNSAT symbolic memory m path condition pc 1 check feasability of the true branch: if pc ∧ m(bexp) is not satisfiable, continue to the false branch IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 22/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp) !(bexp) symbolic memory m path condition pc 1 check feasability of the true branch: if pc ∧ m(bexp) is not satisfiable, continue to the false branch 2 check feasability of the false branch: if pc ∧ ¬m(bexp) is not satisfiable, continue to the true branch IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 23/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp) !(bexp)X symbolic memory m path condition pc pc ∧ ¬m(bexp) is UNSAT symbolic memory m path condition pc 1 check feasability of the true branch: if pc ∧ m(bexp) is not satisfiable, continue to the false branch 2 check feasability of the false branch: if pc ∧ ¬m(bexp) is not satisfiable, continue to the true branch IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 24/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp) !(bexp) symbolic memory m path condition pc 1 check feasability of the true branch: if pc ∧ m(bexp) is not satisfiable, continue to the false branch 2 check feasability of the false branch: if pc ∧ ¬m(bexp) is not satisfiable, continue to the true branch 3 if both are satisfiable, fork the symbolic execution set pc in true branch to pc ∧ m(bexp) set pc in false branch to pc ∧ negm(bexp) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 25/79 Symbolic execution of branching statements if (bexp) {...} else {...} a b c (bexp) !(bexp) symbolic memory m path condition pc pc ∧ m(bexp) and pc ∧ ¬m(bexp) are SAT symbolic memory m pc ∧ m(bexp) symbolic memory m pc ∧ ¬m(bexp) 1 check feasability of the true branch: if pc ∧ m(bexp) is not satisfiable, continue to the false branch 2 check feasability of the false branch: if pc ∧ ¬m(bexp) is not satisfiable, continue to the true branch 3 if both are satisfiable, fork the symbolic execution set pc in true branch to pc ∧ m(bexp) set pc in false branch to pc ∧ negm(bexp) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 26/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5) !(n>5) m(n) = 2α + 3 pc: IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 27/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5) !(n>5) m(n) = 2α + 3 pc: α < 10 pc is α < 10 true branch is feasible as α < 10 ∧ m(n > 5) ≡ α < 10 ∧ 2α + 3 > 5 is satisfiable (e.g. by α = 3) false branch is feasible as α < 10 ∧ ¬m(n > 5) ≡ α < 10 ∧ 2α + 3 ≤ 5 is satisfiable (e.g. by α = 0) fork execution and update pc on both branches IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 28/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5) !(n>5) m(n) = 2α + 3 pc: α < 10 m(n) = 2α + 3 pc: α < 10 ∧ 2α + 3 > 5 ≡ α < 10 ∧ α > 1 m(n) = 2α + 3 pc: α < 10 ∧ 2α + 3 ≤ 5 ≡ α ≤ 1 pc is α < 10 true branch is feasible as α < 10 ∧ m(n > 5) ≡ α < 10 ∧ 2α + 3 > 5 is satisfiable (e.g. by α = 3) false branch is feasible as α < 10 ∧ ¬m(n > 5) ≡ α < 10 ∧ 2α + 3 ≤ 5 is satisfiable (e.g. by α = 0) fork execution and update pc on both branches IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 29/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5) !(n>5) m(n) = 2α + 3 pc: α > 10 pc is α > 10 false branch is unfeasible as α > 10 ∧ ¬m(n > 5) ≡ α > 10 ∧ 2α + 3 ≤ 5 ≡ false continue to true branch with the same pc IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 30/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5) !(n>5)X m(n) = 2α + 3 pc: α > 10 m(n) = 2α + 3 pc: α > 10 pc is α > 10 false branch is unfeasible as α > 10 ∧ ¬m(n > 5) ≡ α > 10 ∧ 2α + 3 ≤ 5 ≡ false continue to true branch with the same pc IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 31/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5) !(n>5) m(n) = 2α + 3 pc: α ≤ 0 pc is α ≤ 0 true branch is unfeasible as α ≤ 0 ∧ m(n > 5) ≡ α ≤ 0 ∧ 2α + 3 > 5 ≡ false continue to false branch with the same pc IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 32/79 Symbolic execution of branching statements: example if (n>5) {...} else {...} a b c (n>5)X !(n>5) m(n) = 2α + 3 pc: α ≤ 0 m(n) = 2α + 3 pc: α ≤ 0 pc is α ≤ 0 true branch is unfeasible as α ≤ 0 ∧ m(n > 5) ≡ α ≤ 0 ∧ 2α + 3 > 5 ≡ false continue to false branch with the same pc IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 33/79 Symbolic execution tree nodes are states of symbolic execution, i.e., triples (l, m, pc) of program location l, symbolic memory m, and path condition pc root node (l0, m0, true) consists of initial program location l0, initial symbolic memory m0 assigning fresh symbols, and initial path condition true successors are computed by symbolic execution of the assignment or branching statement corresponding to the current location only locations with branching statement can have more successors IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 34/79 Symbolic execution tree: example 1 1 int foo(int x, int y) { 2 if (x>y) { 3 y = x; 4 } 5 y = y-x; 6 if (y>7) { 7 x = *; 8 } 9 return x+y; 10 } 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 35/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 36/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 37/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 38/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 39/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 40/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β 10 x→α y→0 r→α α > β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 41/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β 10 x→α y→0 r→α α > β 6 x→α y→β − α α ≤ β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 42/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β 10 x→α y→0 r→α α > β 6 x→α y→β − α α ≤ β 7 x→α y→β − α α + 7 < β 9 x→α y→β − α α ≤ β ≤ α + 7 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 43/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β 10 x→α y→0 r→α α > β 6 x→α y→β − α α ≤ β 7 x→α y→β − α α + 7 < β 9 x→α y→β − α α ≤ β ≤ α + 7 9 x→γ y→β − α α + 7 < β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 44/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β 10 x→α y→0 r→α α > β 6 x→α y→β − α α ≤ β 7 x→α y→β − α α + 7 < β 9 x→α y→β − α α ≤ β ≤ α + 7 9 x→γ y→β − α α + 7 < β 10 x→γ y→β − α r→γ + β − α α + 7 < β IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 45/79 Symbolic execution tree: example 1 2 3 5 6 7 9 10 (x>y) y = x; y = y-x; (y>7) x = *; r = x+y; !(x>y) !(y>7) 2 x→α y→β true 3 x→α y→β α > β 5 x→α y→β α ≤ β 5 x→α y→α α > β 6 x→α y→0 α > β 9 x→α y→0 α > β 10 x→α y→0 r→α α > β 6 x→α y→β − α α ≤ β 7 x→α y→β − α α + 7 < β 9 x→α y→β − α α ≤ β ≤ α + 7 9 x→γ y→β − α α + 7 < β 10 x→γ y→β − α r→γ + β − α α + 7 < β 10 x→α y→β − α r→β α ≤ β ≤ α + 7 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 46/79 Symbolic execution tree: example 2 1 int power(int x) { 2 int z = 1; 3 while (x>0) { 4 x = x-1; 5 z = 2*z; 6 } 7 return z; 8 } 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 47/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 48/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 49/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 50/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 8 x→α z→1 r→1 α ≤ 0 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 51/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 52/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 53/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 54/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 8 x→α−1 z→2 r→2 α = 1 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 55/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 5 x→α−2 z→2 α > 1 8 x→α−1 z→2 r→2 α = 1 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 56/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 5 x→α−2 z→2 α > 1 8 x→α−1 z→2 r→2 α = 1 3 x→α−2 z→4 α > 1 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 57/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 5 x→α−2 z→2 α > 1 8 x→α−1 z→2 r→2 α = 1 3 x→α−2 z→4 α > 1 4 x→α−2 z→4 α > 2 7 x→α−2 z→4 α = 2 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 58/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 5 x→α−2 z→2 α > 1 8 x→α−1 z→2 r→2 α = 1 3 x→α−2 z→4 α > 1 4 x→α−2 z→4 α > 2 7 x→α−2 z→4 α = 2 8 x→α−2 z→4 r→4 α = 2 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 59/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 5 x→α−2 z→2 α > 1 8 x→α−1 z→2 r→2 α = 1 3 x→α−2 z→4 α > 1 4 x→α−2 z→4 α > 2 7 x→α−2 z→4 α = 2 5 x→α−3 z→4 α > 2 8 x→α−2 z→4 r→4 α = 2 IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 60/79 Symbolic execution tree: example 2 2 3 4 5 7 8 z = 1; (x>0) x = x-1; !(x>0) r = z; z = 2*z; 2 x→α true 3 x→α z→1 true 4 x→α z→1 α > 0 7 x→α z→1 α ≤ 0 5 x→α−1 z→1 α > 0 8 x→α z→1 r→1 α ≤ 0 3 x→α−1 z→2 α > 0 4 x→α−1 z→2 α > 1 7 x→α−1 z→2 α = 1 5 x→α−2 z→2 α > 1 8 x→α−1 z→2 r→2 α = 1 3 x→α−2 z→4 α > 1 4 x→α−2 z→4 α > 2 7 x→α−2 z→4 α = 2 5 x→α−3 z→4 α > 2 8 x→α−2 z→4 r→4 α = 2 3 x→α−3 z→8 α > 2 ... IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 61/79 Properties of symbolic execution there is a bijection between paths in the symbolic execution tree (starting in its root) and feasible execution paths of the program the path condition gives the necessary and sufficient condition on input values to drive the execution along the corresponding path IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 62/79 Properties of symbolic execution there is a bijection between paths in the symbolic execution tree (starting in its root) and feasible execution paths of the program the path condition gives the necessary and sufficient condition on input values to drive the execution along the corresponding path in each symbolic execution, the path condition is satisfiable initially, pc is set to true pc is changed only when both branches of a branching statements are feasible pc is extended with a conjunct corresponding to the corresponding branch and the new conjunction is satisfiable as the branch is feasible path conditions pc1, pc2 corresponding to two distinct leaves of the symbolic execution tree are mutually exclusiove, i.e., pc1 ∧ pc2 ≡ false if the symbolic executon tree is finite, then the disjunction of all path conditions in its leaves is equivalent to true IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 63/79 Applications in verification Programs can be enriched with assume(φ) and assert(φ) statements. When symbolic execution passes through assume(φ), it executes pc ← pc ∧ φ. assert(φ) and pc =⇒ φ is not valid, it reports an error. With these constructs, symbolic execution can be used with a modification of Floyd’s proof method to prove program correctness. This application is straightforward for any program whose symbolic execution tree is finite. IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 64/79 Practical issues deciding validity or satisfiability of formulas can be expensive or even impossible (e.g. for our simple language with unbounded data types) in practice, symbolic execution uses expressions and formulas over bitvector theory (operations and relations correspond to CPU instructions, e.g. artihmetic operations with overflows, bitwise operations, etc.), where validity and satisfiability are decidable (but expensive) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 65/79 Practical issues variable storage referencing problem when i is dependent on input, then A[i] can point to various locations in memory unsatisfactory solution: handle A[i] as ITE(i = 1, A[1], ITE(i = 2, A[2], . . .)) IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 66/79 Practical issues variable storage referencing problem when i is dependent on input, then A[i] can point to various locations in memory unsatisfactory solution: handle A[i] as ITE(i = 1, A[1], ITE(i = 2, A[2], . . .)) other memory related problems reading/writing via pointers comparison of addresses (inner program nondeterminism) allocation of memory blocks of symbolic size IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 67/79 Practical issues variable storage referencing problem when i is dependent on input, then A[i] can point to various locations in memory unsatisfactory solution: handle A[i] as ITE(i = 1, A[1], ITE(i = 2, A[2], . . .)) other memory related problems reading/writing via pointers comparison of addresses (inner program nondeterminism) allocation of memory blocks of symbolic size solution: fully symbolic memory model performance issues IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 68/79 Practical issues path explosion problem the number of branches in the symbolic execution tree can be extremely high or even infinite typical for program cycles with the number of iterations depending on the input (symbolic execution forks again and again) construction of full symbolic execution tree is often infeasible issues with complex arithmetic operations (e.g. in hashing, encryption or decryption), calls to the operating system and libraries practical solutions concretization concolic execution IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 69/79 Concolic execution concolic = concrete + symbolic program is executed on a real input and on symbolic input simultaneously symbolic execution does not fork, it always follows the concrete execution and computes pc if a symbolic value is not available, we can switch to a concrete one IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 70/79 Real applications typical applications bug finding test generation analysis of abstract error traces often combined with other techniques used in many tools including Klee, PEX, SAGE, SLAM, Ultimate Automizer, Symbiotic IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 71/79 Automated whitebox fuzz testing Automated whitebox fuzz testing an example of modern and sophisticated testing method implemented in SAGE (Scalable, Automated, Guided Execution) discovered 30+ new bugs in large-shipped (and thus intensively tested) file-reading Windows applications including image processors, media players, file decoders combines fuzz testing and symbolic execution IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 73/79 Key ideas symbolic execution is expensive compared to running tests thus we want to generate as many new inputs from one symbolic execution as possible input for the next symbolic execution is selected by some scoring function applied to all generated inputs in particular, the input that explored the most (so-far uncovered) pieces of code is chosen for the next symbolic execution IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 74/79 The main algorithm 1 procedure GenerateInputs(inputSeed) 2 inputSeed.bound ← 0 3 workList ← {inputSeed} 4 Run&Check(program, inputSeed) 5 while workList ̸= ∅ do 6 input ← PickFirstItem(workList) 7 childInputs ← ExpandExecution(input) 8 foreach newInput ∈ childInputs do 9 Run&Check(program, newInput) 10 Score(newInput) 11 workList ← workList ∪ {newInput} Score(newInput) counts the newly covered blocks workList is ordered by the score of inputs PickFirstItem(workList) returns the input with the highest score IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 75/79 Application of symbolic execution 1 procedure ExpandExecution(input) 2 childInputs ← ∅ 3 PC ← SymbolicExecution(program, input) 4 for j ← input.bound to |PC| − 1 do 5 if j−1 i=0 PC[i] ∧ ¬PC[j] has solution M then 6 newInput ← Combine(input, M) 7 newInput.bound ← j 8 childInputs ← childInputs ∪ {newInput} 9 return childInputs Combine(input, M) creates a new input from the original input and M Combine("abcde", input[3] → "F") returns "abcFe" path conditions are represented as arrays PC of conjuncts IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 76/79 Example 1 void top(char input[4]) { 2 int cnt=0; 3 if (input[0] == ’b’) cnt++; 4 if (input[1] == ’a’) cnt++; 5 if (input[2] == ’d’) cnt++; 6 if (input[3] == ’!’) cnt++; 7 if (cnt >= 3) abort(); // error 8 } IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 77/79 Example 1 void top(char input[4]) { 2 int cnt=0; 3 if (input[0] == ’b’) cnt++; 4 if (input[1] == ’a’) cnt++; 5 if (input[2] == ’d’) cnt++; 6 if (input[3] == ’!’) cnt++; 7 if (cnt >= 3) abort(); // error 8 } 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 good goo! godd god! gaod gao! gadd gad! bood boo! bodd bod! baod bao! badd bad! IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 78/79 Notes the algorithm can be parallelized: only workList and the overall block coverage need to be shared SAGE recovers easily from divergencies (situations when an execution deviates from the assumed execution path) induced e.g. by inner program nondeterminism SAGE runs 24/7 on large clusters, available for Microsoft developers IA159 Formal Methods for Software Analysis: Symbolic Execution and Applications 79/79