Part IX Algebraic techniques - fingerprinting Chapter 9. FINGERPRINTING - ALGEBRAIC TECHNIQUES Some of the most interesting results in the design of efficient computations were obtained using algebraic techniques combined with randomization. Fingerprinting technique Example: Decide equality of two elements x, y drawn from a large universe U. Complexity under any reasonable model seems to be Ω(lg |U|). Alternative approach: Pick a random mapping f : U −→ V |U| >> |V |. such that there is a good chance that f (x) = f (y) implies x = y and declare that x = y (x = y) if f (x) = f (y) (f (x) = f (y). Complexity under any reasonable model is now Ω(lg |V |) prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 2/19 FINGERPRINTING METHOD - BASICS The basic idea is to find out whether two given objects are equal (given their full representations), by comparing their fingerprints (incomplete representations). An important requirement of this method is that fingerprints should always be chosen randomly from the set of potential fingerprints. Another requirement is that chosen fingerprints should preserve some essential differences between objects they represent. Third requirement fingerprints should satisfy: Fingerprints should be short, simple, and easy to obtain. prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 3/19 BASIC SCHEME Given a set of objects O, find a set F of fingerprints and a set M of mappings f : O ↔ F such that for any two different objects o1 and o2 from O there is a lot of such mappings f ∈ M that f (o1) = f (o2) Fingerprint method can therefore be seen as a special case of the method of abundance of the witnesses discussed in the next chapter. prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 4/19 FREIVALDS TECHNIQUE - I. Matrix multiplication for matrices of degree n: there is a classical (school) algorithm of complexity — O(n3) there is a sophisticated algorithm of complexity— O(n2.376) Problem: to check whether AB = C for iven n-dimensional matrices A, B and C. Method: choose a random column vector r ∈ {0, 1}n. Compute: x = Br, y = Ax, z = Cr – {O(n2) steps} - and declare AB = C iff y = z. What is the probability that conclusion is wrong? It means that: it is zero if AB = C and less then 1 2 if AB = C prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 5/19 FREIVALDS TECHNIQUE - II. Theorem Let A, B, C be n × n matrices, such that AB = C.Then for randomly chosen r ∈ {0, 1}n it holds that Pr[ABr = Cr] ≤ 1 2 . Proof: Denote D = AB − C = 0. We wish to upper-bound the probability that y = z (⇔ Dr = 0) Without loss of generality, we may assume that the first row in D has a non-zero element and that all its non-zero elements are first. Let d be the first row of D with the first k elements = 0. We now concentrate on the probability that Dr = 0. This will yield a lower bound on the probability that y = z. Dr = 0 iff r1 = − k i=2 di ri d1 (∗). Invoking the Principle of Deferred Decision we assume that r2, . . . , rn are chosen (randomly) before r1. Since r1 is also chosen randomly, the probability of Dr = 0 is = 1 2 . prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 6/19 VERIFICATION of POLYNOMIAL IDENTITIES - I. Polynomial product verification problem Given two polynomials of degree n – P1(x), P2(x) – and one of degree 2n – P3(x) – verify whether P1(x) · P2(x) = P3(x) polynomial evaluation complexity: multiplication complexity: O(n) O(n lg n) Randomized verification: Let S be a set of integers and |S| ≥ 2n + 1. Pick r ∈ S uniformly and randomly and evaluate P1(r), P2(r), P3(r). Declare the identity P1(x) · P2(x) = P3(x) as correct unless P1(r) · P2(r) = P3(r). prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 7/19 VERIFICATION of POLYNOMIAL IDENTITIES - II. Analysis: Polynomial Q(x) = P1(x) · P2(x) − P3(x) has degree ≤ 2n, and has therefore at most 2n roots. Unless Q(x) ≡ 0, for at most 2n random choices of r ∈ S, it holds Q(r) = 0. The probability of error of the above verification is therefore at most 2n |S| . The above verification procedure can be extended to verify any polynomial identity P1(x) = P2(x), where P1, P2 are given implicitly. Example: M = 1 x1 x2 1 . . . xn−1 1 1 x2 x2 2 . . . xn−1 2 ... 1 xn x2 n . . . xn−1 n Task: Verify that Det(M) = i n, and choose p < τ. Theorem Pr[Fp(num(a)) = Fp(num(b)) | a = b ] ≤ n π(τ) and if τ = tn ln tn, then Pr[Fp(num(a)) = Fp(num(b)) | a = b ] ≤ O( 1 t ), where π(x) = x ln x is the number of primes smaller than x. prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 16/19 PATTERN MATCHING Given: text X = x1, . . . , xn, pattern Y = y1, . . . , ym, both over the alphabet {0, 1}, and m < n. Find the leftmost occurrence of Y in X (if possible). Trivial solution - by exhaustive comparisons: O(nm). Sophisticated solution - by Knuth-Morris algorithm: O(n + m). We present now Monte Carlo and Las Vegas algorithms as another solutions that are based on the fingerprinting technique. Notation: X(j) = xj xj+1, . . . , xj+m−1, num(X(j)) = j+m−1 k=j xk 2j+m−k−1 . Basic idea: Interpret any m - bit string x as an integer num(x) and use as the fingerprint function Fp(x) = num(x) (mod p), where p ≤ τ – a chosen threshold. Pr[Fp(Y ) = Fp(X(j)) | Y = X(j) ] ≤ m π(τ) = O m ln τ τ Taking τ = nm ln nm we get Pr[a false match occurs] = O( 1 n ) prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 17/19 Monte Carlo algorithm Compare, for j = 1, . . . , n − m, fingerprints of Y and X(j), and output the smallest j at which a match occurs. Key fact for complexity analysis: The cost of computing Fp(X(j + 1)) from Fp(X(j)) is only O(1) operations. Indeed, for 1 ≤ j ≤ n − m + 1, num(X(j + 1)) = 2 num((X(j)) − 2m−1 xj + xj+m =⇒ Fp(num(X(j + 1))) = (2[num(Fp(X(j))) − 2m−1 xj ] + xj+m) mod p prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 18/19 Conversion into a Las Vegas algorithm Whenever a match occurs between the fingerprints of Y and X(j), we compare strings Y and X(j) in O(m) time. If this is a false match, we abandon the whole process in favor of using a brute-force (O(nm)) – algorithm. Resulting algorithm does not make any error and has as the expected running time O m + (n)(1 − 1 n ) + nm( 1 n ) = O(n + m) prof. Jozef Gruska IV054 9. Algebraic techniques - fingerprinting 19/19