Part I
Chapter 2. DESIGN METHODS for RANDOMIZED ALGORITHMS

In this chapter:
1. Main types of randomized algorithms are introduced and illustrated.
2. Main design methods for randomized algorithms are introduced and illustrated.

BASIC PROBABILITIES FOR RANDOMIZED ALGORITHMS

Let us consider any decision problem P. For any randomized algorithm A for P and any input x of A, let the set S_{A,x} of all runs of A on x be the main sample space for the analysis of A on the input x.

If one chooses the random variable V_{A,x} that assigns 1 (0) to any run of A on x with the correct (wrong) output, then the expected value of V_{A,x} is exactly the success probability of A on x. The probability of the complementary event is called the error probability of A on x.

If one takes instead a random variable that assigns to every run its complexity (number of steps), then the expectation of this random variable equals the expected time complexity of A on x.

CLASSIFICATION of RANDOMIZED ALGORITHMS

Las Vegas algorithms are those algorithms that never produce a wrong outcome, though sometimes - with bounded probability only - they may produce the outcome "I don't know" (usually denoted ??).

A one-sided-error Monte Carlo algorithm (1MC) for a language L is an algorithm that accepts any input x ∈ L with probability at least 1/2 and rejects, with certainty, any input not in L. (The error probability of such algorithms converges to 0 with exponential speed when a number of independent runs are executed.)

A bounded-error Monte Carlo algorithm (2MC) A for a function F is an algorithm for which there exists a constant ε > 0 such that, for any input x, the algorithm computes the correct output A(x) = F(x) with probability at least 1/2 + ε. (The error probability of such an algorithm can be reduced to an arbitrarily small given constant δ using only constantly many - depending on δ - runs.)

An (unbounded-error) Monte Carlo algorithm (UMC) is an algorithm that, for any input x, computes the correct output with probability greater than 1/2. (To reduce the error probability of such an algorithm below a given constant, exponentially many runs may be needed.)

RANDOMIZED ALGORITHMS DESIGN PARADIGMS - I

Fooling the adversary: One finds, for a given problem P, a set of deterministic algorithms for P such that, for each input, most of these algorithms compute the correct result of P efficiently; one then takes as the randomized algorithm for P a probability distribution over this set of deterministic algorithms.

The idea is to overcome the following situation: a set S of deterministic algorithms for P is given such that for each of them there exist bad inputs (on which the algorithm gives a wrong result or computes inefficiently), yet for each input there are many algorithms in S that give a correct answer efficiently. An example is sketched below.
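A classic illustration of this paradigm (an example of mine, not taken from the slides) is randomized quicksort: every fixed pivot rule is a deterministic algorithm with adversarial inputs that force Θ(n²) comparisons, but choosing pivots at random amounts to running a randomly chosen member of this family, so every input is sorted in O(n log n) expected time. A minimal Python sketch:

```python
import random

def randomized_quicksort(a, rng=random):
    """Quicksort where the pivot choice is randomized.

    Each fixed pivot rule is a deterministic algorithm with bad inputs;
    sampling the pivot uniformly means no adversary can prepare an input
    that is bad for most of the random choices.
    """
    if len(a) <= 1:
        return list(a)
    pivot = rng.choice(a)
    smaller = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    larger = [x for x in a if x > pivot]
    return randomized_quicksort(smaller, rng) + equal + randomized_quicksort(larger, rng)
```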
RANDOMIZED ALGORITHMS DESIGN PARADIGMS - II

Abundance of witnesses: A witness y for an input x and a problem P is a piece of information with which one can solve the problem P for the input x more efficiently than without it. If one finds a set S such that at least half of its elements are witnesses for P and x, then a random choice of an element of S yields a witness for P and x with probability at least 1/2.

RANDOMIZED ALGORITHMS DESIGN PARADIGMS - III

Fingerprinting: For solving various problems it can be much more efficient to work with very small fingerprints (hashes) of large inputs than with the large inputs directly.

RANDOMIZED ALGORITHMS DESIGN PARADIGMS - IV

Random sampling: If a set contains sufficiently many objects of the kind we are looking for, then random sampling from that set provides such an object with sufficiently high probability. This method is also called the probabilistic method.

Random rounding: A hard-to-solve optimization problem P is transformed into an easy-to-solve optimization problem P', by enlarging the solution space, in such a way that the outcome of any solution of the new problem can be used to create an efficient randomized algorithm for the original problem P.

Amplification of the success probability of a randomized algorithm can be achieved by repeating independent computations on the same input (but, of course, with different random auxiliary inputs).

Example for random rounding: A {0,1}-problem P, in which the task is to find a proper assignment of values from the set {0,1} to the variables, is transformed into a [0,1]-problem in which the task is to assign to each variable x a number n_x ∈ [0,1]; to solve the problem P with high probability, the value 1 is then assigned to x with probability n_x.

STRINGS EQUALITY PROBLEM

Notation: For a binary vector/string x = x_1 x_2 ... x_n let

  Number(x) = Σ_{i=1}^{n} x_i 2^{n−i}.

Problem: Each of two parties, say A and B, has one n-bit string. The two parties have to decide, by communication only, whether their strings are equal. How to do that efficiently?

Every deterministic protocol clearly requires sending at least n bits in the worst case.

STRINGS EQUALITY PROBLEM - RANDOMIZED PROTOCOL

Initial situation: Party A (Alice) has a binary string x = x_1 x_2 ... x_n; party B (Bob) has a binary string y = y_1 y_2 ... y_n.

Protocol:
1. Alice chooses, randomly, a prime p ≤ n² and sends to Bob p and the binary representation of the number s = Number(x) mod p;
2. Bob computes t = Number(y) mod p and declares that x = y iff s = t.

Analysis: The protocol requires sending at most 2 lg n² ≤ 4 lg n bits.

Example: For n = 2^64 (≈ 1.8 · 10^19) the protocol requires sending at most 256 bits.

ERROR ANALYSIS

Let us say that a prime 2 < p < n² is bad for a pair (x, y), x ≠ y, if the above protocol, for such an input pair (x, y) and such a choice of prime, yields a wrong answer.

Notation: Prim(m) = the number of primes smaller than m.

The error probability for an input (x, y) is

  (number of bad primes for (x, y)) / Prim(n²).

The so-called Prime Number Theorem says that there are approximately m / ln m primes among the first m integers. Since Prim(m) > m / ln m for m > 69, we have, for n ≥ 9,

  Prim(n²) > n² / ln n² = n² / (2 ln n).
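A minimal Python sketch of this fingerprinting protocol; the helper names and the sieve-based prime generation are illustrative choices, not part of the slides (a sieve is fine for modest n; for n near 2^64 one would instead generate random primes by primality testing). The helpers are reused in a later sketch.

```python
import random

def primes_up_to(m):
    """All primes <= m (sieve of Eratosthenes); adequate for a demo."""
    sieve = bytearray([1]) * (m + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(m ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, m + 1, i)))
    return [i for i in range(2, m + 1) if sieve[i]]

def number(bits):
    """Number(x) = sum_i x_i 2^(n-i), i.e. the usual value of a binary string."""
    return int(bits, 2)

def equality_protocol(x, y, rng=random):
    """Alice holds x, Bob holds y (n-bit strings, n >= 2 assumed).
    Alice sends a random prime p <= n^2 and s = Number(x) mod p
    (O(log n) bits in total); Bob answers 'equal' iff his fingerprint matches."""
    n = len(x)
    p = rng.choice(primes_up_to(n * n))
    s = number(x) % p              # Alice's message
    t = number(y) % p              # Bob's fingerprint
    return s == t                  # wrong only if x != y and p is a bad prime
```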
NUMBER OF BAD PRIMES

Lemma: The number of bad primes for a pair (x, y) with x ≠ y is at most n − 1.

Proof: A prime p is clearly bad for the pair (x, y), x ≠ y, if and only if p divides the number

  w = |Number(x) − Number(y)| < 2^n.

Observe that w can be uniquely factorized as w = p_1^{e_1} p_2^{e_2} ... p_k^{e_k}, where p_1 < p_2 < ... < p_k are primes. We can show that k ≤ n − 1. Indeed, if k ≥ n, then w ≥ 1 · 2 · 3 · ... · n = n! > 2^n, which is a contradiction.

The probability of error of the above equality protocol is therefore at most

  (n − 1) / Prim(n²) ≤ (n − 1) / (n² / ln n²) ≤ (ln n²) / n = (2 ln n) / n,

which is at most 0.74 · 10^{−14} in case n = 10^{16}.

PROBABILITY AMPLIFICATION

If the above protocol is repeated 10 times, each time with a freshly chosen random prime, and the answer is "x = y" each time, then the probability of an error is at most

  ((2 ln n) / n)^{10},

and therefore, for n = 10^{16}, the error probability is at most 0.47 · 10^{−141}.

Reminder: The probability of the correct output in a single run is at least 1 − (2 ln n)/n.

A LAS VEGAS ALGORITHM WITH ?? - EXAMPLE

Let Alice have 10 n-bit strings x_i ∈ {0,1}^n, 1 ≤ i ≤ 10, n > 200, and let Bob have 10 n-bit strings y_i ∈ {0,1}^n, 1 ≤ i ≤ 10. Their task is to determine, by communication, whether there exists an i_0 such that x_{i_0} = y_{i_0}.

One can show that every deterministic protocol for this problem has to exchange 10n bits in the worst case. Therefore no deterministic algorithm is essentially more efficient than sending all bits from Alice to Bob and then letting Bob compare the pairs x_i and y_i until (if at all) such an i_0 is found.

We show the existence of a Las Vegas protocol that solves the above problem (that is, decides whether such an i_0 exists) with communication complexity n + O(lg n).

PROTOCOL

Alice first randomly chooses 10 primes p_1, ..., p_10, each smaller than n². Afterwards, Alice computes all s_i = Number(x_i) mod p_i, i ∈ {1, 2, ..., 10}, and, finally, she sends to Bob all 20 numbers p_1, ..., p_10, s_1, ..., s_10.

Bob computes all numbers r_i = Number(y_i) mod p_i and compares the elements in all pairs (s_i, r_i). If s_i ≠ r_i for all i, then Bob outputs NO (or 0). Otherwise, if j is the smallest integer such that s_j = r_j, then Bob sends to Alice the pair (j, y_j). If x_j = y_j, Alice outputs YES (or 1); otherwise she outputs ?? (because there may exist another k with x_k = y_k). A sketch of this protocol in code follows.
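A Python sketch of this protocol, reusing primes_up_to and number from the equality-protocol sketch above; the return values 1, 0 and "??" mirror YES, NO and ?? on the slide.

```python
import random

def las_vegas_common_index(xs, ys, rng=random):
    """Alice holds xs, Bob holds ys (ten n-bit strings each).
    Decides whether x_i == y_i for some i; never lies, may answer '??'."""
    n = len(xs[0])
    primes = primes_up_to(n * n)                  # from the previous sketch
    ps = [rng.choice(primes) for _ in range(10)]
    ss = [number(x) % p for x, p in zip(xs, ps)]  # Alice -> Bob: all p_i, s_i
    rs = [number(y) % p for y, p in zip(ys, ps)]  # Bob's own fingerprints
    matching = [j for j in range(10) if ss[j] == rs[j]]
    if not matching:
        return 0          # NO: differing fingerprints prove x_i != y_i for all i
    j = matching[0]       # Bob -> Alice: the pair (j, y_j), n + 4 more bits
    if xs[j] == ys[j]:
        return 1          # YES: an equal pair was exhibited
    return "??"           # fingerprint collision before a real match; give up
```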
COMPLEXITY ANALYSIS of the PROTOCOL 1/2

The above protocol exchanges at most 20 lg n² + n + 4 bits (the 4 bits are needed to send j).

Proof that the protocol is a Las Vegas protocol: If x_i ≠ y_i for all i, then, according to the analysis of the strings equality protocol, for each fixed i the probability that Number(x_i) mod p_i ≠ Number(y_i) mod p_i is at least 1 − (2 ln n)/n. Hence the probability that the fingerprints differ for all i - and therefore that the protocol produces the correct output NO - is at least

  (1 − (2 ln n)/n)^{10}.

Moreover, for sufficiently large n,

  (1 − (2 ln n)/n)^{10} ≥ 1 − (20 ln n)/n ≥ 1/2.

COMPLEXITY ANALYSIS of the PROTOCOL 2/2

Let us now consider the complementary case - namely, that there is a j with x_j = y_j - and let j_0 be the smallest such j. The protocol then outputs YES iff Number(x_i) mod p_i ≠ Number(y_i) mod p_i for all i < j_0; let us denote this event by E_{j_0}.

If j_0 = 1, then the protocol accepts the input with certainty. If j_0 > 1, then, as discussed before, the probability of the event E_{j_0} is at least

  (1 − (2 ln n)/n)^{j_0 − 1} ≥ 1 − (2(j_0 − 1) ln n)/n

for sufficiently large n, and therefore the protocol outputs YES (1) with probability at least 1 − (18 ln n)/n, which is larger than 1/2 for all n ≥ 189.

In the remaining case, when there is an l < j_0 such that Number(x_l) mod p_l = Number(y_l) mod p_l, the protocol produces the output ??; it thus never lies and is indeed a Las Vegas protocol.

TWO TYPES of LAS VEGAS ALGORITHMS

As already defined, there are two types of Las Vegas algorithms: algorithms that never produce ??, and algorithms that may produce ??.

Note: Las Vegas algorithms may not terminate for some inputs!

Claim: Any Las Vegas algorithm A1 can be converted into a Las Vegas algorithm A2 that solves the same problem and never produces ??. The construction of A2 is simple: each time A1 is about to produce the output ?? - which can happen only with bounded probability - a new run of A1 is started on the same input.

AMPLIFICATION of 1MC ALGORITHMS

The error probability of 1MC algorithms decreases exponentially with the number of repetitions of the computation. Indeed, consider k independent runs of the algorithm. If at least one output is 1 (accept), then the input is accepted with certainty. If all k outputs are 0 (reject), then the overall output is NO (reject) and the error probability is at most (1/2)^k.

Because of this exponential decrease of the error probability under repeated runs, 1MC algorithms are very popular.

AMPLIFICATION of 2MC ALGORITHMS

Let A be a 2MC algorithm for a function F and let ε > 0 be such that Prob(A(x) = F(x)) ≥ 1/2 + ε. For any integer k, let A_k be the algorithm that performs k independent runs of A; if there is an α that appears more than k/2 times among the outputs, then A_k produces α as the output; if there is no such α, A_k produces ?? as the output.

One can show that if a δ > 0 is fixed, then Prob(A_k(x) = F(x)) ≥ 1 − δ whenever

  k ≥ 2 ln δ / ln(1 − 4ε²).

If ε and δ are considered constants, then so is k, and therefore Time_{A_k}(n) = O(Time_A(n)). A code sketch follows below.
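A hedged sketch of this majority-vote amplification; the function names and interface are mine, and algorithm stands for any randomized function of (x, rng) implementing a 2MC computation.

```python
import math
import random
from collections import Counter

def runs_needed(eps, delta):
    """Smallest k satisfying the slide's bound k >= 2 ln(delta) / ln(1 - 4 eps^2)."""
    return math.ceil(2 * math.log(delta) / math.log(1 - 4 * eps * eps))

def amplify_2mc(algorithm, x, k, rng=random):
    """Run a 2MC algorithm k times independently and output the strict
    majority value; output '??' if no value occurs more than k/2 times."""
    counts = Counter(algorithm(x, rng) for _ in range(k))
    value, count = counts.most_common(1)[0]
    return value if count > k / 2 else "??"

# Example: eps = 0.1 and delta = 0.01 give runs_needed(0.1, 0.01) == 226 runs.
```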
2MC versus UMC ALGORITHMS

Basic question: What is the difference between 2MC and UMC algorithms?

Answer: For a UMC algorithm A it may happen that the distance between the error probability and 1/2 tends to 0 with growing input size. As a consequence, if we design, for a given δ > 0, an algorithm A_k that performs k independent runs of A and satisfies Prob(A_k(x) = F(x)) > 1 − δ, then the running time of A_k may be exponential in the input length.

SECRET SHARING BETWEEN TWO PARTIES

Problem: The task is to "partition", by a moderator, a given binary-string secret S between two parties P1 and P2 in such a way that neither party alone has the slightest idea what S is, but together they can easily determine S.

Method: The moderator distributes the binary-string secret S between the two parties P1 and P2 by choosing a random binary string b of the same length as S and sending b to P1 and S ⊕ b to P2.

This way, neither P1 nor P2 alone has the slightest idea about S, but together they easily recover S by computing b ⊕ (S ⊕ b) = S.

PERFECT MATCHING ALGORITHM - I

Let G = (V, E) be an undirected graph. A subset X ⊆ E is said to be a matching of G if no two edges in X have a common node. A matching is said to be a perfect matching if its edges cover all nodes of G.

Example: Which of the graphs in the next figure has a perfect matching?

There are polynomial-time algorithms to decide whether a given graph has a perfect matching, but none is as simple as the randomized algorithm based on the so-called Tutte theorem presented below.

TUTTE MATRIX

Basic concept - the Tutte matrix of a graph: Let G = (V, E) be a graph with nodes V = {1, 2, ..., n}, |V| even. The Tutte matrix A_G = {a_ij}_{i,j=1}^{n} of G is defined by

  a_ij = x_ij    if (i, j) ∈ E and i < j;
  a_ij = −x_ij   if (i, j) ∈ E and i > j;
  a_ij = 0       if (i, j) ∉ E,

where the x_ij are variables, all different for different pairs i, j.

TUTTE THEOREM - I

Example: For the graph on six nodes in which node 1 is connected with nodes 4 and 5, node 2 with nodes 5 and 6, and node 3 with nodes 5 and 6, the Tutte matrix has the form

  (   0      0      0     x14    x15     0   )
  (   0      0      0      0     x25    x26  )
  (   0      0      0      0     x35    x36  )
  ( −x14     0      0      0      0      0   )
  ( −x15   −x25   −x35     0      0      0   )
  (   0    −x26   −x36     0      0      0   )

Tutte theorem: A graph G = (V, E) with |V| even has a perfect matching iff the determinant of the corresponding Tutte matrix is not identically zero.

Proof: The determinant of A_G equals

  det(A_G) = Σ_π σ_π Π_{i=1}^{n} a_{iπ(i)},

where π ranges over the permutations of {1, 2, ..., n} and σ_π = 1 (σ_π = −1) if π is a product of an even (odd) number of transpositions.

TUTTE THEOREM - OBSERVATIONS

Observation 1: For a permutation π, Π_{i=1}^{n} a_{iπ(i)} ≠ 0 iff G_π = {(i, π(i)) | 1 ≤ i ≤ n} is a subgraph of G.

Observation 2: Permutations π with at least one odd cycle contribute nothing to the determinant of A_G, because to each such permutation π there is a permutation π' such that Π_{i=1}^{n} a_{iπ(i)} = −Π_{i=1}^{n} a_{iπ'(i)}.

Observation 3: It is therefore sufficient to consider permutations π such that G_π consists only of even cycles.

Notation: Let the permutation π^r be obtained from π by reversing all its cycles.

Notation: For a perfect matching E', let t_{E'} denote the product of the a_ij corresponding to the edges of E'.

TUTTE THEOREM - CASE ANALYSIS

Case I: π = π^r. Then G_π consists of cycles of length 2, and π corresponds to a perfect matching E' whose contribution to det(A_G) is (t_{E'})².

Case II: π ≠ π^r. In this case π and π^r correspond to the union of two perfect matchings E' and E'', obtained by alternately selecting edges within the cycles, and the joint contribution of π and π^r to det(A_G) is 2 t_{E'} t_{E''}.

Conclusion:

  det(A_G) = (t_{E_1} + ... + t_{E_k})²,

where E_i denotes the i-th perfect matching.

EXPLANATION

Let us illustrate the claims from the previous slide on a graph with 4 nodes 1, 2, 3, 4.

Case 1: Let the graph have the edges 1-2 and 3-4 only, and take the permutation π = π^r: 1 → 2, 2 → 1, 3 → 4, 4 → 3. The perfect matching E' is {1-2, 3-4}, t_{E'} = x12 x34, and

  Π_{i=1}^{4} a_{iπ(i)} = a12 a21 a34 a43 = (x12)(−x12)(x34)(−x34) = x12² x34² = (t_{E'})².

Case 2: Let the graph have the edges 1-2, 2-3, 3-4, 4-1 and take the permutations

  π: 1 → 2 → 3 → 4 → 1,   π^r: 1 → 4 → 3 → 2 → 1.

Here t_{E'} = x12 x34 and t_{E''} = x23 x14, and, taking the sign σ_π into account, the joint contribution of π and π^r to det(A_G) is

  2 x12 x23 x34 x14 = 2 t_{E'} t_{E''}.

A randomized test based directly on Tutte's theorem is sketched below.
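A sketch of the resulting randomized test (the parameter choices are mine): substitute independent random field elements for the variables x_ij and test whether the determinant is zero modulo a prime. If G has no perfect matching, det(A_G) ≡ 0 and the test always answers False; if G has one, a random evaluation is zero with probability at most n/p per trial (Schwartz-Zippel), so this is a one-sided-error Monte Carlo algorithm.

```python
import random

def det_mod(a, p):
    """Determinant of an integer matrix modulo a prime p, by Gaussian
    elimination (p prime, so pow(., p - 2, p) inverts nonzero pivots)."""
    a = [row[:] for row in a]
    n = len(a)
    det = 1
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col] % p != 0), None)
        if piv is None:
            return 0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)
        for r in range(col + 1, n):
            f = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - f * a[col][c]) % p
    return det % p

def has_perfect_matching(n, edges, trials=5, p=10**9 + 7):
    """Randomized Tutte-theorem test; nodes are 0..n-1, edges are pairs."""
    if n % 2 == 1:
        return False
    for _ in range(trials):
        a = [[0] * n for _ in range(n)]
        for u, v in edges:
            i, j = min(u, v), max(u, v)
            r = random.randrange(1, p)       # random value for variable x_ij
            a[i][j], a[j][i] = r, p - r      # a_ij = x_ij, a_ji = -x_ij (mod p)
        if det_mod(a, p) != 0:
            return True      # a nonzero evaluation proves det(A_G) is not identically 0
    return False             # no nonzero evaluation found; very likely no perfect matching
```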
RANDOM SELECT

Problem: Given a set S = {a_1, ..., a_n} of n > 0 distinct numbers and a k with 1 ≤ k ≤ n, find the k-th smallest number in S.

A naive way to solve the problem is to sort S first; this requires O(n lg n) comparisons. The following randomized algorithm RSELECT does it in O(n) expected time:

Algorithm RSELECT(S, k):
1. If n = 1, output a_1.
2. Otherwise choose i ∈_R {1, 2, ..., n} and
   (a) compute S_< = {b ∈ S | b < a_i} and S_> = {b ∈ S | b > a_i};
   (b) if |S_<| ≥ k then RSELECT(S_<, k);
       else if |S_<| = k − 1 then output a_i;
       else RSELECT(S_>, k − |S_<| − 1).

This is clearly a Las Vegas algorithm. A Python version follows.
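A direct Python transcription of RSELECT (assuming, as the slide does, that the numbers are distinct):

```python
import random

def rselect(S, k, rng=random):
    """Return the k-th smallest element of S (1-indexed, distinct numbers).
    Expected O(n) time: a random pivot splits S evenly on average."""
    if len(S) == 1:
        return S[0]
    pivot = rng.choice(S)                    # a_i for a random i
    smaller = [b for b in S if b < pivot]    # S_<
    larger = [b for b in S if b > pivot]     # S_>
    if len(smaller) >= k:
        return rselect(smaller, k, rng)
    if len(smaller) == k - 1:
        return pivot
    return rselect(larger, k - len(smaller) - 1, rng)
```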
CLASSIFICATION of RANDOMIZED OPTIMIZATION ALGORITHMS

The classification of randomized algorithms we have used so far was based on the frequency of correct outputs; it is suited only to algorithms for decision problems and for the computation of functions, not to optimization problems. In the case of optimization problems we do not take as the output the most frequent output of several runs, but the best output according to some optimization criterion.

Moreover, in the case of optimization problems our goal is not always to find an optimal solution. We are usually quite happy to find (in a feasible way) an almost optimal solution - one whose cost (quality) does not differ much from the cost (quality) of an optimal solution. All that means that a different approach to the classification of randomized approximation algorithms is needed.

ILLUSTRATION

If a randomized algorithm A computes an optimal solution for an input x only with probability at least 1/|x|, this does not mean that A is useless. Indeed, one can execute |x| independent runs of A on a given input x - call this the algorithm A_x - and take the best of all their outputs.

Let us explore the success probability of such an algorithm A_x. The probability of not computing an optimal solution in one run is at most 1 − 1/|x|, and therefore the probability that A_x does not find an optimal solution in |x| independent runs is at most

  (1 − 1/|x|)^{|x|} < 1/e,

so A_x computes an optimal solution with the constant probability at least 1 − 1/e.

DEFINITION of OPTIMIZATION PROBLEMS

Definition: An optimization problem is a 6-tuple P = (Σ_I, Σ_O, L, M, cost, goal), where
1. Σ_I is an input alphabet;
2. Σ_O is an output alphabet;
3. L ⊆ Σ_I* is the language of feasible inputs, and any x ∈ L is called a problem instance of P;
4. M is a function from L to 2^{Σ_O*}, and for each x ∈ L, M(x) is the set of feasible solutions for x;
5. cost is a function ∪_{x∈L}(M(x) × {x}) → R⁺, called the cost function;
6. goal ∈ {minimum, maximum} is the objective.

SOLVING an OPTIMIZATION PROBLEM

Observe that solving an optimization problem is more like computing a relation than computing a function, because we are usually happy with finding any one of the optimal solutions, or even a solution quite close to an optimal one.

A feasible solution α ∈ M(x) is called optimal for the problem instance x of P if

  cost(α, x) = goal{cost(β, x) | β ∈ M(x)}.

An algorithm A is said to solve P if, for any x ∈ L,

  A(x) ∈ M(x)  and  cost(A(x), x) = goal{cost(β, x) | β ∈ M(x)}.

If the goal is minimum (maximum), we talk about a minimization (maximization) problem.

EXAMPLE - MINIMUM COST HAMILTONIAN CYCLE PROBLEM

Input: A weighted complete graph (G, c), where G = (V, E), V = {v_1, ..., v_n}, E ⊆ V × V, and c : E → N is a cost function.

Feasible solutions: For any problem instance (G, c), let M(G, c) be the set of Hamiltonian cycles of G. Each such cycle is represented by a sequence of vertices v_{i_1}, v_{i_2}, ..., v_{i_n}, v_{i_1}, where i_1, i_2, ..., i_n is a permutation of (1, 2, ..., n) and, for each j, (v_{i_j}, v_{i_{j+1}}) is an edge of G.

Cost of a Hamiltonian cycle: For every Hamiltonian cycle H = (v_{i_1}, v_{i_2}, ..., v_{i_n}, v_{i_1}) ∈ M(G, c),

  cost(H, (G, c)) = Σ_{j=1}^{n} c((v_{i_j}, v_{i_{j+1}})),  where i_{n+1} = i_1.

Goal: minimum - to find a Hamiltonian cycle with minimum cost.

EXAMPLE

(Figure: a weighted complete graph on the vertices v1, ..., v5; the edge weights are not reproduced here.) For this graph,

  cost((v1, v2, v3, v4, v5, v1), (G, c)) = 19,
  cost((v1, v5, v3, v2, v4, v1), (G, c)) = 5.

INTEGER LINEAR PROGRAMMING

Given is a system of linear equations and a linear function over the variables of these equations. The task is to find a solution of the system of equations such that the value of the linear function is minimized. More formally:

Input: An m × n matrix A = {a_ij} with integer entries and two integer vectors b = (b_1, ..., b_m)^T and c = (c_1, ..., c_n).

Set of feasible solutions: M(A, b) = {X = (x_1, ..., x_n)^T | AX = b}.

Cost of a solution: For X = (x_1, ..., x_n) ∈ M(A, b),

  cost(X, c) = c · X = Σ_{i=1}^{n} c_i x_i.

Goal: minimum.

QUANTIFICATION of "ALMOST OPTIMAL SOLUTIONS" 1/2

Many very important optimization problems are NP-hard, and so we know only exponential-time algorithms for finding their optimal solutions. The idea is to jump from exponential to polynomial time by weakening the requirements - to be satisfied with almost optimal solutions. The next definition tries to quantify this.

Definition: Let P = (Σ_I, Σ_O, L, M, cost, goal) be an optimization problem. We say that A is a consistent algorithm for P if, for every x ∈ L, the output A(x) is a feasible solution for x, that is, A(x) ∈ M(x).

We say that an approximation algorithm A, mapping each instance x of an optimization problem P to one of its feasible solutions, has the ratio bound ρ_A(n) and the relative error bound ε_A(n) if

  max_{|x|=n} max{ cost(A(x)) / cost(Opt(x)), cost(Opt(x)) / cost(A(x)) } ≤ ρ_A(n)

and

  max_{|x|=n} |cost(A(x)) − cost(Opt(x))| / max{cost(Opt(x)), cost(A(x))} ≤ ε_A(n).

Both definitions are chosen to correspond to our intuition and to apply simultaneously to minimization and maximization problems. Both bounds compare an approximate solution with an optimal one, but in two different ways; the small helper functions below make the two measures concrete.
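The two measures written as code (the helper names are hypothetical; cost_opt stands for the cost of an optimal solution):

```python
def ratio_bound(cost_a, cost_opt):
    """Performance ratio of one solution: the max of the two quotients, so
    one formula covers both minimization and maximization; always >= 1."""
    return max(cost_a / cost_opt, cost_opt / cost_a)

def relative_error(cost_a, cost_opt):
    """Relative error of one solution; lies in [0, 1), equals 0 iff optimal."""
    return abs(cost_a - cost_opt) / max(cost_a, cost_opt)

# For the two Hamiltonian cycles above (assuming the cost-5 cycle is optimal):
# ratio_bound(19, 5) == 3.8 and relative_error(19, 5) == 14/19, about 0.74.
```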
QUANTIFICATION of "ALMOST OPTIMAL SOLUTIONS" 2/2

For any δ > 1 we say that A is a δ-approximation algorithm for P if, for every integer n, ρ_A(n) ≤ δ.

The ratio bound is never less than one; an optimal algorithm has ratio bound 1. The larger the best possible ratio bound of an approximation algorithm, the worse the algorithm.

APPROXIMATION ALGORITHMS for NP PROBLEMS

Two general problems concerning the approximation of NP-complete problems are of special interest and importance.

The constant relative error bound problem: Given an NP-complete optimization problem P, with a cost-of-solution function c, and an ε > 0, does there exist a polynomial-time approximation algorithm for P with the relative error bound ε?

The approximation scheme problem: Given an NP-complete optimization problem P with a cost-of-solution function c, does there exist a polynomial-time algorithm that, given an ε > 0 and an input instance x, produces an approximation for P and x with the relative error bound ε?

APPROXIMATION THRESHOLDS

An algorithm A is said to be an ε-approximation algorithm for an optimization problem P if ε is its relative error bound. The approximation threshold for P is the greatest lower bound of all ε > 0 such that there is a polynomial-time ε-approximation algorithm for P.

It can be shown that NP-complete problems differ very much with respect to their approximation thresholds. Note that if an optimization problem P has approximation threshold 0, then a (polynomial-time) approximation arbitrarily close to the optimum is possible. Note also that if P has approximation threshold 1, then no universal (polynomial-time) approximation method is possible.

EXAMPLES

Example 1: The approximation threshold for the optimization version of the KNAPSACK PROBLEM is 0.
Example 2: The approximation threshold for the VERTEX COVER PROBLEM is ≤ 1/2.
Example 3: Unless P = NP, the approximation threshold for the TRAVELING SALESMAN PROBLEM is 1.

WHY APPLY RANDOMIZATION in DISCRETE OPTIMIZATION

One of the main goals in the area of discrete optimization is to improve the approximation ratio: one tries to design randomized approximation algorithms that, with high probability, produce feasible solutions whose cost (quality) is not far from the optimal cost.

In the analysis of randomized algorithms we therefore consider the approximation ratio as a random variable, and the aim is then either
1. to estimate its expected value E[Ratio], or
2. to guarantee that a certain approximation ratio is achieved with probability at least 1/2.
These two different aims lead to two ways of defining randomized approximation algorithms.

Definition 1: Let P = (Σ_I, Σ_O, L, M, cost, goal) be an optimization problem. For any δ > 1, a randomized algorithm A is called a randomized E[δ]-approximation algorithm for P if, for every x ∈ L,
1. Prob(A(x) ∈ M(x)) = 1, and
2. E[Ratio_A(x)] ≤ δ.

Definition 2: Let P = (Σ_I, Σ_O, L, M, cost, goal) be an optimization problem. For any δ > 1, a randomized algorithm A is called a randomized δ-approximation algorithm for P if, for every x ∈ L,
1. Prob(A(x) ∈ M(x)) = 1, and
2. Prob(Ratio_A(x) ≤ δ) ≥ 1/2.
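These definitions are commonly illustrated (this example is mine, not from the slides) by random sampling for MAX-SAT: a uniformly random assignment satisfies each clause with k literals with probability 1 − 2^{−k} ≥ 1/2, so the expected number of satisfied clauses is at least half of the optimum. A sketch, with clauses encoded as lists of signed variable indices:

```python
import random

def random_maxsat_assignment(num_vars, clauses, rng=random):
    """Random sampling for MAX-SAT. A clause is a list of nonzero ints:
    +v stands for the literal x_v, -v for its negation. Always returns a
    feasible solution; E[satisfied] >= m/2 >= Opt/2 for m clauses."""
    assignment = [None] + [rng.random() < 0.5 for _ in range(num_vars)]
    satisfied = sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return assignment[1:], satisfied
```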
ONLINE ALGORITHMS

Another area in which randomization plays an important role is online algorithms. In practice, problems of the following kind are often important: one obtains only a part of the input, which has to be processed immediately; once this is done one obtains another part of the input and has to process it immediately again, and so on - the input can even be infinitely long. Such problems are called online problems, and algorithms that solve them are called online algorithms.

Example: Scheduling - the immediate assignment of resources to requests arriving one after another.

Key question: How good can an online algorithm (which does not know the future) be in comparison with an algorithm that knows the whole input (the future) from the beginning?

EVALUATION of ONLINE ALGORITHMS

Let P = (Σ_I, Σ_O, L, M, cost, goal) be an optimization problem that can be viewed as an online problem. (An optimization problem can be viewed as an online problem when each prefix y of every input x can itself be viewed as a problem instance, and one is required to provide a solution for y that has to remain unchanged as a part of the solution for the whole input x.)

An algorithm A is an online algorithm for P if, for every input x = x_1 x_2 ... x_n ∈ L, the following conditions are satisfied:
1. For all i ∈ {1, ..., n}, x_1 x_2 ... x_i is a feasible input.
2. A(x) ∈ M(x), i.e. A always computes a feasible solution.
3. For all i ∈ {1, ..., n}, A(x_1 x_2 ... x_i) is a part of A(x), i.e. the decisions made for the prefix x_1 x_2 ... x_i of x cannot be changed any more.

For every input x ∈ L, the competitive ratio comp_A(x) of A on x is the number

  comp_A(x) = max{ Opt_P(x) / cost_A(x), cost_A(x) / Opt_P(x) },

where Opt_P(x) denotes the cost of an optimal solution for the instance x of the problem P.

Let δ ≥ 1. We say that A is a δ-competitive algorithm for P if comp_A(x) ≤ δ for all x ∈ L. Let δ ≥ 1 be a real. We say that an online problem P is δ-hard if there does not exist any d-competitive online algorithm for P with d < δ.

RANDOMNESS EXTRACTORS

Extractors are algorithms that produce, from a long and only weakly random bit-string, a shorter but more random bit-string. In other words, an extractor is a mapping which, when applied to a high-entropy source, generates a shorter yet (almost) uniformly distributed output.

In a more general approach, an extractor is an algorithm that converts a long weakly random source and a truly random short seed into an almost uniformly distributed random output (that is longer than the seed and shorter than the source). An extractor is thus a certain kind of pseudorandom generator. No extractor is currently known that has been proven to work when applied to any type of high-entropy source.

von NEUMANN EXTRACTOR

The von Neumann extractor keeps taking successive non-overlapping pairs of consecutive bits from the input stream; if the two bits are the same, no output is generated, and if they are different, the first of them is output. For example, the input sequence

  10111100001111000011100110010101

is transformed into the sequence 1101000.
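A few lines of Python implementing the extractor and checking it against the example from the text:

```python
def von_neumann_extract(bits):
    """Scan non-overlapping pairs: '01' -> '0', '10' -> '1', '00'/'11' -> nothing."""
    return "".join(bits[i]
                   for i in range(0, len(bits) - 1, 2)
                   if bits[i] != bits[i + 1])

# Reproduces the example from the text:
assert von_neumann_extract("10111100001111000011100110010101") == "1101000"
```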
The von Neumann extractor can be shown to produce a uniform output even if the distribution of the input bits is not uniform, as long as each bit has the same probability of being one and there is no correlation between successive bits.

EXTRACTORS - FORMAL DEFINITION

Definition: A (k, ε)-extractor is a mapping

  Ext : {0,1}^n × {0,1}^d → {0,1}^m

such that for every distribution X on {0,1}^n with H_∞(X) ≥ k and a uniformly random seed s, the distribution Ext(X, s) is ε-close to the uniform distribution on {0,1}^m. The aim is to have n > m and d ≪ m.

By a probabilistic method, to be discussed later, one can show that (k, ε)-extractors exist for many values of k and ε.

Note: H_∞ stands for the so-called min-entropy, H_∞(X) = min_x (−lg Prob(X = x)), a measure of the amount of randomness in the worst case.
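A small empirical check (a demo of mine, not from the slides) of what the definition asks for: feed the von Neumann extractor a heavily biased but independent source and estimate how close m-bit output blocks are to uniform (here m = 2).

```python
import random
from itertools import product

def von_neumann_extract(bits):  # as in the previous sketch
    return "".join(bits[i] for i in range(0, len(bits) - 1, 2)
                   if bits[i] != bits[i + 1])

# Biased but independent source: Pr[bit = 1] = 0.8.
rng = random.Random(0)
source = "".join("1" if rng.random() < 0.8 else "0" for _ in range(200_000))
out = von_neumann_extract(source)

# Empirical statistical (total variation) distance of 2-bit output blocks
# from the uniform distribution on {0,1}^2 - the epsilon of the definition.
blocks = [out[i:i + 2] for i in range(0, len(out) - 1, 2)]
counts = {"".join(t): 0 for t in product("01", repeat=2)}
for b in blocks:
    counts[b] += 1
tv = 0.5 * sum(abs(c / len(blocks) - 1 / 4) for c in counts.values())
print(f"{len(out)} output bits, empirical distance from uniform: {tv:.4f}")
```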