Part V Games and design of randomized algorithms Chapter 5. MOMENTS AND DEVIATIONS In this chapter we present several methods that make use of the second degree moments, variance and standard deviation, for solving various problems related to randomized algorithms. We first show how so called second moment method can be used to determine thresholds of some events. Threshold of a sequence of events Ep,n, that depends on a probability p and a size-parameter n,, is a number tE,n such that if p < tE,n, then the probability of the event Ep,n goes to 0 (if n → ∞), and if p > tE,n, then the probability of Ep,n goes to 1. We will then discuss, in various details, in this chapter also three important problems: Occupancy (Balls-into-Bins) problem, Stable marriage problem and Coupon selection problem that have many applications. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 2/59 VARIANCE of the SUM of RANDOM VARIABLES If Xi , i = 1, 2, . . . , n, are random variables, and X = n i=1 Xi , then VAR[X] = E[X2 ] − (E[X])2 (1) = n i=1 E[X2 i ] + i=j E[Xi Xj ] − n i=1 (E[Xi ])2 − i=j E[Xi ]E[Xj ] (2) = n i=1 (E[X2 i ] − (E[Xi ])2 ) + i=j (E[Xi Xj ] − E[Xi ]E[Xj ]) (3) = n i=1 VAR[Xi ] + i=j (E[Xi Xj ] − E[Xi ]E[Xj ]) (4) Covariance of Xi and Xj , denoted by Cov(Xi , Xj ) is defined by: E[Xi Xj ] − E[Xi ]E[Xj ] = E[(Xi − E[Xi ])(Xj − E[Xj ])] prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 3/59 THRESHOLDS – BASICS Many interesting/important graph properties have threshold functions in the family of random graphs Gn,p. Definition: The family of random graphs Gn,p is defined to be the family of all graphs on n nodes where each edge is chosen with probability p. Threshold for a property P of graphs from Gn,p is such a value p∗ that if p < p∗ , then the probability that a graph sampled from Gn,p has the property P goes to 0, and if p > p∗ , then such a probability goes to 1 (for n → ∞). Useful fact: Let X be a non-negative random variable with E[X] = µ and VAR[X] = σ2 . By Chebyshev’s inequality Pr[|X − µ| ≥ λσ] ≤ 1 λ2 . By choosing λ = µ σ , we get Pr[X = 0] ≤ Pr[(X = 0) ∪ (X ≥ 2µ)] = Pr[|X − µ| ≥ µ] ≤ σ2 µ2 = VAR[X] E[X]2 . (5) Therefore, if VAR[X] = o(E[X]2 ), we get Pr[X = 0] → 0. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 4/59 CONDITIONS WHEN PROBABILITY GOES TO 0 Let E1, . . . , En be events having the same probability p and let Xi be the indicator variable for Ei . Finally, let X = n i=1 Xi . Denote by i j the case that events Ei and Ej are dependent and i = j. By definition VAR[X] = n i=1 VAR[Xi ] + i j Cov(Xi , Xj ), where Cov(Xi , Xj ) = E[Xi Xj ] − E[Xi ]E[Xj ] ≤ E[Xi Xj ] = Pr[Xi ∩ Xj ]. Since Xi are Bernoulli random variables, it holds VAR[Xi ] = p(1 − p) ≤ p = E[Xi ] =⇒ VAR[X] ≤ n i=1 E[Xi ] + i j Pr[Xi ∩ Xj ] By (5), and using the notation ∆∗ = i j Pr[Xi ∩ Xj ], we get Pr[X = 0] ≤ VAR[X] E[X]2 ≤ E[X] + ∆∗ E[X]2 = 1 E[X] + ∆∗ E[X]2 , Therefore, if E[X] → ∞ and ∆∗ = o(E[X]2 ), then Pr[X = 0] → 0. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 5/59 THRESHOLD FUNCTION for CLIQUE 1/2 For a graph G let SLC(G) be the size of the largest clique of G. Theorem: Event SLC(Gn,p) ≥ 4 has a threshold (function) p∗ = n−2/3 . Proof: Let us consider a random Gn,p graph. For any set S of 4 vertices of Gn,p let AS be the event that S forms a clique in G and let XS be the indicator variable of AS . Clearly, Pr[AS ] = p6 for any set S of 4 vertices. If X = S,|S|=4 XS , then E[X] = n 4 · p6 ≈ n4 p6 24 and, by Markov’s inequality Pr[X ≥ 1] ≤ E[X] 1 Therefore, (1): if p n−2/3 , then E[X] = o(1). Hence Pr[SLC(Gn,p) ≥ 4] = Pr[X ≥ 1] → 0. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 6/59 THRESHOLD FUNCTION for CLIQUE 2/2 Let now (2): p n−2/3 . Firstly, note that then E[X] ≈ n4 p6 24 → ∞. Secondly, observe that two cliques are dependent iff they intersect in at least two vertices. Let now S and T be any two sets of four vertices: 1 If |S ∩ T| = 2, then Pr[AT |AS ] = p5 . (Observe that for any fixed set S there are O(n2 ) such sets T.) 2 If |S ∩ T| = 3, then Pr[AT |AS ] = p3 . (Observe that for any fixed set S there are O(n) such sets T.) Therefore, for ∆∗ defined on previous slide it holds: ∆∗ = T S Pr[At ∩ AS ] ≤ T S Pr[AT |AS ] ≈ np3 + n2 p5 . Hence, if p n−2/3 , then E[X] → ∞, ∆∗ = o(E[X]). Therefore, Pr[SLC(Gn,p) < 4] = Pr[X = 0] → 0. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 7/59 OCCUPANCY (BALLS-INTO-BINS) PROBLEM PROBLEM: Each of m distinguishable objects (balls) is randomly and independently assigned to one of n distinct classes (bins/boxes). How does the distribution of balls into bins look like? Subproblem 1: How many of the bins will be empty? For a given k what is the probability that k boxes will be empty? Subproblem 2: What is the maximum number of balls in a box? (What is the probability pk that maximum number of balls in some box is (at least) k?) Subproblem 3: What is the expected number ek of boxes with k balls for a given k? prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 8/59 Subproblem 4: What is probability that all balls land in different boxes? (For n = 365 and m < n we get so-called birthday problem) Surprisingly, these simple probability problems are at the core of the analyses of many randomized algorithms. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 9/59 Remainder I. - Mealy’s inequality For arbitrary events ξ1, ξ2, . . . ξn Pr n i=1 ξi ≤ n i=1 Pr(ξi ) Usefulness of this inequality lies in the fact that it makes no assumption about dependencies among events! Therefore, this inequality allows to analyse phenomena with very complicated interactions (without revealing these interactions). prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 10/59 Remainder III. - COMBINATORIAL INEQUALITIES– I. Let us now present several combinatorial inequalities that are often used at the analysis of algorithms. n k = n n − k = n! k!(n − k)! ex = 1 + x + x2 2! + x3 3! + . . . e−x > 1 − x ln(1 + x) = x − x2 2 + x3 3 − x4 4 + . . . n k ≤ nk k! , n k ≤ ne k k , n k k ≤ n k prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 11/59 For large n n k ∼ nk k! . prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 12/59 Remainder IV. - COMBINATORIAL INEQUALITIES – II et ≥ 1 + t if t ∈ R. If n ≥ 1 and |t| ≤ n, then et 1 − t2 n ≤ 1 + t n n ≤ et . For all t, n ∈ R+ , it holds (1 + t n )n ≤ et ≤ (1 + t n )n+t/2 . nth Harmonic number Hn is defined as follows Hn = n i=1 1 i = ln n + θ(1). prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 13/59 BASIC RESULT for OCCUPANCY PROBLEM Case: n = m Notation: Xj – the number of balls in the j th bin. E[Xi ] = 1 - this can be shown similarly as in case of the sailor problem. Notation: ξj (k) – the event that bin j has k or more balls in it. Analysis of ξ1(k) The probability that bin 1 receives exactly i balls is n i 1 n i 1 − 1 n n−i ≤ n i 1 n i ≤ ne i i 1 n i = e i i Therefore Pr[ξ1(k)] ≤ n i=k e i i ≤ e k k 1 + e k + e k 2 + . . . and for k = k∗ = e ln n ln ln n Pr [ξ1(k∗ )] ≤ e k∗ k∗ 1 1 − e k∗ ≤ n−2 . prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 14/59 BASIC COROLLARIES Problem: What is the probability that at least one bin has at least k∗ balls in it? Solution: It holds Pr n i=1 ξi (k∗ ) ≤ n i=1 Pr[ξi (k∗ )] ≤ 1 n . Corollary With the probability at least 1 − 1 n no bin has more than k∗ = e ln n ln ln n balls in it. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 15/59 COMPLEXITY of SORTING It is well known that for sorting n elements: Worst case complexity is O(n lg n); Average case complexity is O(n lg n). Both bounds are with respect to the number of comparisons. Can we do better? In some reasonable sense? In some interesting cases? prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 16/59 Reminder - Binomial distribution Let now values of a random variable Y be the number of successes of trials with success probability p in n trials. Then Pr(Y = k) = n k pk qn−k Such a probability distribution is called the binomial distribution and it holds EY = np VY = npq G(z) = (q + pz)n and also EY 2 = n(n − 1)p2 + np prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 17/59 BUCKET SORT Bucket sort is a deterministic sorting algorithm that, under certain probabilistic assumptions on inputs, sorts numbers in the expected linear time. Suppose that we have a set of n = 2m integers that are to be sorted and they are chosen independently and uniformly at random from the interval [0, 2k ) for a k ≥ m. Using Bucket sort we can sort such numbers in the expected time O(n). prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 18/59 BUCKET SORT ALGORITHM - STAGE 1 Stage 1. All to be sorted numbers will go to n buckets in such a way that all numbers whose first m bits represent a number j will go to the j-th bucket. As a consequence, when j < l all elements in the j-th bucket comes before all elements in the l-bucket once all elements are sorted. If we assume that each element can be put in the appropriate bucket in constant time, the above stage requires O(n) time. Because of the assumption that the elements to be sorted are chosen uniformly, the number of elements that land uniformly in a bucket follows the binomial distribution B(n, 1 n ) introduced in Chapter 3. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 19/59 BUCKET SORT ALGORITHM - STAGE 2 Sort each bucket using a standard quadratic time algorithm and concatenate all sorted lists Analysis; If Xi is the number of elements in the ith bucket then they can be sorted in time cX2 i for some constant c. The expected time to do this sorting is therefore E n j=1 cX2 j = c n j=1 E[X2 j ] = cnE[X2 1 ] Since Xi is a binomial random variable B(n, 1 n ), by using the result from Chapter 3 we get E[X2 1 ] = n(n − 1) n2 + 1 = 2 − 1 n < 2 and therefore the expected time of the bucket sort is at most 2cn. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 20/59 BIRTHDAY PARADOX Let us assume that the birthday of each person in a room is a random day chosen uniformly and independently from a 365-day year. If there are k such people in a room than probability that each of them has birthday in a different day is (1 − 1 365 )(1 − 2 365 )(1 − 3 365 ) . . . (1 − k − 1 365 ) = k−1 j=1 (1 − j 365 ) what equals to k! 365 k 365−k . Using the inequality 1 − j n ≈ e−j/n for j small - comparing to n - we have k−1 j=1 (1 − j n ) ≈ k−1 j=1 e−j/n = e− k−1 j=1 j n = e−k(k−1)/2n ≈ e−k2 /2n Hence the probability that k people all have different birthday from a set of n possible birthdays is 1 2 is approximately given by equation k2 2n = ln 2 what gives, for the case n = 365, k = 22.49. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 21/59 VARIATIONS on BIRTHDAY PARADOX It is well known that if we have 23 (30) [50] people in one room, then the probability that two of them have the same birthday is more than 50.7%(70.6%)[97%] -this is so called Birthday Paradox. In the case we have 57 [100] people in the room the probability is 99% [99.99997%] More generally, if we have n objects and r people each choosing one object (and several of them can choose the same object), then if r ≈ 1.177 √ n (r ≈ √ 2λ), then probability that two people choose the same object is 50% (1 − eλ )%. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 22/59 Birthday paradox - graph - I. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 23/59 Birthday paradox - graph - II. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 24/59 ANOTHER VERSION of THE BIRTHDAY PARADOX Let us have n objects and two groups of r people. If r ≈ √ λn then the probability that someone from one group chooses the same object as someone from the other group is 1 − e−λ. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 25/59 STABLE MARRIAGE PROBLEM - INFORMAL FORMULATION Given is n men and n women and each of them has ranked all members of the opposite sex with a unique number between 1 and n in order to express of his/her preferences. Task: Marry all men and women together in such a way that there are no two (unsatisfied) people of the opposite sex who would both rather have each other than their current partners. If there is a no dissatisfied couple in a (group) marriage we consider the (group) marriage as stable. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 26/59 THE STABLE MARRIAGE PROBLEM Consider a society of n men A, B, C, . . . and n women a, b, c, . . . A marriage is 1-1 correspondence between men and women of that society. Assume that each person has a preference list of the members of the opposite sex, organised in a decreasing order of desirability. Example A : abcd a : ABCD B : bacd b : DCBA C : adcb c : ABCD D : dcab d : CDAB A marriage is said to be unstable if there exist two married couples X − x, Y − y such that X desires y more than x y desires X more than Y Such a pair (X, y) is called dissatisfied. The task is to find a stable marriage. (At least one always exist!) Example of an unstable marriage: A − a, B − b, C − c, D − d. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 27/59 COMMENTS The stable marriage problem, and its variations, form one of the most famous and important groups of algorithmic problems with a variety of interesting and important applications. A related book: Donald E. Knuth: Stable marriage and its relation to other combinatorial problems: an introduction to the mathematical analysis of algorithms, CRM Proceedings and Lecture Notes, Algorithms to deal with this type of problems are used, for example: To assign graduates of medical schools in North America (about 30 000) to hospitals; prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 28/59 EXISTENCE and OPTIMALITY of SOLUTIONS NOTE 1 We will show later that a stable marriage always exists. NOTE 2 A stable marriage assignment does not need to be optimal for all. EXAMPLE: Let us have three men M1, M2 and M3 and three women W1, W2 and W3 with preferences: M1 : W2W1W3, M2 : W3W2W1, M3 : W1W3W2 W1 : M2M1M3, W2 : M3M2M1, W3 : M1M3M2 There are three stable solutions: All men get their first choice and women the third one: M1W2, M2W3, M3W1 All get their second choice: M1W1, M2W2, M3W3 Women get their first choice and men the third one: M1W3, M2W1, M3W2 prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 29/59 A naive, but bad, randomized algorithm 1 Start with some marriage of all. 2 until marriage is stable do randomly choose a dissatisfied pair, marry them and also their partners together Algorithm is bad because a loop can occur. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 30/59 EXAMPLE 1 Let us have the followig preferences: M1 : W3W2W4W1 M2 : W2W1W3W4 M3 : W2W4W1W3 M4 : W3W1W4W2 and W1 : M1M2M4M3 W2 : M3M1M4M2 W3 : M3M2M4M1 W4 : M2M1M3M4 Successful developments of marriages: M1W1 M2W2 M3W3 M4W4 − −unstable M1W2 M2W1 M3W3 M4W4 − −unstable M1W3 M2W1 M3W2 M4W4 − −unstable M1W4 M2W1 M3W2 M4W3 − −!stable! prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 31/59 EXAMPLE 2 For choices: M1 : W2W1W3 M2 : arbitrary M3 : W1W2W3 and W1 : M1M3M2 W2 : M3M1M2 W3 : arbitrary we have the following cyclic development of marriages M1W1 M2W2 M3W3 M1W2 M2W1 M3W3 M1W3 M2W1 M3W2 M1W3 M2W2 M3W1 M1W1 M2W2 M3W3 prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 32/59 EXAMPLE 3 For choices M1 : W1W2W3W4W5 M2 : W2W3W4W5W1 M3 : W3W4W5W1W2 M4 : W4W5W1W2W3 M5 : W5W1W2W3W4W5 and W1 : M2M3M4M5M1 W2 : M3M4M5M1M2 W3 : M4M5M1M2M3 W4 : M5M1M2M3M4 W5 : M1M2M3M4M5 we have exactly 5 stable marriages M1W1 M2W2 M3W3 M4W4 M5W5 M1W2 M2W3 M3W4 M4W5 M5W1 M1W3 M2W4 M3W5 M4W1 M5W2 M1W4 M2W5 M4W1 M4W2 M5W3 M1W5 M2W1 M4W2 M5W3 M5W4 In the x-th of the above marriages each man is married with his x-th choice. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 33/59 PROPOSAL ALGORITHMS The naive approach – to start with an arbitrary marriage and to try to stabilize it by pairing up dissatisfied couples – does not always work. MEN PROPOSAL ALGORITHM -“man proposes, woman disposes” Assume that all men are numbered somehow. At any step of the algorithm (due to Gale-Shapley), there will be a partial marriage, and the lowest-number unmarried man M proposes ”marriage” to the most desirable women W on his list who has not rejected him yet. The woman W then decides whether to accept his proposal or to reject it. The women W accepts the proposal if she is not yet married or she likes M more than her current partner. The algorithm repeats the process and terminates after every person has been married. It is a linear time algorithm, concerning the worst case complexity. It is easy to see that the process terminates and resulting marriage is stable. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 34/59 CORRECTNESS of the ALGORITHM Everyone gets married Observe that once a women gets married she will stay married (though she can change her partners - even several times). It cannot be the case that at the end there is a man and a woman who are not married. Indeed, the men would have proposed her marriage at some point and being unmarried she could not refused him. Final marriage is stable Indeed, let at the end M be a men and W a women who are married, but not to each other and they are dissatisfied. If M prefers W over his current partner, he must have proposed marriage to W before he did that to his current partner. If W accepted his proposal yet is not married with him at the end, she must have changed him for someone she likes more and therefore she cannot like M more than her current partner. If W rejected his proposal, she was already married with someone she liked more than M. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 35/59 COMPLEXITY of the PROPOSAL ALGORITHM At each proposal step one women is eliminated from a man list. Total number of proposals is therefore at most n2 . The result of the men-proposal algorithm does not depend on the order men are chosen to make their proposals. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 36/59 PROPERTIES of the GALE-SHAPLEY-ALGORITHM Gale-Shapley marriage is men-optimal and women-pessimal. To see that consider the following definition of a feasible marriage. A marriage between a man A and a woman B is called feasible if there exists a stable pairing (marriage) in which A and B are married. It is said that a marriage is men-optimal if every man is married with his highest ranked feasible partner. It is said that a marriage is women-pessimal if each woman is married with her lowest ranked feasible partner. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 37/59 SOME APPLICATIONS National residency matching program. Dental residencies and medical specialities in the USA, Canada and parts of UK National university entrance exam in Iran Placement of Canadian lawyers in Ontario and Alberta Matching of new reform rabbis to their first congregation Assignment of students to high-schools in NYC prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 38/59 STABLE HUSBANDS A stable husband of a woman, with respect to a given rankings, is a man she can be married with in a stable marriage. D. E. Knuth and et al. showed that In case of n men and n women, any woman has at least (1 2 − ) ln n and at most (1 + ) ln n different stable husbands in the set of all Gale-Shapley stable matchings defined by these rankings, with probability approaching 1 as n → ∞ , if is any positive constant. There is an algorithm that outputs all stable husbands of a given women. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 39/59 RANDOMIZED VERSIONS of the PROPOSAL ALGORITHM Next goal: The average-case analysis of the proposal algorithm under the assumptions: men’s lists are chosen independently and randomly, women’s lists can be arbitrary, but are fixed in advance. Let Tp be the random variable that denote the number of proposals made during the execution of the Proposal algorithm – what is proportional to the overall time of algorithm. Distribution of Tp seems to be very difficult to determine or even to analyse. Our goal is to show that the expected value of the number of proposals is about O(n lg n). prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 40/59 Game “CLOCK SOLITAIRE” We illustrate first, on a simple card game, a simple technique that allows to analyse randomized algorithms with seemingly complex behaviour. 1. Game ”Clock Solitaire” A standard deck of 52 cards is randomly shuffled and then divided into 13 piles (columns) of 4 cards each. Each pile is arbitrarily labeled with a distinct symbol from {A, 2, . . . , 10, J, Q, K} A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 4 3 4 5 6 7 8 9 10 J Q K A 2 4 5 6 7 8 9 10 J Q K A 3 3 5 6 7 8 9 10 J Q K A 2 2 On the first move a card is drawn from the pile labeled K. At each subsequent move, a card is drawn from the pile whose label is the face value of the card at the previous move. The game ends, if an attempt is made to draw a card from an empty pile. We win the game if, on termination, all 52 cards have been drawn. In all other cases we lose the game. What is the probability to win the game? prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 41/59 A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 2 3 4 5 6 7 8 9 10 J Q K A 3 4 5 6 7 8 9 10 J Q K A 2 4 5 6 7 8 9 10 J Q K A 2 3 A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 2 3 4 5 6 7 8 9 10 J Q K A 3 4 5 6 7 8 9 10 J Q K A 2 4 5 6 7 8 9 10 J Q K A 2 3 A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 2 3 4 5 6 7 8 9 10 J Q K A 3 4 5 6 7 8 9 10 J Q K A 2 4 5 6 7 8 9 10 J Q K A 2 3 prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 42/59 PRINCIPLE od ”DEFERRED RANDOM DECISIONS” 1. Game ”Clock Solitaire” – repetition A standard deck of 52 cards is randomly shuffled and then divided into 13 piles of 4 cards each. Each pile is arbitrarily labeled with a distinct symbol from {A, 2, . . . , 10, J, Q, K} At each subsequent move, a card is drawn from the pile whose label is the face value of the card at the previous move. The game ends, if an attempt is made to draw a card from an empty pile. Observe that our game always terminates in an attempt to draw a card from the K-pile. (Why?) ANALYSIS of ALGORITHM How to choose the probability space? Let the random choices unfold with progress of the game: that is at any step each of the yet unseen cards is likely to appear. Thus, the process of playing this game is equivalent to the process of repeatedly drawing cards uniformly and randomly from the deck of 52 cards. A winning game corresponds to the situation where the first 51 cards drawn in this fashion contain exactly 3 kings! Probability of winning our game is therefore, clearly, 1/13. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 43/59 The idea of the Principle of Deferred Decisions is not to assume that the entire set of random choices is made in advance, rather, that at each step of the algorithm we fix only that random choice that needs to be revealed at that step. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 44/59 ANALYSIS of RANDOMIZED VERSION of PROPOSAL ALGORITHM 1/2 Principle of deferred decision: Do not assume that entire set of random choices is made in advance. Rather, at each step of the process fix only that random choices that must be revealed at that step to the algorithm. An application to the Proposal Algorithm: We will remove dependencies by do not assuming that men have chosen their preference lists in advance. We will assume that each time a man has to make a proposal he picks a random woman from the list od women not already proposed by him, and proceeds to propose her. (Clearly this is equivalent to choosing a random preference list prior the execution of the algorithm.) The only dependency that remains is that the random choice of a women at any step depends on the proposals made so far by the current proposer. To eliminate the above dependency let us change the algorithm. Each time a man makes proposal he chooses randomly a woman from the set of all women. Call this new algorithm Amnesiac Algorithm. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 45/59 ANALYSIS of RANDOMIZED VERSION of PROPOSAL ALGORITHM 2/2 Let TA(TP) be the number of proposal made by the Amnesiac (Proposal) algorithm. It is obvious that for all m Pr[ TA > m] ≥ Pr[ TP ≥ m] and therefore an upper bound on TA is an upper bound on TP . The advantage of analyzing TA is that we need only to count the total number of proposals made – without regard to the name of the proposer at each stage. New algorithm terminates with a stable marriage once each woman has received at least one proposal (for a “marriage”). To task to determine a good upper bound of TA is a special case of the task to determine such a bound for so-called Coupon Selection Problem discussed next. At the end we will get: Theorem For any constant c ∈ and m = n ln n + cn lim n→∞ Pr[ TA > m ] = 1 − e−e−c prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 46/59 COUPON SELECTION PROBLEM There are n types of coupons and at each time a coupon is chosen at random. The task is to determine for each m ≥ n the probability of having collected at least one of each of the n types of coupons in m trials. Elementary analysis Let X be a random variable the value of which is the number of trials required to collect at least one of each type of coupons. Let C1, . . . , CX denote a sequence of trials. The i-th trial Ci will be called success if the coupon selected in the trial Ci was not drawn in any of the first i − 1 selections. Sequence C1, . . . , CX will be divided into epochs. i-th epoch begins with the trial following the i-th success and ends with the trial which is (i + 1)-th success. Let Xi , 0 ≤ i < n, be the number of trials in the i-th epoch. Then X = n−1 i=0 Xi prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 47/59 If pi is the probability of the success on any trial of the i-th epoch then pi = n − i n . Random variable Xi is geometrically distributed, with the parameter pi , and therefore its average value is E[Xi ] = 1 pi = n n−i and its variance V [Xi ] = σ2 Xi = 1−pi p2 i = ni (n−i)2 . By the linearity of expectations we have: E[X] = E n−1 i=0 Xi = n−1 i=0 E[Xi ] = n−1 i=0 n n − i = n n i=1 1 i = nHn. Since Xi are independent σ2 X = n−1 i=0 σ2 Xi = n−1 i=0 ni (n − i)2 = n−1 i=0 n(n − i) i2 = n2 n i=1 1 i2 − nHn Since limn→∞ n i=1 1 i2 = π2 6 we have lim n→∞ σ2 X n2 = π2 6 prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 48/59 We show now that X unlikely deviates much from expectation Let εr i denote the event that a coupon of type i is not collected in the first r trials. Pr[εr i ] = (1 − 1 n )r ≤ e− r n = n−β for r = βn ln n Therefore, for r = βn ln n, we get Pr[X > r] = Pr [∪n i=1εr i ] ≤ n i=1 Pr[εr i ] ≤ n i=1 n−β = n−(β−1) Next aim: To study the probability that X deviates from its expectation nHn by the amount cn for any real c. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 49/59 A TECHNICAL LEMMA and MAIN THEOREM Lemma Let c be a real number and m = n ln n + cn for a positive integer n. Then, for any fixed k it holds lim n→∞ n k (1 − k n )m = e−ck k! . prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 50/59 MAIN THEOREM 1/4 Theorem Let the random variable X denote the number of trials for collecting each of the n types of coupons. Then for any c ∈ R and m = n ln n + cn lim n→∞ Pr[X > m] = 1 − e−e−c Proof Consider the event {X > m} = n i=1 εm i . By the principle of the Inclusion-Exclusion Pr n i=1 εm i = n k=1 (−1)k+1 Pn k (∗) where Pn k = 1≤i1<··· 0 there exists k∗ > 0 such that for any k > k∗ |Sk − f (c)| < ε. Since limn→∞ Pn k = Pk , we have limn→∞ Sn k = Sk . Equivalently, for all ε > 0 and all k, for all sufficiently large. n |Sn k − Sk | < ε Thus, for all ε > 0 any fixed k > k∗ , and n sufficiently large |Sn k − Sk | < ε, |Sk − f (c)| < ε =⇒ |Sn k − f (c)| = |Sn k − Sk | + |Sk − f (c)| < 2ε and |Sn 2k − Sn 2k+1| < 4ε. As a consequence Pr n i=1 εm i − f (c) < 4ε and therefore lim n→∞ Pr n i=1 εm i = f (c) = 1 − e−e−c prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 53/59 MAIN THEOREM 4/4 what implies lim n→∞ Pr[X > n(ln n + c)] = 1 − e−e−c Implications With extremely high probability, the number of trials, for collecting all n coupon types, lies in a small interval centered about its expected value. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 54/59 A SUMMARY of the ANALYSIS of STABLE MARRIAGE PROBLEM In case of the stable marriage problem of n men and women we have The worst case complexity (of the number of proposals) in n2 , The average case complexity is O(n lg n). Deviation is small from the expected case. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 55/59 APPENDIX prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 56/59 SIMILAR PROBLEMS Generalised stable marriage problem A man (woman) may not be willing to marry some partners from the opposite sex and may prefer to stay single. Stable roommate problem is similar to the stable marriage problem, but all participants belong to a single pool (Group). Hospitals-students(medical) problem This differs from the stable marriage problem that a women [hospital] can accept ”proposals” from more than one man [student]. Hospital-students problems with couples Similar problem as the above one, but among students can be couples that have to be assigned either to the same hospital or to a specific pair of hospitals chosen by couples. prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 57/59 A COUPON SELECTION PROBLEM APPLICATION Let packets be sent in a stream from a source node to a destination node along a fixed path of routers. Let us assume that the destination node would like/need to know which routers the stream of packets has passed through. Let us assume that each packet can store, randomly chosen, the name of one of the routers it goes through. Determining all the routers on the path is like a coupon collector’s problem. This means that if there are n routters on the path then the expected number of packets that need to be sent so the destination node knows all routers is nH(n). prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 58/59 EXERCISES 1 What is larger, eπ or πe , for the basis e of natural logarithms prof. Jozef Gruska IV054 5. Games and design of randomized algorithms 59/59