CZ.1.07/2.2.00/28.0041 Centrum interaktivn´ıch a multimedi´aln´ıch studijn´ıch opor pro inovaci v´yuky a efektivn´ı uˇcen´ı Prologue You should spent most of your time thinking about what you should think about most of your time. IV054 0. 2/66 RANDOMIZED ALGORITHMS AND PROTOCOLS - 2020 RANDOMIZED ALGORITHMS AND PROTOCOLS - 2020 Prof. Jozef Gruska, DrSc Wednesday, 10.00-11.40, B410 WEB PAGE of the LECTURE http://www.fi.muni.cz/usr/gruska/random20 FINAL EXAM: You need to answer four questions out of five given to you. CREDIT (ZAPOˇCET): You need to answer three questions out of five given to you. EXERCISES/TUTORIALS EXERCISES/TUTORIALS: Thursdays 14.00-15.40, C525 TEACHER: RNDr. Matej Pivoluˇska PhD Language English NOTE: Exercises/tutorials are not obligatory IV054 0. 5/66 CONTENTS - preliminary 1 Basic concepts and examples of randomized algorithms 2 Types and basic design methods for randomized algorithms 3 Basics of probability theory 4 Simple methods for design of randomized algorithms 5 Games theory and analysis of randomized algorithms 6 Basic techniques I: moments and deviations 7 Basic techniques II: tail probabilities inequalities 8 Probabilistic method I: 9 Markov chains - random walks 10 Algebraic techniques - fingerprinting 11 Fooling the adversary - examples 12 Randomized cryptographic protocols 13 Randomized proofs 14 Probabilistic method II: 15 Quantum algorithms IV054 0. 6/66 LITERATURE R. Motwami, P. Raghavan: Randomized algorithms, Cambridge University Press, UK, 1995 J. Gruska: Foundations of computing, International Thompson Computer Press, USA. 715 pages, 1997 J. Hromkoviˇc: Design and analysis of randomized algorithms, Springer, 275 pages, 2005 N. Alon, J. H. Spencer: The probabilistic method, Willey-Interscience, 2008 Part I Basic Techniques I: Moments and Deviations Chapter 5. BASIC TECHNIQUES: MOMENTS and DEVIATIONS In this chapter we present several methods that make use of the second degree moments, variance and standard deviation, for solving various problems related to randomized algorithms. We will discuss, in various details, in this chapter also three important problems: Occupancy (Balls-into-Bins) problem, Stable marriage problemand Coupon selection problem that have many applications. IV054 1. Basic Techniques I: Moments and Deviations 9/66 OCCUPANCY (BALLS-INTO-BINS) PROBLEM PROBLEM: Each of m distinguishable objects (balls) is randomly and independently assigned to one of n distinct classes (bins/boxes). How does the distribution of balls into boxes look like after k assignments? Subproblem 1: How many of the boxes will be empty? What is the probability that, for any given k, k boxes will be empty? Subproblem 2: What is the maximum number of balls in a box? (What is the probability pk, for a given k, that maximum number of balls in some box is k?) Subproblem 3: What is, for a given k, the expected number ek of boxes with k balls in? IV054 1. Basic Techniques I: Moments and Deviations 10/66 Subproblem 4: What is probability that all balls land in different boxes? (For n = 365 and m < n we get so-called birthday problem) Surprisingly, these simple probability problems are at the core of the analyses of many randomized algorithms. IV054 1. Basic Techniques I: Moments and Deviations 11/66 Remainder I. - Mealy’s inequality For arbitrary events ξ1, ξ2, . . . ξn Pr n i=1 ξi ≤ n i=1 Pr(ξi ) Usefulness of this inequality lies in the fact that it makes no assumption about dependencies among events! Therefore, this inequality allows to analyse phenomena with very complicated interactions (without revealing these interactions). IV054 1. Basic Techniques I: Moments and Deviations 12/66 Remainder III. - COMBINATORIAL INEQUALITIES– I. Let us now present several combinatorial inequalities that are often used at the analysis of algorithms. n k = n n − k = n! k!(n − k)! ex = 1 + x + x2 2! + x3 3! + . . . e−x > 1 − x ln(1 + x) = x − x2 2 + x3 3 − x4 4 + . . . n k ≤ nk k! , n k ≤ ne k k , n k k ≤ n k IV054 1. Basic Techniques I: Moments and Deviations 13/66 For large n n k ∼ nk k! . IV054 1. Basic Techniques I: Moments and Deviations 14/66 Remainder IV. - COMBINATORIAL INEQUALITIES – II et ≥ 1 + t if t ∈ R. If n ≥ 1 and |t| ≤ n, then et 1 − t2 n ≤ 1 + t n n ≤ et . For all t, n ∈ R+ , it holds (1 + t n )n ≤ et ≤ (1 + t n )n+t/2 . nth Harmonic number Hn is defined as follows Hn = n i=1 1 i = ln n + θ(1). IV054 1. Basic Techniques I: Moments and Deviations 15/66 BASIC RESULT for OCCUPANCY PROBLEM Case: n = m Notation: Xj – the number of balls in the j th bin. E[Xi ] = 1 - this can be shown similarly as in case of the sailor problem. Notation: ξj (k) – the event that bin j has k or more balls in it. Analysis of ξ1(k) The probability that bin 1 receives exactly i balls is n i 1 n i 1 − 1 n n−i ≤ n i 1 n i ≤ ne i i 1 n i = e i i Therefore Pr[ξ1(k)] ≤ n i=k e i i ≤ e k k 1 + e k + e k 2 + . . . and for k = k∗ = e ln n ln ln n Pr [ξ1(k∗ )] ≤ e k∗ k∗ 1 1 − e k∗ ≤ n−2 . IV054 1. Basic Techniques I: Moments and Deviations 16/66 BASIC COROLLARIES Problem: What is the probability that at least one bin has at least k∗ balls in it? Solution: It holds Pr n i=1 ξi (k∗ ) ≤ n i=1 Pr[ξi (k∗ )] ≤ 1 n . Corollary With the probability at least 1 − 1 n no bin has more than k∗ = e ln n ln ln n balls in it. IV054 1. Basic Techniques I: Moments and Deviations 17/66 COMPLEXITY of SORTING It is well known that for sorting n elements: Worst case complexity is O(n lg n); Average case complexity is O(n lg n). Both bounds are with respect to the number of comparisons. Can we do better? In some reasonable sense? In some interesting cases? IV054 1. Basic Techniques I: Moments and Deviations 18/66 Reminder - Binomial distribution Let now values of a random variable Y be the number of successes of trials with success probability p in n trials. Then Pr(Y = k) = n k pk qn−k Such a probability distribution is called the binomial distribution and it holds EY = np VY = npq G(z) = (q + pz)n and also EY 2 = n(n − 1)p2 + np IV054 1. Basic Techniques I: Moments and Deviations 19/66 BUCKET SORT Bucket sort is a deterministic sorting algorithm that, under certain probabilistic assumptions on inputs, sorts numbers in the expected linear time. Suppose that we have a set of n = 2m integers that are to be sorted and they are chosen independently and uniformly at random from the interval [0, 2k ) for a k ≥ m. Using Bucket sort we can sort such numbers in the expected time O(n). IV054 1. Basic Techniques I: Moments and Deviations 20/66 BUCKET SORT ALGORITHM - STAGE 1 Stage 1. All to be sorted numbers will be put into n buckets in such a way that all numbers whose first m bits represent a number j will go to the j-th bucket. As a consequence, when j < l all elements in the j-th bucket comes before all elements in the l-bucket once all elements are sorted. If we assume that each element can be put in the appropriate bucket in constant time, the above stage requires O(n) time. Because of the assumption that the elements to be sorted are chosen uniformly, the number of elements that land uniformly in a bucket follows the binomial distribution B(n, 1 n ) introduced in Chapter 3. IV054 1. Basic Techniques I: Moments and Deviations 21/66 BUCKET SORT ALGORITHM - STAGE 2 Sort each bucket using a standard quadratic time algorithm and concatenate all sorted lists Analysis; If Xi is the number of elements in the ith bucket then they can be sorted in time cX2 i for some constant c. The expected time to do this sorting is therefore E n j=1 cX2 j = c n j=1 E[X2 j ] = cnE[X2 1 ] Since Xi is a binomial random variable B(n, 1 n ), see Chapter 3 we get E[X2 1 ] = n(n − 1) n2 + 1 = 2 − 1 n < 2 and therefore the expected time of the bucket sort is at most 2cn. IV054 1. Basic Techniques I: Moments and Deviations 22/66 BIRTHDAY PARADOX - BASICS Let us assume that the birthday of each person in a room is a random day chosen uniformly and independently from a 365-day year. In such case, for any given integer k > 0, the probability that in the room with 365 people there are at least k people having their birthdays in different days is: ¯p(n) = 1(1 − 1 365 )(1 − 2 365 )(1 − 3 365 ) . . . (1 − k − 1 365 ) = k−1 j=1 (1 − j 365 ) what equals to k! 365 k 365−k . Using the inequality 1 − j n ≈ e−j/n for j small comparing to n - we have for any integer n k−1 j=1 (1 − j n ) ≈ k−1 j=1 e−j/n = e− k−1 j=1 j n = e−k(k−1)/2n ≈ e−k2 /2n Hence the probability that some k people in the rom all have different birthdays from a set of n possible birthdays is 1 2 and it is approximately given by the equation k2 2n = ln 2 what gives, for the case n = 365, k = 22.49. IV054 1. Basic Techniques I: Moments and Deviations 23/66 k−1 j=1 (1 − j 365 ) = k−1 j=1 (365 − j) 365k−1 · (365 − k)! (365 − k)! = (365 − 1)! 365k−1(365 − k)! · (365) (365) = 365! 365k (365 − k)! · k! k! = k! 365 k 365−k . IV054 1. Basic Techniques I: Moments and Deviations 24/66 VARIATIONS on BIRTHDAY PARADOX A more detailed analysis of the basic equation shows that if we have 23 (30) [50] people in one room, then the probability that two of them have the same birthday is more than 50.7% (70.6%) (97%) -this is so called the Birthday Paradox. In the case we have 57 [100] people in the room the probability is 99% [99.99997%] IV054 1. Basic Techniques I: Moments and Deviations 25/66 More generally, if we have n objects and r people each choosing one object (and several of them can choose the same object), then if r ≈ 1.177 √ n (r ≈ √ 2λ), then probability that two people choose the same object is 50% (1 − eλ )%. IV054 1. Basic Techniques I: Moments and Deviations 26/66 Birthday paradox - graph - I. IV054 1. Basic Techniques I: Moments and Deviations 27/66 Birthday paradox - graph - II. IV054 1. Basic Techniques I: Moments and Deviations 28/66 ANOTHER VERSION of THE BIRTHDAY PARADOX Let us have n objects and two groups of r people. If r ≈ √ λn then the probability that someone from one group chooses the same object as someone from the other group is 1 − e−λ. IV054 1. Basic Techniques I: Moments and Deviations 29/66 STABLE MARRIAGE PROBLEM - INFORMAL FORMULATION Given is n men and n women and each of them has ranked all members of the opposite sex with a unique number between 1 and n in order to express of his/her preferences. Task: Marry all men and women together in such a way that there are no two (unsatisfied) people of the opposite sex who would both rather have each other than their current partners. If there is a no dissatisfied couple in a (group) marriage we consider the (group) marriage as stable. IV054 1. Basic Techniques I: Moments and Deviations 30/66 THE STABLE MARRIAGE PROBLEM Consider a society of n men A, B, C, . . . and n women a, b, c, . . . A marriage is 1-1 correspondence between men and women of that society. Assume that each person has a preference list of the members of the opposite sex, organised in a decreasing order of desirability. Example A : abcd a : ABCD B : bacd b : DCBA C : adcb c : ABCD D : dcab d : CDAB A marriage is said to be unstable if there exist two married couples X − x, Y − y such that X desires y more than x y desires X more than Y Such a pair (X, y) is called dissatisfied. The task is to find a stable marriage. (At least one always exist!) Example of an unstable marriage: A − a, B − b, C − c, D − d. IV054 1. Basic Techniques I: Moments and Deviations 31/66 COMMENTS The stable marriage problem, and its variations, form one of the most famous and important groups of algorithmic problems with a variety of interesting and important applications. A related book: Donald E. Knuth: Stable marriage and its relation to other combinatorial problems: an introduction to the mathematical analysis of algorithms, CRM Proceedings and Lecture Notes, Algorithms to deal with this type of problems are used, for example: To assign graduates of medical schools in North America (about 30 000) to hospitals; IV054 1. Basic Techniques I: Moments and Deviations 32/66 EXISTENCE and OPTIMALITY of SOLUTIONS NOTE 1: We will show later that a stable marriage always exists. NOTE 2: A stable marriage assignment does not need to be optimal for all. EXAMPLE: Let us have three men M1, M2 and M3 and three women W1, W2 and W3 with preferences: M1 : W2W1W3, M2 : W3W2W1, M3 : W1W3W2 W1 : M2M1M3, W2 : M3M2M1, W3 : M1M3M2 There are three stable solutions: All men get their first choice and all women their third one: M1W2, M2W3, M3W1 All get their second choice: M1W1, M2W2, M3W3 Women get their first choice and men the third one: M1W3, M2W1, M3W2 IV054 1. Basic Techniques I: Moments and Deviations 33/66 A naive, but not good enough, randomized algorithm 1 Start with some marriage of all. 2 until marriage is stable do randomly choose a dissatisfied pair, marry them and also their partners together Algorithm is not good because a loop can occur. IV054 1. Basic Techniques I: Moments and Deviations 34/66 EXAMPLE 1 Let us have the followig preferences: M1 : W3W2W4W1 M2 : W2W1W3W4 M3 : W2W4W1W3 M4 : W3W1W4W2 and W1 : M1M2M4M3 W2 : M3M1M4M2 W3 : M3M2M4M1 W4 : M2M1M3M4 Successful developments of marriages: M1W1 M2W2 M3W3 M4W4 − −unstable M1W2 M2W1 M3W3 M4W4 − −unstable M1W3 M2W1 M3W2 M4W4 − −unstable M1W4 M2W1 M3W2 M4W3 − −!stable! IV054 1. Basic Techniques I: Moments and Deviations 35/66 EXAMPLE 2 For choices: M1 : W2W1W3 M2 : arbitrary M3 : W1W2W3 and W1 : M1M3M2 W2 : M3M1M2 W3 : arbitrary we have the following cyclic development of marriages M1W1 M2W2 M3W3 M1W2 M2W1 M3W3 M1W3 M2W1 M3W2 M1W3 M2W2 M3W1 M1W1 M2W2 M3W3 IV054 1. Basic Techniques I: Moments and Deviations 36/66 EXAMPLE 3 For choices M1 : W1W2W3W4W5 M2 : W2W3W4W5W1 M3 : W3W4W5W1W2 M4 : W4W5W1W2W3 M5 : W5W1W2W3W4W5 and W1 : M2M3M4M5M1 W2 : M3M4M5M1M2 W3 : M4M5M1M2M3 W4 : M5M1M2M3M4 W5 : M1M2M3M4M5 we have exactly 5 stable marriages M1W1 M2W2 M3W3 M4W4 M5W5 M1W2 M2W3 M3W4 M4W5 M5W1 M1W3 M2W4 M3W5 M4W1 M5W2 M1W4 M2W5 M4W1 M4W2 M5W3 M1W5 M2W1 M4W2 M5W3 M5W4 In the x-th of the above marriages each man is married with his x-th choice. IV054 1. Basic Techniques I: Moments and Deviations 37/66 PROPOSAL ALGORITHMS The naive approach – to start with an arbitrary marriage and to try to stabilize it by pairing up dissatisfied couples – does not always work. MEN PROPOSAL ALGORITHM -“man proposes, woman disposes” Assume that all men are numbered somehow. At any step of the algorithm (due to Gale-Shapley), there will be a partial marriage, and the lowest-number unmarried man M proposes ”marriage” to the most desirable women W on his list who has not rejected him yet. The woman W then decides whether to accept his proposal or to reject it. The women W accepts the proposal if she is not yet married or she likes M more than her current partner. The algorithm repeats the process and terminates after every person has been married. It is a linear time algorithm, concerning the worst case complexity. It is easy to see that the process terminates and resulting marriage is stable. IV054 1. Basic Techniques I: Moments and Deviations 38/66 CORRECTNESS of the ALGORITHM Everyone gets married Observe that once a women gets married she will stay married (though she can change her partners - even several times). It cannot be the case that at the end there is a man and a woman who are not married. Indeed, the men would have proposed her marriage at some point and being unmarried she could not refused him. Final marriage is stable Indeed, let at the end M be a men and W a women who are married, but not to each other and they are dissatisfied. If M prefers W over his current partner, he must have proposed marriage to W before he did that to his current partner. If W accepted his proposal yet is not married with him at the end, she must have changed him for someone she likes more and therefore she cannot like M more than her current partner. If W rejected his proposal, she was already married with someone she liked more than M. IV054 1. Basic Techniques I: Moments and Deviations 39/66 COMPLEXITY of the PROPOSAL ALGORITHM At each proposal step one women is eliminated from a man list. Total number of proposals is therefore at most n2 . The result of the men-proposal algorithm does not depend on the order men are chosen to make their proposals. IV054 1. Basic Techniques I: Moments and Deviations 40/66 PROPERTIES of the GALE-SHAPLEY-ALGORITHM Gale-Shapley marriage is men-optimal and women-pessimal. To see that consider the following definition of a feasible marriage. A marriage between a man A and a woman B is called feasible if there exists a stable pairing (marriage) in which A and B are married. It is said that a marriage is men-optimal if every man is married with his highest ranked feasible partner. It is said that a marriage is women-pessimal if each woman is married with her lowest ranked feasible partner. IV054 1. Basic Techniques I: Moments and Deviations 41/66 SOME APPLICATIONS - I. National residency matching program. This program places applicants for postgraduate medical training positions into residency programs at teaching hospitals throughout US. IV054 1. Basic Techniques I: Moments and Deviations 42/66 SOME APPLICATIONS - II. Dental residencies and medical specialities in the USA, Canada and parts of UK National university entrance exam in Iran Placement of Canadian lawyers in Ontario and Alberta Matching of new reform rabbis to their first congregation Assignment of students to high-schools in NYC IV054 1. Basic Techniques I: Moments and Deviations 43/66 STABLE HUSBANDS A stable husband of a woman, with respect to a given rankings, is a man she can be married with in a stable marriage. D. E. Knuth and et al. showed that In case of n men and n women, any woman has at least (1 2 − ) ln n and at most (1 + ) ln n different stable husbands in the set of all Gale-Shapley stable matchings defined by these rankings, with probability approaching 1 as n → ∞ , if is any positive constant. There is an algorithm that outputs all stable husbands of a given women. IV054 1. Basic Techniques I: Moments and Deviations 44/66 RANDOMIZED VERSIONS of the PROPOSAL ALGORITHM Next goal: The average-case analysis of the proposal algorithm under the assumptions: men’s lists are chosen independently and randomly, women’s lists can be arbitrary, but are fixed in advance. Let Tp be the random variable that denote the number of proposals made during the execution of the Proposal algorithm – what is proportional to the overall time of algorithm. Distribution of Tp seems to be very difficult to determine or even to analyse. Our goal is to show that the expected value of the number of proposals is about O(n lg n). IV054 1. Basic Techniques I: Moments and Deviations 45/66 Game “CLOCK SOLITAIRE” We illustrate first, on a simple card game, a simple technique that allows to analyse randomized algorithms with seemingly complex behaviour. 1. Game ”Clock Solitaire” A standard deck of 52 cards is randomly shuffled and then divided into 13 piles (columns) of 4 cards each. Each pile is arbitrarily labeled with a distinct symbol from {A, 2, . . . , 10, J, Q, K} A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 4 3 4 5 6 7 8 9 10 J Q K A 2 4 5 6 7 8 9 10 J Q K A 3 3 5 6 7 8 9 10 J Q K A 2 2 Playing the game: On the first move a card is drawn from the pile labeled K. At each subsequent move, a card is drawn, by the player, from the pile whose label is the face value of the card at the previous move. The game ends, if the player makes an attempt to draw a card from an empty pile. Player wins the game if, on termination, all 52 cards have been drawn. In all other cases the player looses the game. What is the probability to win the game? IV054 1. Basic Techniques I: Moments and Deviations 46/66 A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 2 3 4 5 6 7 8 9 10 J Q K A 3 4 5 6 7 8 9 10 J Q K A 2 4 5 6 7 8 9 10 J Q K A 2 3 A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 2 3 4 5 6 7 8 9 10 J Q K A 3 4 5 6 7 8 9 10 J Q K A . 4 5 6 7 8 9 10 J Q K A 2 3 A 2 3 4 5 6 7 8 9 10 J Q K A 2 3 4 5 6 7 8 9 10 J Q K 2 . 4 5 6 7 8 9 10 J Q K A 3 4 5 6 7 8 9 10 J Q K A . 4 5 6 7 8 9 10 J Q K A 2 3 IV054 1. Basic Techniques I: Moments and Deviations 47/66 PRINCIPLE od ”DEFERRED RANDOM DECISIONS” 1. Game ”Clock Solitaire” – repetition A standard deck of 52 cards is randomly shuffled and then divided into 13 piles of 4 cards each. Each pile is arbitrarily labeled with a distinct symbol from {A, 2, . . . , 10, J, Q, K} At each subsequent move, a card is drawn from the pile whose label is the face value of the card at the previous move. The game ends, if an attempt is made to draw a card from an empty pile. Observe that our game always terminates in an attempt to draw a card from the K-pile. (Why?) ANALYSIS of ALGORITHM How to choose the probability space? Let the random choices unfold with progress of the game: that is at any step each of the yet unseen cards is likely to appear. Thus, the process of playing this game is equivalent to the process of repeatedly drawing cards uniformly and randomly from the deck of 52 cards. A winning game corresponds to the situation where the first 51 cards drawn in this fashion contain exactly 3 kings! Probability of winning our game is therefore, clearly, 1/13. IV054 1. Basic Techniques I: Moments and Deviations 48/66 Principle of Deferred Decisions At solving problems, it is not necessary that the entire set of random choices has to be made in advance, rather, it is sufficient that at each step of the algorithm we fix only that random choice that needs to be revealed at that step. IV054 1. Basic Techniques I: Moments and Deviations 49/66 ANALYSIS of RANDOMIZED VERSION of PROPOSAL ALGORITHM 1/2 Principle of deferred decision: Do not assume that entire set of random choices is made in advance. Rather, at each step of the process fix only that random choices that must be revealed at that step to the algorithm. An application to the Proposal Algorithm: We will remove dependencies by do not assuming that men have chosen their preference lists in advance. We will assume that each time a man has to make a proposal he picks a random woman from the list od women not already proposed by him, and proceeds to propose her. (Clearly this is equivalent to choosing a random preference list prior the execution of the algorithm.) The only dependency that remains is that the random choice of a women at any step depends on the proposals made so far by the current proposer. To eliminate the above dependency let us change the algorithm. Each time a man makes proposal he chooses randomly a woman from the set of all women. Call this new algorithm Amnesiac Algorithm. IV054 1. Basic Techniques I: Moments and Deviations 50/66 ANALYSIS of RANDOMIZED VERSION of PROPOSAL ALGORITHM 2/2 Let TA(TP) be the number of proposal made by the Amnesiac (Proposal) algorithm. It is obvious that for all m Pr[ TA > m] ≥ Pr[ TP ≥ m] and therefore an upper bound on TA is an upper bound on TP . The advantage of analyzing TA is that we need only to count the total number of proposals made – without regard to the name of the proposer at each stage. New algorithm terminates with a stable marriage once each woman has received at least one proposal (for a “marriage”). The task to determine a good upper bound of TA is a special case of the task to determine such a bound for so-called Coupon Selection Problem discussed next. At the end we will get: Theorem For any constant c ∈ and m = n ln n + cn lim n→∞ Pr[ TA > m ] = 1 − e−e−c IV054 1. Basic Techniques I: Moments and Deviations 51/66 COUPON SELECTION PROBLEM There are n types of coupons and at each time a coupon is chosen at random. The task is to determine for each m ≥ n the probability of having collected at least one of each of the n types of coupons in m trials. Elementary analysis Let X be a random variable the value of which is the number of trials required to collect at least one of each type of coupons. Let C1, . . . , CX denote a sequence of trials. The i-th trial Ci will be called success if the coupon selected in the trial Ci was not drawn in any of the first i − 1 selections. Sequence C1, . . . , CX will be divided into epochs. i-th epoch begins with the trial following the i-th success and ends with the trial which is (i + 1)-th success. Let Xi , 0 ≤ i < n, be the number of trials in the i-th epoch. Then X = n−1 i=0 Xi IV054 1. Basic Techniques I: Moments and Deviations 52/66 If pi is the probability of the success on any trial of the i-th epoch then pi = n − i n . Random variable Xi is geometrically distributed, with the parameter pi , and therefore its average value is E[Xi ] = 1 pi = n n−i and its variance V [Xi ] = σ2 Xi = 1−pi p2 i = ni (n−i)2 . By the linearity of expectations we have: E[X] = E n−1 i=0 Xi = n−1 i=0 E[Xi ] = n−1 i=0 n n − i = n n i=1 1 i = nHn. Since Xi are independent σ2 X = n−1 i=0 σ2 Xi = n−1 i=0 ni (n − i)2 = n−1 i=0 n(n − i) i2 = n2 n i=1 1 i2 − nHn Since limn→∞ n i=1 1 i2 = π2 6 we have limn→∞ σ2 X n2 = π2 6 . IV054 1. Basic Techniques I: Moments and Deviations 53/66 We show now that X unlikely deviates much from expectation Let εr i denote the event that a coupon of type i is not collected in the first r trials. Pr[εr i ] = (1 − 1 n )r ≤ e− r n = n−β for r = βn ln n Therefore, for r = βn ln n, we get Pr[X > r] = Pr [∪n i=1εr i ] ≤ n i=1 Pr[εr i ] ≤ n i=1 n−β = n−(β−1) Next aim: To study the probability that X deviates from its expectation nHn by the amount cn for any real c. IV054 1. Basic Techniques I: Moments and Deviations 54/66 A TECHNICAL LEMMA and MAIN THEOREM Lemma Let c be a real number and m = n ln n + cn for a positive integer n. Then, for any fixed k it holds lim n→∞ n k (1 − k n )m = e−ck k! . IV054 1. Basic Techniques I: Moments and Deviations 55/66 MAIN THEOREM 1/4 Theorem Let the random variable X denote the number of trials for collecting each of the n types of coupons. Then for any c ∈ R and m = n ln n + cn lim n→∞ Pr[X > m] = 1 − e−e−c Proof Consider the event {X > m} = n i=1 εm i . By the principle of the Inclusion-Exclusion Pr n i=1 εm i = n k=1 (−1)k+1 Pn k (∗) where Pn k = 1≤i1<··· 0 there exists k∗ > 0 such that for any k > k∗ |Sk − f (c)| < ε. Since limn→∞ Pn k = Pk , we have limn→∞ Sn k = Sk . Equivalently, for all ε > 0 and all k, for all sufficiently large. n |Sn k − Sk | < ε Thus, for all ε > 0 any fixed k > k∗ , and n sufficiently large |Sn k − Sk | < ε, |Sk − f (c)| < ε =⇒ |Sn k − f (c)| = |Sn k − Sk | + |Sk − f (c)| < 2ε and |Sn 2k − Sn 2k+1| < 4ε. As a consequence Pr n i=1 εm i − f (c) < 4ε and therefore lim n→∞ Pr n i=1 εm i = f (c) = 1 − e−e−c IV054 1. Basic Techniques I: Moments and Deviations 61/66 MAIN THEOREM 4/4 what implies lim n→∞ Pr[X > n(ln n + c)] = 1 − e−e−c Implications With extremely high probability, the number of trials, for collecting all n coupon types, lies in a small interval centered about its expected value. IV054 1. Basic Techniques I: Moments and Deviations 62/66 A SUMMARY of the ANALYSIS of STABLE MARRIAGE PROBLEM In case of the stable marriage problem of n men and women we have The worst case complexity (of the number of proposals) in n2 , The average case complexity is O(n lg n). Deviation is small from the expected case. IV054 1. Basic Techniques I: Moments and Deviations 63/66 APPENDIX IV054 1. Basic Techniques I: Moments and Deviations 64/66 SIMILAR PROBLEMS Generalised stable marriage problem A man (woman) may not be willing to marry some partners from the opposite sex and may prefer to stay single. Stable roommate problem is similar to the stable marriage problem, but all participants belong to a single pool (Group). Hospitals-students(medical) problem This differs from the stable marriage problem that a women [hospital] can accept ”proposals” from more than one man [student]. Hospital-students problems with couples Similar problem as the above one, but among students can be couples that have to be assigned either to the same hospital or to a specific pair of hospitals chosen by couples. IV054 1. Basic Techniques I: Moments and Deviations 65/66 EXERCISES 1 Which of the numbers eπ and πe , is larger, for the case that e is the basis of natural logarithms 2 Hint 1: There exists one-line proof of the correct relation. IV054 1. Basic Techniques I: Moments and Deviations 66/66