IA168 Algorithmic Game Theory Tomáš Brázdil 1 Organization of This Course Sources: Lectures (slides, notes) based on several sources slides are prepared for lectures, some stuff on greenboard (⇒ attend the lectures) Books: Nisan/Roughgarden/Tardos/Vazirani, Algorithmic Game Theory, Cambridge University, 2007. Available online for free: http://www.cambridge.org/journals/nisan/downloads/Nisan_Non-printable.pdf Tadelis, Game Theory: An Introduction, Princeton University Press, 2013 (I use various resources, so please, attend the lectures) 2 Evaluation Oral exam Homework 3 What is Algorithmic Game Theory? First, what is the game theory? According to the Oxford dictionary it is "the branch of mathematics concerned with the analysis of strategies for dealing with competitive situations where the outcome of a participant’s choice of action depends critically on the actions of other participants" According to Myerson it is "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers" What does the "algorithmic" mean? It means that we are "concerned with the computational questions that arise in game theory, and that enlighten game theory. In particular, questions about finding efficient algorithms to ‘solve’ games.” Let’s have a look at some examples .... 4 Prisoner’s Dilemma Two suspects of a serious crime are arrested and imprisoned. Police has enough evidence of only petty theft, and to nail the suspects for the serious crime they need testimony from at least one of them. The suspects are interrogated separately without any possibility of communication. Each of the suspects is offered a deal: If he confesses (C) to the crime, he is free to go. The alternative is not to confess, that is remain silent (S). Sentence depends on the behavior of both suspects. The problem: What would the suspects do? 5 Prisoner’s Dilemma – Solution(?) C S C −5, −5 0, −20 S −20, 0 −1, −1 Rational "row" suspect (or his adviser) may reason as follows: If my colleague chooses C, then playing C gives me −5 and playing S gives −20. If my colleague chooses S, then playing C gives me 0 and playing S gives −1. In both cases C is clearly better (it strictly dominates the other strategy). If the other suspect’s reasoning is the same, both choose C and get 5 years sentence. Where is the dilemma? There is a solution (S, S) which is better for both players but needs some “central” authority to control the players. Are there always “dominant” strategies? 6 Nash equilibria – Battle of Sexes A couple agreed to meet this evening, but cannot recall if they will be attending the opera or a football match. The husband would like to go to the football game. The wife would like to go to the opera. Both would prefer to go to the same place rather than different ones. If they cannot communicate, where should they go? 7 Nash equilibria – Battle of Sexes Battle of Sexes can be modeled as a game of two players (Wife, Husband) with the following payoffs: O F O 2, 1 0, 0 F 0, 0 1, 2 Apparently, no strategy of any player is dominant. A “solution”? Note that whenever both players play O, then neither of them wants to unilaterally deviate from his strategy! (O, O) is an example of a Nash equilibrium (as is (F, F)) 8 Mixed Equilibria – Rock-Paper-Scissors R P S R 0, 0 −1, 1 1, −1 P 1, −1 0, 0 −1, 1 S −1, 1 1, −1 0, 0 This is an example of zero-sum games: whatever one of the players wins, the other one looses. What is an optimal behavior here? Is there a Nash equilibrium? Use mixed strategies: Each player plays each pure strategy with probability 1/3. The expected payoff of each player is 0 (even if one of the players changes his strategy, he still gets 0!). How to algorithmically solve games in mixed strategies? (we shall use probability theory and linear programming) 9 Philosophical Issues in Games 10 Dynamic Games So far we have seen games in strategic form that are unable to capture games that unfold over time (such as chess). For such purpose we need to use extensive form games: P1 P2 (1, 2) C (1, −1) D (0, 2) E A P2 (2, 2) F (1, 3) G B How to "solve" such games? What is their relationship to the strategic form games? 11 Chance and Imperfect Information Some decisions in the game tree may be by chance and controlled by neither player (e.g. Poker, Backgammon, etc.) Sometimes a player may not be able to distinguish between several “positions” because he does not know all the information in them (Think a card game with opponent’s cards hidden). F G D 1 2 F G E1 2 A H I J B P1 P1 Nature P2 (a, b) (c, d) (e, f) (g, h) (i, j) (k, ) (m, n) Again, how to solve such games? 12 Games of Incomplete Information In all previous games the players knew all details of the game they played, and this fact was a “common knowledge”. This is not always the case. Example: Sealed Bid Auction Two bidders are trying to purchase the same item. The bidders simultaneously submit bids b1 and b2 and the item is sold to the highest bidder at his bid price (first price auction) The payoff of the player 1 (and similarly for player 2) is calculated by u1(b1, b2) =    v1 − b1 b1 > b2 1 2 (v1 − b1) b1 = b2 0 b1 < b2 Here v1 is the private value that player 1 assigns to the item and so the player 2 does not know u1. How to deal with such a game? Assume the “worst” private value? What if we have a partial knowledge about the private values? 13 Inefficiency of Equilibria In Prisoner’s Dilemma, the selfish behavior of suspects (the Nash equilibrium) results in somewhat worse than ideal situation. C S C −5, −5 0, −20 S −20, 0 −1, −1 Defining a welfare function W which to every pair of strategies assigns the sum of payoffs, we get W(C, C) = −10 but W(S, S) = −2. The ratio W(C,C) W(S,S) = 5 measures the inefficiency of "selfish-behavior" (C, C) w.r.t. the optimal “centralized” solution. Price of Anarchy is the maximum ratio between values of equilibria and the value of an optimal solution. 14 Inefficiency of Equilibria – Selfish Routing Consider a transportation system where many agents are trying to get from some initial location to a destination. Consider the welfare to be the average time for an agent to reach the destination. There are two versions: “Centralized”: A central authority tells each agent where to go. “Decentralized”: Each agent selfishly minimizes his travel time. Price of Anarchy measure the ratio between average travel time in these two cases. Problem: Bound the price of anarchy over all routing games? 15 Games in Computer Science Game theory is a core foundation of mathematical economics. But what does it have to do with CS? Games in AI: modeling of “rational” agents and their interactions. Games in Algorithms: several game theoretic problems have a very interesting algorithmic status and are solved by interesting algorithms Games in modeling and analysis of reactive systems: program inputs viewed “adversarially”, bisimulation games, etc. Games in computational complexity: Many complexity classes are definable in terms of games: PSPACE, polynomial hierarchy, etc. Games in Logic: modal and temporal logics, Ehrenfeucht-Fraisse games, etc. 16 Games in Computer Science Games, the Internet and E-commerce: An extremely active research area at the intersection of CS and Economics Basic idea: “The internet is a HUGE experiment in interaction between agents (both human and automated)” How do we set up the rules of this game to harness “socially optimal” results? 17 Summary and Brief Overview This is a theoretical course aimed at some fundamental results of game theory, often related to computer science We start with strategic form games (such as the Prisoner’s dilemma), investigate several solution concepts (dominance, equilibria) and related algorithms (in particular, Lemke-Howson algorithm for computing Nash Eq.) Then we consider repeated games which allow players to learn from history and/or to react to deviations of the other players. Subsequently, we move on to incomplete information games and auctions Finally, we consider (in)efficiency of equilibria (such as the Price of Anarchy) and its properties on important classes of routing and network formation games. Remaining time will be devoted to selected topics from extensive form games, games on graphs etc. 18 Static Games of Complete Information Strategic-Form Games Solution concepts 19 Static Games of Complete Information – Intuition Proceed in two steps: 1. Each player simultaneously and independently chooses a strategy. This means that players play without observing strategies chosen by other players. 2. Conditional on the players’ strategies, payoffs are distributed to all players. Complete information means that the following is common knowledge among players: all possible strategies of all players, what payoff is assigned to each combination of strategies. Definition 1 A fact E is a common knowledge among players {1, . . . , n} if for every sequence i1, . . . , ik ∈ {1, . . . , n} we have that i1 knows that i2 knows that ... ik−1 knows that ik knows E. The goal of each player is to maximize his payoff (and this fact is common knowledge). 20 Strategic-Form Games To formally represent static games of complete information we define strategic-form games. Definition 2 A game in strategic-form (or normal-form) is an ordered triple G = (N, (Si)i∈N , (ui)i∈N), in which: N = {1, 2, . . . , n} is a finite set of players. Si is a set of (pure) strategies of player i, for every i ∈ N. A strategy profile is a vector of strategies of all players (s1, . . . , sn) ∈ S1 × · · · × Sn. We denote the set of all strategy profiles by S = S1 × · · · × Sn. ui : S → R is a function associating each strategy profile s = (s1, . . . , sn) ∈ S with the payoff ui(s) to player i, for every player i ∈ N. Definition 3 A zero-sum game G is one in which for all s = (s1, . . . , sn) ∈ S we have u1(s) + u2(s) + · · · + un(s) = 0. 21 Example: Prisoner’s Dilemma N = {1, 2} S1 = S2 = {S, C} u1, u2 are defined as follows: u1(C, C) = −5, u1(C, S) = 0, u1(S, C) = −20, u1(S, S) = −1 u2(C, C) = −5, u2(C, S) = −20, u2(S, C) = 0, u2(S, S) = −1 (Is it zero sum?) We usually write payoffs in the following form: C S C −5, −5 0, −20 S −20, 0 −1, −1 or as two matrices: C S C −5 0 S −20 −1 C S C −5 −20 S 0 −1 22 Example: Cournot Duopoly Two identical firms, players 1 and 2, produce some good. Denote by q1 and q2 quantities produced by firms 1 and 2, resp. The total quantity of products in the market is q1 + q2. The price of each item is κ − q1 − q2 (here κ is a positive constant) Firms 1 and 2 have per item production costs c1 and c2, resp. Question: How these firms are going to behave? We may model the situation using a strategic-form game. Strategic-form game model (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, ∞) u1(q1, q2) = q1(κ − q1 − q2) − q1c1 u2(q1, q2) = q2(κ − q1 − q2) − q2c2 23 Solution Concepts A solution concept is a method of analyzing games with the objective of restricting the set of all possible outcomes to those that are more reasonable than others. We will use term equilibrium for any one of the strategy profiles that emerges as one of the solution concepts’ predictions. (I follow the approach of Steven Tadelis here, it is not completely standard) Example 4 Nash equilibrium is a solution concept. That is, we “solve” games by finding Nash equilibria and declare them to be reasonable outcomes. 24 Assumptions Throughout the lecture we assume that: 1. Players are rational: a rational player is one who chooses his strategy to maximize his payoff. 2. Players are intelligent: An intelligent player knows everything about the game (actions and payoffs) and can make any inferences about the situation that we can make 3. Common knowledge: The fact that players are rational and intelligent is a common knowledge among them. 4. Self-enforcement: Any prediction (or equilibrium) of a solution concept must be self-enforcing. Here 4. implies non-cooperative game theory: Each player is in control of his actions, and he will stick to an action only if he finds it to be in his best interest. 25 Evaluating Solution Concepts In order to evaluate our theory as a methodological tool we use the following criteria: 1. Existence (i.e. How often does it apply?): Solution concept should apply to a wide variety of games. E.g. We prove that mixed Nash equilibria exist in all two player finite strategic-form games. 2. Uniqueness (How much does it restrict behavior?): We demand our solution concept to restrict the behavior as much as possible. E.g. So called strictly dominant strategy equilibria are always unique as opposed to Nash eq. The basic notion for evaluating "social outcome" is the following Definition 5 A strategy profile s ∈ S Pareto dominates a strategy profile s ∈ S if ui(s) ≥ ui(s ) for all i ∈ N, and ui(s) > ui(s ) for at least one i ∈ N. A strategy profile s ∈ S is Pareto optimal if it is not Pareto dominated by any other strategy profile. We will see more measures of social outcome later. 26 Solution Concepts – Pure Strategies We will consider the following solution concepts: strict dominant strategy equilibrium iterated elimination of strictly dominated strategies (IESDS) rationalizability Nash equilibria For now, let us concentrate on pure strategies only! I.e., no mixed strategies are allowed. We will generalize to mixed setting later. 27 Notation Let N = {1, . . . , n} be a finite set and for each i ∈ N let Xi be a set. Let X := i∈N Xi = {(x1, . . . , xn) | xj ∈ Xj, j ∈ N}. For i ∈ N we define X−i := j i Xj, i.e., X−i = {(x1, . . . , xi−1, xi+1, . . . , xn) | xj ∈ Xj, ∀j i} An element of X−i will be denoted by x−i = (x1, . . . , xi−1, xi+1, . . . , xn) We slightly abuse notation and write (xi, x−i) to denote (x1, . . . , xi, . . . , xn) ∈ X. 28 Strict Dominance in Pure Strategies Definition 6 Let si, si ∈ Si be strategies of player i. Then si is strictly dominated by si (write si si ) if for any possible combination of the other players’ strategies, s−i ∈ S−i, we have ui(si, s−i) > ui(si , s−i) for all s−i ∈ S−i Claim 1 An intelligent and rational player will never play a strictly dominated strategy. Clearly, intelligence implies that the player should recognize dominated strategies, rationality implies that the player will avoid playing them. 29 Strictly Dominant Strategy Equilibrium in Pure Str. Definition 7 si ∈ Si is strictly dominant if every other pure strategy of player i is strictly dominated by si. Observe that every player has at most one strictly dominant strategy, and that strictly dominant strategies do not have to exist. Claim 2 Any rational player will play the strictly dominant strategy (if it exists). Definition 8 A strategy profile s ∈ S is a strictly dominant strategy equilibrium if si ∈ Si is strictly dominant for all i ∈ N. Corollary 9 If the strictly dominant strategy equilibrium exists, it is unique and rational players will play it. Is the strictly dominant strategy equilibrium always Pareto optimal? 30 Examples In the Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 (C, C) is the strictly dominant strategy equilibrium (the only profile that is not Pareto optimal!). In the Battle of Sexes: O F O 2, 1 0, 0 F 0, 0 1, 2 no strictly dominant strategies exist. 31 Indiana Jones and the Last Crusade (Taken from Dixit & Nalebuff’s "The Art of Strategy" and a lecture of Robert Marks) Indiana Jones, his father, and the Nazis have all converged at the site of the Holy Grail. The two Joneses refuse to help the Nazis reach the last step. So the Nazis shoot Indiana’s dad. Only the healing power of the Holy Grail can save the senior Dr. Jones from his mortal wound. Suitably motivated, Indiana leads the way to the Holy Grail. But there is one final challenge. He must choose between literally scores of chalices, only one of which is the cup of Christ. While the right cup brings eternal life, the wrong choice is fatal. The Nazi leader impatiently chooses a beautiful gold chalice, drinks the holy water, and dies from the sudden death that follows from the wrong choice. Indiana picks a wooden chalice, the cup of a carpenter. Exclaiming "There’s only one way to find out" he dips the chalice into the font and drinks what he hopes is the cup of life. Upon discovering that he has chosen wisely, Indiana brings the cup to his father and the water heals the mortal wound. 32 Indiana Jones and the Last Crusade (cont.) Indy Goofed Although this scene adds excitement, it is somewhat embarrassing that such a distinguished professor as Dr. Indiana Jones would overlook his dominant strategy. He should have given the water to his father without testing it first. If Indiana has chosen the right cup, his father is still saved. If Indiana has chosen the wrong cup, then his father dies but Indiana is spared. Testing the cup before giving it to his father doesn’t help, since if Indiana has made the wrong choice, there is no second chance – Indiana dies from the water and his father dies from the wound. 33 Iterated Strict Dominance in Pure Strategies We know that no rational player ever plays strictly dominated strategies. As each player knows that each player is rational, each player knows that his opponents will not play strictly dominated strategies and thus all opponents know that effectively they are facing a "smaller" game. As rationality is a common knowledge, everyone knows that everyone knows that the game is effectively smaller. Thus everyone knows, that nobody will play strictly dominated strategies in the smaller game (and such strategies may indeed exist). Because it is a common knowledge that all players will perform this kind of reasoning again, the process can continue until no more strictly dominated strategies can be eliminated. 34 IESDS The previous reasoning yields the Iterated Elimination of Strictly Dominated Strategies (IESDS): Define a sequence D0 i , D1 i , D2 i , . . . of strategy sets of player i. (Denote by Gk DS the game obtained from G by restricting to Dk i , i ∈ N.) 1. Initialize k = 0 and D0 i = Si for each i ∈ N. 2. For all players i ∈ N: Let Dk+1 i be the set of all pure strategies of Dk i that are not strictly dominated in Gk DS . 3. Let k := k + 1 and go to 2. We say that si ∈ Si survives IESDS if si ∈ Dk i for all k = 0, 1, 2, . . . Definition 10 A strategy profile s = (s1, . . . , sn) ∈ S is an IESDS equilibrium if each si survives IESDS. A game is IESDS solvable if it has a unique IESDS equilibrium. Remark: If all Si are finite, then in 2. we may remove only some of the strictly dominated strategies (not necessarily all). The result is not affected by the order of elimination since strictly dominated strategies remain strictly dominated even after removing some other strictly dominated strategies. 35 IESDS Examples In the Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 (C, C) is the only one surviving the first round of IESDS. In the Battle of Sexes: O F O 2, 1 0, 0 F 0, 0 1, 2 all strategies survive all rounds (i.e. IESDS ≡ anything may happen, sorry) 36 A Bit More Interesting Example L C R L 4, 3 5, 1 6, 2 C 2, 1 8, 4 3, 6 R 3, 0 9, 6 2, 8 IESDS on greenboard! 37 Political Science Example: Median Voter Theorem Hotelling (1929) and Downs (1957) N = {1, 2} Si = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} (political and ideological spectrum) 10 voters belong to each position (Here 10 means ten percent in the real-world) Voters vote for the closest candidate. If there is a tie, then 1 2 got to each candidate Payoff: The number of voters for the candidate, each candidate (selfishly) strives to maximize this number 38 Political Science Example: Median Voter Theorem 1 and 10 are the (only) strictly dominated strategies ⇒ D1 1 = D1 2 = {2, . . . , 9} in G1 DS , 2 and 9 are the (only) strictly dominated strategies ⇒ D2 1 = D2 2 = {3, . . . , 8} . . . only 5, 6 survive IESDS 39 Belief & Best Response IESDS eliminated apparently unreasonable behavior (leaving "reasonable" behavior implicitly untouched). What if we rather want to actively preserve reasonable behavior? What is reasonable? .... what we believe is reasonable :-). Intuition: Imagine that your colleague did something stupid What would you ask him? Usually something like "What were you thinking?" The colleague may respond with a reasonable description of his belief in which his action was (one of) the best he could do (You may of course question reasonableness of the belief) Let us formalize this type of reasoning .... 40 Belief & Best Response Definition 11 A belief of player i is a pure strategy profile s−i ∈ S−i of his opponents. Definition 12 A strategy si ∈ Si of player i is a best response to a belief s−i ∈ S−i if ui(si, s−i) ≥ ui(si , s−i) for all si ∈ Si Claim 3 A rational player who believes that his opponents will play s−i ∈ S−i always chooses a best response to s−i ∈ S−i. Definition 13 A strategy si ∈ Si is never best response if it is not a best response to any belief s−i ∈ S−i. A rational player never plays any strategy that is never best response. 41 Best Response vs Strict Dominance Proposition 1 If si is strictly dominated for player i, then it is never best response. The opposite does not have to be true in pure strategies: X Y A 1, 1 1, 1 B 2, 1 0, 1 C 0, 1 2, 1 Here A is never best response but is strictly dominated neither by B, nor by C. 42 Elimination of Stupid Strategies = Rationalizability Using similar iterated reasoning as for IESDS, strategies that are never best response can be iteratively eliminated. Define a sequence R0 i , R1 i , R2 i , . . . of strategy sets of player i. (Denote by Gk Rat the game obtained from G by restricting to Rk i , i ∈ N.) 1. Initialize k = 0 and R0 i = Si for each i ∈ N. 2. For all players i ∈ N: Let Rk+1 i be the set of all strategies of Rk i that are best responses to some beliefs in Gk Rat . 3. Let k := k + 1 and go to 2. We say that si ∈ Si is rationalizable if si ∈ Rk i for all k = 0, 1, 2, . . . Definition 14 A strategy profile s = (s1, . . . , sn) ∈ S is a rationalizable equilibrium if each si is rationalizable. We say that a game is solvable by rationalizability if it has a unique rationalizable equilibrium. (Warning: For some reasons, rationalizable strategies are almost always defined using mixed strategies!) 43 Rationalizability Examples In the Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 (C, C) is the only rationalizable equilibrium. In the Battle of Sexes: O F O 2, 1 0, 0 F 0, 0 1, 2 all strategies are rationalizable. 44 Cournot Duopoly G = (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, ∞) u1(q1, q2) = q1(κ − q1 − q2) − q1c1 = (κ − c1)q1 − q2 1 − q1q2 u2(q1, q2) = q2(κ − q2 − q1) − q2c2 = (κ − c2)q2 − q2 2 − q2q1 Assume for simplicity that c1 = c2 = c and denote θ = κ − c. What is a best response of player 1 to a given q2 ? Solve δu1 δq1 = θ − 2q1 − q2 = 0, which gives that q1 = (θ − q2)/2 is the only best response of player 1 to q2. Similarly, q2 = (θ − q1)/2 is the only best response of player 2 to q1. Since q2 ≥ 0, we obtain that q1 is never best response iff q1 > θ/2. Similarly q2 is never best response iff q2 > θ/2. Thus R1 1 = R1 2 = [0, θ/2]. 45 Cournot Duopoly G = (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, ∞) u1(q1, q2) = q1(κ − q1 − q2) − q1c1 = (κ − c1)q1 − q2 1 − q1q2 u2(q1, q2) = q2(κ − q2 − q1) − q2c2 = (κ − c2)q2 − q2 2 − q2q1 Assume for simplicity that c1 = c2 = c and denote θ = κ − c. Now, in G1 Rat , we still have that q1 = (θ − q2)/2 is the best response to q2, and q2 = (θ − q1)/2 the best resp. to q1 Since q2 ∈ R1 2 = [0, θ/2], we obtain that q1 is never best response iff q1 ∈ [0, θ/4) Similarly q2 is never best response iff q2 ∈ [0, θ/4) Thus R2 1 = R2 2 = [θ/4, θ/2]. .... 46 Cournot Duopoly (cont.) G = (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, ∞) u1(q1, q2) = q1(κ − q1 − q2) − q1c1 = (κ − c1)q1 − q2 1 − q1q2 u2(q1, q2) = q2(κ − q2 − q1) − q2c2 = (κ − c2)q2 − q2 2 − q2q1 Assume for simplicity that c1 = c2 = c and denote θ = κ − c. In general, after 2k iterations we have R2k i = R2k i = [ k , rk ] where rk = (θ − k−1)/2 for k ≥ 1 k = (θ − rk )/2 for k ≥ 1 and 0 = 0 Solving the recurrence we obtain k = θ/3 − 1 4 k θ/3 rk = θ/3 + 1 4 k−1 θ/6 Hence, limk→∞ k = limk→∞ rk = θ/3 and thus (θ/3, θ/3) is the only rationalizable equilibrium. 47 Cournot Duopoly (cont.) G = (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, ∞) u1(q1, q2) = q1(κ − q1 − q2) − q1c1 = (κ − c1)q1 − q2 1 − q1q2 u2(q1, q2) = q2(κ − q2 − q1) − q2c2 = (κ − c2)q2 − q2 2 − q2q1 Assume for simplicity that c1 = c2 = c and denote θ = κ − c. Are qi = θ/3 Pareto optimal? NO! u1(θ/3, θ/3) = u2(θ/3, θ/3) = θ2 /9 but u1(θ/4, θ/4) = u2(θ/4, θ/4) = θ2 /8 48 IESDS vs Rationalizability in Pure Strategies Theorem 15 Assume that S is finite. Then for all k we have that Rk i ⊆ Dk i . That is, in particular, all rationalizable strategies survive IESDS. The opposite inclusion does not have to be true in pure strategies: X Y A 1, 1 1, 1 B 2, 1 0, 1 C 0, 1 2, 1 Recall that A is never best response but is strictly dominated by neither B, nor C. That is, A survives IESDS but is not rationalizable. 49 Proof of Theorem 15 By induction on k. For k = 0 we have that R0 i = Si = D0 i by definition. Now assume that Rk i ⊆ Dk i for some k ≥ 0. We prove that Rk+1 i ⊆ Dk+1 i by showing the following: For all s∗ i ∈ Rk i ⊆ Dk i : If s∗ i Dk+1 i , then s∗ i Rk+1 i Let us fix s∗ i ∈ Rk i such that s∗ i Dk+1 i . By definition, it suffices to prove that for every sk −i ∈ Rk −i there exists sk i ∈ Ri such that ui(sk i , sk −i) > ui(s∗ i , sk −i) (1) (In words, for every possible behavior of opponents of player i in Gk Rat , player i has a strictly better strategy than s∗ i in Gk Rat ) As s∗ i Dk+1 i , the strategy s∗ i must be strictly dominated in Gk DS by a strategy ¯si. That is for all sk −i ∈ Dk −i ⊇ Rk −i we have ui(¯si, sk −i) > ui(s∗ i , sk −i) (2) (Now note that if ¯si ∈ Rk i ⊆ Dk i , then we are done. Indeed, it suffices to put sk i := ¯si and the equation (1) will be satisfied for all sk −i ∈ Dk −i ⊇ Rk −i . However, it does not have to be the case that ¯si ∈ Rk i ) 50 Proof of Theorem 15 (cont.) Clearly, there is ≤ k such that ¯si ∈ Ri . (Note that ¯si does not have to strictly dominate s∗ i in GRat since R−i may be larger than Dk −i ) Recall that we need to find sk i ∈ Rk i for every given sk −i ∈ Rk −i so that the inequality (1) holds. (That is, sk i may be different for different sk −i ’s) Let us fix sk −i ∈ Rk −i ⊆ Dk −i . Let sk i ∈ Ri be a strategy maximizing ui(si, sk −i ) over all si ∈ Ri . In particular, we obtain the inequality (1): ui(sk i , sk −i) ≥ ui(¯si, sk −i) > ui(s∗ i , sk −i) Finally, note that sk i ∈ Rk i follows immediately from the fact that sk i is a best response to sk −i in all games GRat , . . . , Gk Rat (Indeed, even after removing some strategies (other than sk i and sk −i ), sk i remains a best resp. to sk −i ) 51 Pinning Down Beliefs – Nash Equilibria Criticism of previous approaches: Strictly dominant strategy equilibria often do not exist IESDS and rationalizability may not remove any strategies Typical example is Battle of Sexes: O F O 2, 1 0, 0 F 0, 0 1, 2 Here all strategies are equally reasonable according to the above concepts. But are all strategy profiles really equally reasonable? 52 Pinning Down Beliefs – Nash Equilibria O F O 2, 1 0, 0 F 0, 0 1, 2 Assume that each player has a belief about strategies of other players. By Claim 3, each player plays a best response to his beliefs. Is (O, F) as reasonable as (O, O) in this respect? Note that if player 1 believes that player 2 plays O, then playing O is reasonable, and if player 2 believes that player 1 plays F, then playing F is reasonable. But such beliefs cannot be correct together! (O, O) can be obtained as a profile where each player plays the best response to his belief and the beliefs are correct. 53 Nash Equilibrium Nash equilibrium can be defined as a set of beliefs (one for each player) and a strategy profile in which every player plays a best response to his belief and each strategy of each player is consistent with beliefs of his opponents. A usual definition is following: Definition 16 A pure-strategy profile s∗ = (s∗ 1 , . . . , s∗ n) ∈ S is a (pure) Nash equilibrium if s∗ i is a best response to s∗ −i for each i ∈ N, that is ui(s∗ i , s∗ −i) ≥ ui(si, s∗ −i) for all si ∈ Si and all i ∈ N Note that this definition is equivalent to the previous one in the sense that s∗ −i may be considered as the (consistent) belief of player i to which he plays a best response s∗ i 54 Nash Equilibria Examples In the Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 (C, C) is the only Nash equilibrium. In the Battle of Sexes: O F O 2, 1 0, 0 F 0, 0 1, 2 only (O, O) and (F, F) are Nash equilibria. In Cournot Duopoly, (θ/3, θ/3) is the only Nash equilibrium. (Best response relations: q1 = (θ − q2)/2 and q2 = (θ − q1)/2 are both satisfied only by q1 = q2 = θ/3) 55 Example: Stag Hunt Story: Two (in some versions more than two) hunters, players 1 and 2, can each choose to hunt stag (S) = a large tasty meal hare (H) = also tasty but small Hunting stag is much more demanding and forces of both players need to be joined (hare can be hunted individually) Strategy-form game model: N = {1, 2}, S1 = S2 = {S, H}, the payoff: S H S 5, 5 0, 3 H 3, 0 3, 3 Two NE: (S, S), and (H, H), where the former Pareto dominates the latter! Which one is more reasonable? 56 Example: Stag Hunt Strategy-form game model: N = {1, 2}, S1 = S2 = {S, H}, the payoff: S H S 5, 5 0, 3 H 3, 0 3, 3 Two NE: (S, S), and (H, H), where the former Pareto dominates the latter! Which one is more reasonable? If each player believes that the other one will go for hare, then (H, H) is a reasonable outcome ⇒ a society of individualists who do not cooperate at all. If each player believes that the other will cooperate, then this anticipation is self-fulfilling and results in what can be called a cooperative society. This is supposed to explain that in real world there are societies that have similar endowments, access to technology and physical environment but have very different achievements, all because of self-fulfilling beliefs (or norms of behavior). 57 Example: Stag Hunt Strategy-form game model: N = {1, 2}, S1 = S2 = {S, H}, the payoff: S H S 5, 5 0, 3 H 3, 0 3, 3 Two NE: (S, S), and (H, H), where the former Pareto dominates the latter! Which one is more reasonable? Another point of view: (H, H) is less risky Minimum secured by playing S is 0 as opposed to 3 by playing H (We will get to this minimax principle later) So it seems to be rational to expect (H, H) (?) 58 Nash Equilibria vs Previous Concepts Theorem 17 1. If s∗ is a strictly dominant strategy equilibrium, then it is the unique Nash equilibrium. 2. Each Nash equilibrium is rationalizable and survives IESDS. 3. If S is finite, neither rationalizability, nor IESDS creates new Nash equilibria. Proof: Homework! Corollary 18 Assume that S is finite. If rationalizability or IESDS result in a unique strategy profile, then this profile is a Nash equilibrium. 59 Interpretations of Nash Equilibria Except the two definitions, usual interpretations are following: When the goal is to give advice to all of the players in a game (i.e., to advise each player what strategy to choose), any advice that was not an equilibrium would have the unsettling property that there would always be some player for whom the advice was bad, in the sense that, if all other players followed the parts of the advice directed to them, it would be better for some player to do differently than he was advised. If the advice is an equilibrium, however, this will not be the case, because the advice to each player is the best response to the advice given to the other players. When the goal is prediction rather than prescription, a Nash equilibrium can also be interpreted as a potential stable point of a dynamic adjustment process in which individuals adjust their behavior to that of the other players in the game, searching for strategy choices that will give them better results. 60 Static Games of Complete Information Mixed Strategies 61 Let’s Mix It As pointed out before, neither of the solution concepts has to exist in pure strategies Example: Rock-Paper-sCissors R P C R 0, 0 −1, 1 1, −1 P 1, −1 0, 0 −1, 1 C −1, 1 1, −1 0, 0 There are no strictly dominant pure strategies No strategy is strictly dominated (IESDS removes nothing) Each strategy is a best response to some strategy of the opponent (rationalizability removes nothing) No pure Nash equilibria: No pure strategy profile allows each player to play a best response to the strategy of the other player How to solve this? Let the players randomize their choice of pure strategies .... 62 Probability Distributions Definition 19 Let A be a finite set. A probability distribution over A is a function σ : A → [0, 1] such that a∈A σ(a) = 1. We denote by ∆(A) the set of all probability distributions over A. We denote by supp(σ) the support of σ, that is the set of all a ∈ A satisfying σ(a) > 0. Example 20 Consider A = {a, b, c} and a function σ : A → [0, 1] such that σ(a) = 1 4 , σ(b) = 3 4 , and σ(c) = 0. Then σ ∈ ∆(A) and supp(σ) = {a, b}. 63 Mixed Strategies Let us fix a strategic-form game G = (N, (Si)i∈N , (ui)i∈N). From now on, assume that all Si are finite! Definition 21 A mixed strategy of player i is a probability distribution σ ∈ ∆(Si) over Si. We denote by Σi = ∆(Si) the set of all mixed strategies of player i. We define Σ := Σ1 × · · · × Σn, the set of all mixed strategy profiles. Recall that by Σ−i we denote the set Σ1 × · · · Σi−1 × Σi+1 × · · · × Σn Elements of Σ−i are denoted by σ−i = (σ1, . . . , σi−1, σi+1, . . . , σn). We identify each si ∈ Si with a mixed strategy σ that assigns probability one to si (and zero to other pure strategies). For example, in rock-paper-scissors, the pure strategy R corresponds to σi which satisfies σi(X) =    1 X = R 0 otherwise 64 Mixed Strategies Sometimes we assume Si = {1, . . . , mi}, here mi ∈ {1, 2, . . .}, for all i ∈ N. Then every mixed strategy σi is a vector σi = (σi(1), . . . , σi(mi)) ∈ [0, 1]mi so that σi(1) + · · · + σi(mi) = 1 65 Mixed Strategy Profiles Let σ = (σ1, . . . , σn) be a mixed strategy profile. Intuitively, we assume that each player i randomly chooses his pure strategy according to σi and independently of his opponents. Thus for s = (s1, . . . , sn) ∈ S = S1 × · · · × Sn we have that σ(s) := n i=1 σi(si) is the probability that the players choose the pure strategy profile s according to the mixed strategy profile σ, and σ−i(s−i) := n k i σk (sk ) is the probability that the opponents of player i choose s−i ∈ S−i when they play according to the mixed strategy profile σ−i ∈ Σ−i. (We abuse notation a bit here: σ denotes two things, a vector of mixed strategies as well as a probability distribution on S (the same for σ−i) 66 Mixed Strategies – Example R P C R 0, 0 −1, 1 1, −1 P 1, −1 0, 0 −1, 1 C −1, 1 1, −1 0, 0 An example of a mixed strategy σ1: σ1(R) = 1 2 , σ1(P) = 1 3 , σ1(C) = 1 6 . Sometimes we write σ1 as (1 2 (R), 1 3 (P), 1 6 (C)), or only (1 2 , 1 3 , 1 6 ) if the order of pure strategies is fixed. Consider a mixed strategy profile (σ1, σ2) where σ1 = (1 2 (R), 1 3 (P), 1 6 (C)) and σ2 = (1 3 (R), 2 3 (P), 0(C)). Then the probability σ(R, P) that the pure strategy profile (R, P) will be chosen by players playing the mixed profile (σ1, σ2) is σ1(R) · σ2(P) = 1 2 · 2 3 = 1 3 67 Expected Payoff ... but now what is the suitable notion of payoff? Definition 22 The expected payoff of player i under a mixed strategy profile σ ∈ Σ is ui(σ) := s∈S σ(s)ui(s)  = s∈S n k=1 σk (sk )ui(s)   I.e., it is the "weighted average" of what player i wins under each pure strategy profile s, weighted by the probability of that profile. Assumption: Every rational player strives to maximize his own expected payoff. (This assumption is not always completely convincing ...) 68 Expected Payoff – Example Matching Pennies: H T H 1, −1 −1, 1 T −1, 1 1, −1 Each player secretly turns a penny to heads or tails, and then they reveal their choices simultaneously. If the pennies match, player 1 (row) wins, if they do not match, player 2 (column) wins. Consider σ1 = (1 3 (H), 2 3 (T)) and σ2 = (1 4 (H), 3 4 (T)) u1(σ1, σ2) = (X,Y)∈{H,T}2 σ1(X)σ2(Y)u1(X, Y) = 1 3 1 4 1 + 1 3 3 4 (−1) + 2 3 1 4 (−1) + 2 3 3 4 1 = 1 6 u2(σ1, σ2) = (X,Y)∈{H,T}2 σ1(X)σ2(Y)u2(X, Y) = 1 3 1 4 (−1) + 1 3 3 4 1 + 2 3 1 4 1 + 2 3 3 4 (−1) = − 1 6 69 "Decomposition" of Expected Payoff Consider the matching pennies example from the previous slide: H T H 1, −1 −1, 1 T −1, 1 1, −1 together with some mixed strategies σ1 and σ2. We prove the following important property of the expected payoff: u1(σ1, σ2) = X∈{H,T} σ1(X)u1(X, σ2) An intuition behind this equality is following: u1(σ1, σ2) is the expected payoff of player 1 in the following experiment: Both players simultaneously and independently choose their pure strategies X, Y according to σ1, σ2, resp., and then player 1 collects his payoff u1(X, Y). X∈{H,T} σ1(X)u1(X, σ2) is the expected payoff of player 1 in the following: Player 1 chooses his pure strategy X and then uses it against the mixed strategy σ2 of player 2. Then player 2 chooses Y according to σ2 independently of X, and player 1 collects the payoff u1(X, Y). As Y does not depend on X in neither experiment, we obtain the above equality of expected payoffs. 70 "Decomposition" of Expected Payoff Consider the matching pennies example from the previous slide: H T H 1, −1 −1, 1 T −1, 1 1, −1 together with some mixed strategies σ1 and σ2. A formal proof is straightforward: u1(σ1, σ2) = (X,Y)∈{H,T}2 σ1(X)σ2(Y)u1(X, Y) = X∈{H,T} Y∈{H,T} σ1(X)σ2(Y)u1(X, Y) = X∈{H,T} σ1(X) Y∈{H,T} σ2(Y)u1(X, Y) = X∈{H,T} σ1(X)u1(X, σ2) (In the last equality we used the fact that X is identified with a mixed strategy assigning one to X.) 71 "Decomposition" of Expected Payoff Consider the matching pennies example from the previous slide: H T H 1, −1 −1, 1 T −1, 1 1, −1 together with some mixed strategies σ1 and σ2. Similarly, u1(σ1, σ2) = (X,Y)∈{H,T}2 σ1(X)σ2(Y)u1(X, Y) = X∈{H,T} Y∈{H,T} σ1(X)σ2(Y)u1(X, Y) = Y∈{H,T} X∈{H,T} σ1(X)σ2(Y)u1(X, Y) = Y∈{H,T} σ2(Y) X∈{H,T} σ1(X)u1(X, Y) = Y∈{H,T} σ2(Y)u1(σ1, Y) 72 Expected Payoff – "Decomposition" in General Lemma 23 For every mixed strategy profile σ ∈ Σ and every k ∈ N we have ui(σ) = sk ∈Sk σk (sk ) · ui(sk , σ−k ) = s−k ∈S−k σ−k (s−k ) · ui(σk , s−k ) Lemma 23 immediately implies that each ui(σ) is affine in each σk (sk ), Also, ui(σ) = ui(σ1, . . . , σn) is linear in each σk . Indeed, assuming w.l.o.g. that Sk = {1, . . . , mk }, ui(σ) = sk ∈Sk σk (sk ) · ui(sk , σ−k ) = mk =1 σk ( ) · ui( , σ−k ) is the scalar product of the vector σk = (σk (1), . . . , σk (mk )) with the vector (ui(1, σ−k ), . . . , ui(mk , σ−k )). 73 Expected Payoff – Pure Strategy Bounds Before proving Lemma 23, we prove the following simple corollary. Corollary 24 For all i, k ∈ N and σ ∈ Σ we have that minsk ∈Sk ui(sk , σ−k ) ≤ ui(σ) ≤ maxsk ∈Sk ui(sk , σ−k ) mins−k ∈S−k ui(σk , s−k ) ≤ ui(σ) ≤ maxs−k ∈S−k ui(σk , s−k ) Proof. We prove ui(σ) ≤ maxsk ∈Sk ui(sk , σ−k ) the rest is similar. Define B := maxsk ∈Sk ui(sk , σ−k ). Then ui(σ) = sk ∈Sk σk (sk ) · ui(sk , σ−k ) = sk ∈Sk σk (sk ) · (B − (B − ui(sk , σ−k ))) ≤ sk ∈Sk σk (sk ) · B = B 74 Proof of Lemma 23 ui(σ) = s∈S σ(s)ui(s) = s∈S n =1 σ (s )ui(s) = s∈S σk (sk ) n k σ (s )ui(s) = sk ∈Sk s−k ∈S−k σk (sk ) n k σ (s )ui(sk , s−k ) = sk ∈Sk s−k ∈S−k σk (sk )σ−k (s−k )ui(sk , s−k ) 75 Proof of Lemma 23 (cont.) The first equality: ui(σ) = sk ∈Sk s−k ∈S−k σk (sk )σ−k (s−k )ui(sk , s−k ) = sk ∈Sk σk (sk ) s−k ∈S−k σ−k (s−k )ui(sk , s−k ) = s−k ∈S−k σk (sk )ui(sk , σ−k ) 76 Proof of Lemma 23 (cont.) The second equality: ui(σ) = sk ∈Sk s−k ∈S−k σk (sk )σ−k (s−k )ui(sk , s−k ) = s−k ∈S−k sk ∈Sk σk (sk )σ−k (s−k )ui(sk , s−k ) = s−k ∈S−k σ−k (s−k ) sk ∈Sk σk (sk )ui(sk , s−k ) = s−k ∈S−k σ−k (s−k )ui(σk , s−k ) 77 Solution Concepts We revisit the following solution concepts in mixed strategies: strict dominant strategy equilibrium IESDS equilibrium rationalizable equilibria Nash equilibria From now on, when I say a strategy I implicitly mean a mixed strategy. In order to deal with efficiency issues we assume that the size of the game G is defined by |G| := |N| + i∈N |Si| + i∈N |ui| where |ui| = s∈S |ui(s)| and |ui(s)| is the length of a binary encoding of ui(s) (we assume that rational numbers are encoded as quotients of two binary integers) Note that, in particular, |G| > |S|. 78 Strict Dominance in Mixed Strategies Definition 25 Let σi, σi ∈ Σi be (mixed) strategies of player i. Then σi is strictly dominated by σi (write σi σi) if ui(σi, σ−i) > ui(σi , σ−i) for all σ−i ∈ Σ−i Example 26 X Y A 3 0 B 0 3 C 1 1 Is there a strictly dominated strategy? Question: Is there a game with at least one strictly dominated strategy but without strictly dominated pure strategies? 79 Strictly Dominant Strategy Equilibrium Definition 27 σi ∈ Σi is strictly dominant if every other mixed strategy of player i is strictly dominated by σi. Definition 28 A strategy profile σ ∈ Σ is a strictly dominant strategy equilibrium if σi ∈ Σi is strictly dominant for all i ∈ N. Proposition 2 If the strictly dominant strategy equilibrium exists, it is unique, all its strategies are pure, and rational players will play it. Proof. Let σ∗ = (σ∗ 1 , . . . , σ∗ n) ∈ Σi be the strictly dominant strategy equilibrium. By Corollary 24, for every i ∈ N and σ−i ∈ Σ−i, there must exist si ∈ Si such that ui(σ∗ i , σ−i) ≤ ui(si, σ−i). But then σ∗ i = si since σ∗ i is strictly dominant. 80 Computing Strictly Dominant Strategy Equilibrium How to decide whether there is a strictly dominant strategy equilibrium s = (s1, . . . , sn) ∈ S ? I.e. whether for a given si ∈ Si, all σi ∈ Σi {si} and all σ−i ∈ Σ−i : ui(si, σ−i) > ui(σi, σ−i) There are some serious issues here: Obviously there are uncountably many possible σi and σ−i. ui(σi, σ−i) is nonlinear, and for more that two players even ui(si, σ−i) is nonlinear in probabilities assigned to pure strategies. First, we prove the following useful proposition using Lemma 23: Lemma 29 σi strictly dominates σi iff for all pure strategy profiles s−i ∈ S−i: ui(σi, s−i) > ui(σi , s−i) Proof: Simple application of the second equality from Lemma 23. In other words, it suffices to check the strict dominance only with respect to all pure profiles of opponents. 81 Computing Strictly Dominant Strategy Equilibrium How to decide whether for a given si ∈ Si, all σi ∈ Σi {si} and all s−i ∈ S−i we have ui(si, s−i) > ui(σi, s−i) Lemma 30 ui(si, s−i) > ui(σi, s−i) for all σi ∈ Σi {si} and all s−i ∈ S−i iff ui(si, s−i) > ui(si , s−i) for all si ∈ Si {si} and all s−i ∈ S−i. Proof: Simple application of the first equality from Lemma 23. Thus it suffices to check whether ui(si, s−i) > ui(si , s−i) for all si ∈ Si and all s−i ∈ S−i. This can easily be done in time polynomial w.r.t. |G|. 82 IESDS in Mixed Strategies Define a sequence D0 i , D1 i , D2 i , . . . of strategy sets of player i. (Denote by Gk DS the game obtained from G by restricting the pure strategy sets to Dk i , i ∈ N.) 1. Initialize k = 0 and D0 i = Si for each i ∈ N. 2. For all players i ∈ N: Let Dk+1 i be the set of all pure strategies of Dk i that are not strictly dominated in Gk DS by mixed strategies. 3. Let k := k + 1 and go to 2. We say that si ∈ Si survives IESDS if si ∈ Dk i for all k = 0, 1, 2, . . . Definition 31 A strategy profile s = (s1, . . . , sn) ∈ S is an IESDS equilibrium if each si survives IESDS. 83 IESDS – Algorithm Note that in step 2 it is not sufficient to consider pure strategies. Consider the following zero sum game: X Y A 3 0 B 0 3 C 1 1 C is strictly dominated by (σ1(A), σ1(B), σ1(C)) = (1 2 , 1 2 , 0) but no strategy is strictly dominated in pure strategies. However, there are uncountably many mixed strategies that may dominate a given pure strategy ... Recall ui(σi, σ−i) is linear in σi. So to decide strict dominance, we use linear programming ... 84 Intermezzo: Linear Programming Linear programming is a technique for optimization of a linear objective function, subject to linear (non-strict) inequality constraints. Formally, a linear program in so called canonical form looks like this: maximize m j=1 cjxj subject to m j=1 aijxj ≤ bi 1 ≤ i ≤ n xj ≥ 0 1 ≤ j ≤ m (objective function) (constraints) Here aij, bk and cj are real numbers and xj’s are real variables. A feasible solution is an assignment of real numbers to the variables xj, 1 ≤ j ≤ m, so that the constraints are satisfied. An optimal solution is a feasible solution which maximizes the objective function m j=1 cjxj. 85 Intermezzo: Complexity of Linear Programming We assume that coefficients aij, bk and cj are encoded in binary (more precisely, as fractions of two integers encoded in binary). Theorem 32 (Khachiyan, Doklady Akademii Nauk SSSR, 1979) There is an algorithm which for any linear program computes an optimal solution in polynomial time. The algorithm uses so called ellipsoid method. In practice, the Khachiyan’s is not used. Usually simplex algorithm is used even though its theoretical complexity is exponential. There is also a polynomial time algorithm (by Karmarkar) which has better complexity upper bounds than the Khachiyan’s and sometimes works even better than the simplex. There exist several advanced linear programming solvers (usually parts of larger optimization packages) implementing various heuristics for solving large scale problems, sensitivity analysis, etc. For more info see http://en.wikipedia.org/wiki/Linear_programming#Solvers_and_scripting_.28programming.29_languages 86 IESDS Algorithm – Strict Dominance Step So how do we use linear programming to decide strict dominance in step 2 of IESDS procedure? I.e. whether for a given si there exists σi such that for all σ−i we have ui(σi, σ−i) > ui(si, σ−i) Recall that by Lemma 29 we have that σi is strictly dominates σi iff for all pure strategy profiles s−i ∈ S−i: ui(σi, s−i) > ui(σi , s−i) In other words, it suffices to check the strict dominance only with respect to all pure profiles of opponents. 87 IESDS Algorithm – Strict Dominance Step Recall that ui(σi, s−i) = si ∈Si σi(si )ui(si , s−i). So to decide whether si ∈ Si is strictly dominated by some mixed strategy σi, it suffices to solve the following system: si ∈Si xsi · ui(si , s−i) > ui(si, s−i) s−i ∈ S−i xsi ≥ 0 si ∈ Si si ∈Si xsi = 1 (Here each variable xsi corresponds to the probability σi(si ) assigned by the strictly dominant strategy σi to si ) Unfortunately, this is a "strict linear program" ... How to deal with the strict inequality? 88 IESDS Algorithm – Complexity Introduce a new variable y to be maximized under the following constraints: si ∈Si xsi · ui(si , s−i) ≥ ui(si, s−i) + y s−i ∈ S−i xsi ≥ 0 si ∈ Si si ∈Si xsi = 1 y ≥ 0 Now si is strictly dominated iff a solution maximizing y satisfies y > 0 The size of the above program is polynomial in |G|. So the step 2 of IESDS can be executed in polynomial time. As every iteration of IESDS removes at least one pure strategy, IESDS runs in time polynomial in |G|. 89 IESDS in Mixed Strategie – Example X Y A 3 0 B 0 3 C 1 1 Let us have a look at the first iteration of IESDS. Observe that A, B are not strictly dominated by any mixed strategy. Let us construct the linear program for deciding whether C is strictly dominated: The program maximizes y under the following constraints: 3xA + 0xB + xC ≥ 1 + y 0xA + 3xB + xC ≥ 1 + y xA , xB , xC ≥ 0 xA + xB + xC = 1 y ≥ 0 The maximum y = 1 2 is attained at xA = 1 2 and xB = 1 2 . 90 Best Response Definition 33 A strategy σi ∈ Σi of player i is a best response to a strategy profile σ−i ∈ Σ−i of his opponents if ui(σi, σ−i) ≥ ui(σi , σ−i) for all σi ∈ Σi We denote by BRi(σ−i) ⊆ Σi the set of all best responses of player i to the strategy profile of opponents σ−i ∈ Σ−i. 91 Best Response – Example Consider a game with the following payoffs of player 1: X Y A 2 0 B 0 2 C 1 1 Player 1 (row) plays σ1 = (a(A), b(B), c(C)). Player 2 (column) plays (q(X), (1 − q)(Y)) (we write just q). Compute BR1(q). 92 Rationalizability in Mixed Strategies (Two Players) For simplicity, we temporarily switch to two-player setting N = {1, 2}. Definition 34 A (mixed) belief of player i ∈ {1, 2} is a mixed strategy σ−i of his opponent. (A general definition works with so called correlated beliefs that are arbitrary distributions on S−i, the notion of the expected payoff needs to be adjusted, we are not going in this direction ....) Assumption: Any rational player with a belief σ−i always plays a best response to σ−i. Definition 35 A strategy σi ∈ Σi of player i ∈ {1, 2} is never best response if it is not a best response to any belief σ−i. No rational player plays a strategy that is never best response. 93 Rationalizability in Mixed Strategies (Two Players) Define a sequence R0 i , R1 i , R2 i , . . . of strategy sets of player i. (Denote by Gk Rat the game obtained from G by restricting the pure strategy sets to Rk i , i ∈ N.) 1. Initialize k = 0 and R0 i = Si for each i ∈ N. 2. For all players i ∈ N: Let Rk+1 i be the set of all strategies of Rk i that are best responses to some (mixed) beliefs in Gk Rat . 3. Let k := k + 1 and go to 2. We say that si ∈ Si is rationalizable if si ∈ Rk i for all k = 0, 1, 2, . . . Definition 36 A strategy profile s = (s1, . . . , sn) ∈ S is a rationalizable equilibrium if each si is rationalizable. 94 Rationalizability vs IESDS (Two Players) X Y A 3 0 B 0 3 C 1 1 Player 1 (row) plays σ1 = (a(A), b(B), c(C)) player 2 (column) plays (q(X), (1 − q)(Y)) (we write just q) What strategies of player 1 are never best responses? What strategies of player 1 are strictly dominated? Observation: The set of strictly dominated strategies coincides with the set of never best responses! ... and this holds in general for two player games: Theorem 37 Assume N = {1, 2}. A pure strategy si is never best response to any belief σ−i ∈ Σ−i iff si is strictly dominated by a strategy σi ∈ Σi. It follows that a strategy of Si survives IESDS iff it is rationalizable. (The theorem is true also for an arbitrary number of players but correlated beliefs need to be used.) 95 Mixed Nash Equilibrium Definition 38 A mixed-strategy profile σ∗ = (σ∗ 1 , . . . , σ∗ n) ∈ Σ is a (mixed) Nash equilibrium if σ∗ i is a best response to σ∗ −i for each i ∈ N, that is ui(σ∗ i , σ∗ −i) ≥ ui(σi, σ∗ −i) for all σi ∈ Σi and all i ∈ N An interpretation: each σ∗ −i can be seen as a belief of player i against which he plays a best response σ∗ i . Given a mixed strategy profile of opponents σ−i ∈ Σ−i, we denote by BRi(σ−i) the set of all σi ∈ Σi that are best responses to σ−i. Then σ∗ is a Nash equilibrium iff σ∗ i ∈ BRi(σ∗ −i ) for all i ∈ N. Theorem 39 (Nash 1950) Every finite game in strategic form has a Nash equilibrium. This is THE fundamental theorem of game theory. 96 Example: Matching Pennies H T H 1, −1 −1, 1 T −1, 1 1, −1 Player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2 (column) plays (q(H), (1 − q)(T)) (we write q). Compute all Nash equilibria. What are the expected payoffs of playing pure strategies for player 1? v1(H, q) = 2q − 1 and v1(T, q) = 1 − 2q Then v1(p, q) = pv1(H, q) + (1 − p)v1(T, q) = p(2q − 1) + (1 − p)(1 − 2q). We obtain the best-response correspondence BR1: BR1(q) =    p = 0 if q < 1 2 p ∈ [0, 1] if q = 1 2 p = 1 if q > 1 2 97 Example: Matching Pennies H T H 1, −1 −1, 1 T −1, 1 1, −1 Player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2 (column) plays (q(H), (1 − q)(T)) (we write q). Compute all Nash equilibria. Similarly for player 2 : v2(p, H) = 1 − 2p and v1(p, T) = 2p − 1 We obtain best-response relation BR2: BR2(p) =    q = 1 if p < 1 2 q ∈ [0, 1] if p = 1 2 q = 0 if p > 1 2 The only "intersection" of BR1 and BR2 is the only Nash equilibrium σ1 = σ2 = (1 2 , 1 2 ). 98 Computing Mixed Nash Equilibria Lemma 40 σ∗ = (σ∗ 1 , . . . , σ∗ n) ∈ Σ is a Nash equilibrium iff there exist w1, . . . , wn ∈ R such that the following holds: For all i ∈ N and all si ∈ supp(σ∗ i ) we have ui(si, σ∗ −i ) = wi. For all i ∈ N and all si supp(σ∗ i ) we have ui(si, σ∗ −i ) ≤ wi. Here, the right hand side implies ui(σ∗ ) = wi. Proof. The fact that the right hand side implies ui(σ∗ ) = wi follows immediately from Lemma 23: ui(σ∗ ) = si ∈Si σ∗ (si)ui(si, σ∗ −i) = si ∈supp(σ∗ i ) σ∗ (si)ui(si, σ∗ −i) = si ∈supp(σ∗ i ) σ∗ (si)wi = wi si ∈supp(σ∗ i ) σ∗ (si) = wi 99 Computing Mixed Nash Equilibria Lemma 41 σ∗ = (σ∗ 1 , . . . , σ∗ n) ∈ Σ is a Nash equilibrium iff there exist w1, . . . , wn ∈ R such that the following holds: For all i ∈ N and all si ∈ supp(σ∗ i ) we have ui(si, σ∗ −i ) = wi. For all i ∈ N and all si supp(σ∗ i ) we have ui(si, σ∗ −i ) ≤ wi. Here, the right hand side implies ui(σ∗ ) = wi. Proof. (Cont.) "⇐": Use the first equality of Lemma 23 to obtain for every i ∈ N and every σi ∈ Σi ui(σi , σ∗ −i) = si ∈Si σi (si)ui(si, σ∗ −i) ≤ si ∈Si σi (si)ui(σ∗ ) = ui(σ∗ ) Thus σ∗ is a Nash equilibrium. 100 Computing Mixed Nash Equilibria Lemma 42 σ∗ = (σ∗ 1 , . . . , σ∗ n) ∈ Σ is a Nash equilibrium iff there exist w1, . . . , wn ∈ R such that the following holds: For all i ∈ N and all si ∈ supp(σ∗ i ) we have ui(si, σ∗ −i ) = wi. For all i ∈ N and all si supp(σ∗ i ) we have ui(si, σ∗ −i ) ≤ wi. Here, the right hand side implies ui(σ∗ ) = wi. Proof (Cont.) Idea for "⇒": Let wi := ui(σ∗ ). Clearly, every i ∈ N and si ∈ Si satisfy ui(si, σ∗ −i ) ≤ ui(σ∗ ) = wi. By Corollary 24, there is at least one si ∈ supp(σ∗ i ) satisfying ui(si, σ∗ −i ) = ui(σ∗ ) = wi. Now if there is si ∈ supp(σ∗ i ) such that ui(si , σ∗ −i) < ui(σ∗ ) (= ui(si, σ∗ −i)) then increasing the probability σ∗ i (si) and decreasing (in proportion) σ∗ i (si ) strictly increases of ui(σ∗ ), a contradiction with σ∗ being NE. 101 Example: Matching Pennies H T H 1, −1 −1, 1 T −1, 1 1, −1 Player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2 (column) plays (q(H), (1 − q)(T)) (we write q). Compute all Nash equilibria. There are no pure strategy equilibria. There are no equilibria where only player 1 randomizes: Indeed, assume that (p, H) is such an equilibrium. Then by Lemma 42, 1 = u1(H, H) = u1(T, H) = −1 a contradiction. Also, (p, T) cannot be an equilibrium. Similarly, there is no NE where only player 2 randomizes. 102 Example: Matching Pennies H T H 1, −1 −1, 1 T −1, 1 1, −1 Player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2 (column) plays (q(H), (1 − q)(T)) (we write q). Compute all Nash equilibria. Assume that both players randomize, i.e., p, q ∈ (0, 1). The expected payoffs of playing pure strategies for player 1: v1(H, q) = 2q − 1 and v1(T, q) = 1 − 2q Similarly for player 2 : v2(p, H) = 1 − 2p and v1(p, T) = 2p − 1 By Lemma 42, Nash equilibria must satisfy: 2q − 1 = 1 − 2q and 1 − 2p = 2p − 1 That is p = q = 1 2 is the only Nash equilibrium. 103 Example: Battle of Sexes O F O 2, 1 0, 0 F 0, 0 1, 2 Player 1 (row) plays (p(O), (1 − p)(F)) (we write just p) and player 2 (column) plays (q(O), (1 − q)(F)) (we write q). Compute all Nash equilibria. There are two pure strategy equilibria (2, 1) and (1, 2), no Nash equilibrium where only one player randomizes. Now assume that player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2 (column) plays (q(H), (1 − q)(T)) (we write q) where p, q ∈ (0, 1). By Lemma 42, any Nash equilibrium must satisfy: 2q = 1 − q and p = 2(1 − p) This holds only for q = 1 3 and p = 2 3 . 104 An Algorithm? What did we do in the previous examples? We went through all support combinations for both players. (pure, one player mixing, both mixing) For each pair of supports we tried to find equilibria in strategies with these supports. (in Battle of Sexes: two pure, no equilibrium with just one player mixing, one equilibrium when both mixing) Whenever one of the supports was non-singleton, we reduced computation of Nash equilibria to linear equations. 105 Support Enumeration (Idea) Recall Lemma 42: σ∗ = (σ∗ 1 , . . . , σ∗ n) ∈ Σ is a Nash equilibrium iff there exist w1, . . . , wn ∈ R such that the following holds: For all i ∈ N and all si ∈ supp(σ∗ i ) we have ui(si, σ∗ −i ) = wi. For all i ∈ N and all si supp(σ∗ i ) we have ui(si, σ∗ −i ) ≤ wi. Suppose that we somehow know the supports supp(σ∗ 1 ), . . . , supp(σ∗ n) for some Nash equilibrium σ∗ 1 , . . . , σ∗ n (which itself is unknown to us). Now we may consider all σ∗ i (si)’s and all wi’s as variables and use the above conditions to design a system of inequalities capturing Nash equilibria with the given support sets supp(σ∗ 1 ), . . . , supp(σ∗ n). 106 Support Enumeration To simplify notation, assume that for every i we have Si = {1, . . . , mi}. Then σi(j) is the probability of the pure strategy j in the mixed strategy σi. Fix supports suppi ⊆ Si for every i ∈ N and consider the following system of constraints with variables σ1(1), . . . , σ1(m1), . . . , σn(1), . . . , σn(mn), w1, . . . , wn: 1. For all i ∈ N and all k ∈ suppi we have (ui(k, σ−i) = ) s∈S∧si =k   j i σj(sj)   ui(s) = wi 2. For all i ∈ N and all k suppi we have (ui(k, σ−i) = ) s∈S∧si =k   j i σj(sj)   ui(s) ≤ wi 3. For all i ∈ N: σi(1) + · · · + σi(mi) = 1. 4. For all i ∈ N and all k ∈ suppi: σi(k) ≥ 0. 5. For all i ∈ N and all k suppi: σi(k) = 0. 107 Support Enumeration Consider the system of constraints from the previous slide. The following lemma follows immediately from Lemma 42. Lemma 43 Let σ∗ ∈ Σ be a strategy profile. If σ∗ is a Nash equilibrium and supp(σ∗ i ) = suppi for all i ∈ N, then assigning σi(k) := σ∗ i (k) and wi := ui(σ∗ ) solves the system. If σi(k) := σ∗ i (k) and wi := ui(σ∗ ) solves the system, then σ∗ is a Nash equilibrium with supp(σ∗ i ) ⊆ suppi for all i ∈ N. 108 Support Enumeration (Two Players) The constraints are non-linear in general, but linear for two player games! Let us stick to two players. How to find supp1 and supp2? ... Just guess! Input: A two-player strategic-form game G with strategy sets S1 = {1, . . . , m1} and S2 = {1, . . . , m2} and rational payoffs u1, u2. Output: A Nash equilibrium σ∗ . Algorithm: For all possible supp1 ⊆ S1 and supp2 ⊆ S2: Check if the corresponding system of linear constraints (from the previous slide) has a feasible solution σ∗ , w∗ 1 , . . . , w∗ n. If so, STOP: the feasible solution σ∗ is a Nash equilibrium satisfying ui(σ∗ ) = w∗ i . Question: How many possible subsets supp1, supp2 are there to try? Answer: 2(m1+m2) So, unfortunately, the algorithm requires worst-case exponential time. 109 Remarks on Support Enumeration The algorithm combined with Theorem 39 and properties of linear programming imply that every finite two-player game has a rational Nash equilibrium (furthermore, the rational numbers have polynomial representation in binary). The algorithm can be used to compute all Nash equilibria. (There are algorithms for computing (a finite representation of) a set of all feasible solutions of a given linear constraint system.) The algorithm can be used to compute "good" equilibria. For example, to find a Nash equilibrium maximizing the sum of all expected payoffs (the "social welfare") it suffices to solve the system of constraints while maximizing w1 + · · · + wn. More precisely, the algorithm can be modified as follows: Initialize W := −∞ (W stores the current maximum welfare) For all possible supp1 ⊆ S1 and supp2 ⊆ S2: Find the maximum value max( wi) of w1 + · · · + wn so that the constraints are satisfiable (using linear programming). Put W := max{W, max( wi)}. Return W. 110 Remarks on Support Enumeration (Cont.) Similar trick works for any notion of "good" NE that can be expressed using a linear objective function and (additional) linear constraints in variables σi(j) and wi. (e.g., maximize payoff of player 1, minimize payoff of player 2 and keep probability of playing the strategy 1 below 1/2, etc.) 111 Complexity Results – (Two Players) Theorem 44 All the following problems are NP-complete: Given a two-player game in strategic form, does it have 1. a NE in which player 1 has utility at least a given amount v ? 2. a NE in which the sum of expected payoffs of the two players is at least a given amount v ? 3. a NE with a support of size greater than a given number? 4. a NE whose support contains a given strategy s ? 5. a NE whose support does not contain a given strategy s ? 6. .... Membership to NP follows from the support enumeration: For example, for 1., it suffices to guess supports supp1, supp2 and add w1 ≥ v to the constraints; the resulting NE σ∗ satisfies u1(σ∗ ) ≥ v. 112 Complexity Results (Two Players) Theorem 45 All the following problems are NP-complete: Given a two-player game in strategic form, does it have 1. a NE in which player 1 has utility at least a given amount v ? 2. a NE in which the sum of expected payoffs of the two players is at least a given amount v ? 3. a NE with a support of size greater than a given number? 4. a NE whose support contains a given strategy s ? 5. a NE whose support does not contain a given strategy s ? 6. .... NP-hardness can be proved using reduction from SAT (The reduction is not difficult but we are not going into it. It is presented in "New Complexity Results about Nash Equilibria" by V. Conitzer and T. Sandholm (pages 6–8) ) 113 The Reduction (It’s Short and Sweet) 114 ... But What is The Exact Complexity of Computing Nash Equilibria in Two Player Games? Let us concentrate on the problem of computing one Nash equilibrium (sometimes called the sample equilibrium problem). As the class NP consists of decision problems, it cannot be directly used to characterize complexity of the sample equilibrium problem. We use complexity classes of function problems such as FP, FNP, etc. The support enumeration gives a deterministic algorithm which runs in exponential time. Can we do better? In what follows we show that the sample equilibrium problem can be solved in polynomial time for zero-sum two-player games, (Using a beautiful characterization of all Nash equilibria) the sample equilibrium problem belongs to the complexity class PPAD (which is a subclass of FNP) for two-player games. (... to be defined later) 115 MaxMin Is there a better characterization of Nash equilibria than Lemma 42 ? Definition 46 σ∗ i ∈ Σi is a maxmin strategy of player i if σ∗ i ∈ argmax σi ∈Σi min σ−i ∈Σ−i ui(σi, σ−i) (Intuitively, a maxmin strategy σ∗ 1 maximizes player 1’s worst-case payoff in the situation where player 2 strives to cause the greatest harm to player 1.) (Since ui is continuous and Σ−i compact, minσ−i ∈Σ−i ui(σi, σ−i) is well defined and continuous on Σi, which implies that there is at least one maxmin strategy.) 116 MaxMin Lemma 47 σ∗ i is maxmin iff σ∗ i ∈ argmax σi ∈Σi min s−i ∈S−i ui(σi, s−i) Proof. By Corollary 24, for every σ ∈ Σ we have ui(σi, σ−i) ≥ ui(σi, s−i) for some s−i ∈ S−i. Thus minσ−i ∈Σ−i ui(σi, σ−i) = mins−i ∈S−i ui(σi, s−i). Hence, argmax σi ∈Σi min σ−i ∈Σ−i ui(σi, σ−i) = argmax σi ∈Σi min s−i ∈S−i ui(σi, s−i) Question: Assume a strategy profile where both players play their maxmin strategies? Does it have to be a Nash equilibrium? 117 Zero-Sum Games: von Neumann’s Theorem Assume that G is zero sum, i.e., u1 = −u2. Then σ∗ 2 ∈ Σ2 is maxmin of player 2 iff σ∗ 2 ∈ argmin σ2∈Σ2 max σ1∈Σ1 u1(σ1, σ2) (= argmin σ2∈Σ2 max s1∈S1 u1(s1, σ2)) (Intuitively, maxmin of player 2 minimizes the payoff of player 1 when player 1 plays his best responses. Such strategy of player 2 is often called minmax.) Theorem 48 (von Neumann) Assume a two-player zero-sum game. Then max σ1∈Σ1 min σ2∈Σ2 u1(σ1, σ2) = min σ2∈Σ2 max σ1∈Σ1 u1(σ1, σ2) Morever, σ∗ = (σ∗ 1 , σ∗ 2 ) ∈ Σ is a Nash equilibrium iff both σ∗ 1 and σ∗ 2 are maxmin. So to compute a Nash equilibrium it suffices to compute (arbitrary) maxmin strategies for both players. 118 Proof of Theorem 48 (Homework) Homework: Prove von Neumann’s Theorem in 4 easy steps: 1. Prove this inequality: max σ1∈Σ1 min σ2∈Σ2 u1(σ1, σ2) ≤ min σ2∈Σ2 max σ1∈Σ1 u1(σ1, σ2) 2. Prove that (σ∗ 1 , σ∗ 2 ) is a Nash equilibrium iff min σ2∈Σ2 u1(σ∗ 1, σ2) ≥ u1(σ∗ 1, σ∗ 2) ≥ max σ1∈Σ1 u1(σ1, σ∗ 2) Hint: One of the inequalities is trivial and the other one almost. 3. Use 1. and 2. together with Theorem 39 to prove max σ1∈Σ1 min σ2∈Σ2 u1(σ1, σ2) ≥ min σ2∈Σ2 max σ1∈Σ1 u1(σ1, σ2) 4. Use the above to prove the rest of the theorem. Hint: Use the characterization of NE from 2., do not forget that you already have maxσ1∈Σ1 minσ2∈Σ2 u1(σ1, σ2) = minσ2∈Σ2 maxσ1∈Σ1 u1(σ1, σ2) You may already have proved one of the implications when proving 3. 119 Zero-Sum Two-Player Games – Computing NE Assume S1 = {1, . . . , m1} and S2 = {1, . . . , m2}. We want to compute σ∗ 1 ∈ argmax σ1∈Σ1 min ∈S2 u1(σ1, ) Consider a linear program with variables σ1(1), . . . , σ1(m1), v: maximize: v subject to: m1 k=1 σ1(k) · u1(k, ) ≥ v = 1, . . . , m2 m1 k=1 σ1(k) = 1 σ1(k) ≥ 0 k = 1, . . . , m1 Lemma 49 σ∗ 1 ∈ argmaxσ1∈Σ1 min ∈S2 u1(σ1, ) iff assigning σ1(k) := σ∗ 1 (k) and v := min ∈S2 u1(σ∗ 1 , ) gives an optimal solution. 120 Zero-Sum Two-Player Games – Computing NE Summary: We have reduced computation of NE to computation of maxmin strategies for both players. Maxmin strategies can be computed using linear programming in polynomial time. That is, Nash equilibria in zero-sum two-player games can be computed in polynomial time. 121 IESDS vs Rationalizability Revisited We get Theorem 37 as a simple corollary of Theorem 48. Let s∗ 1 be a strategy of player 1. Consider a zero-sum game G = ({1, 2}, (S1 , S2 ), (u1 , u2 )) where S1 = S1 {s∗ 1 } and S2 = S2, u1 (s1, s2) = u1(s1, s2) − u1(s∗ 1 , s2) and u2 (s1, s2) = u1(s∗ 1 , s2) − u1(s1, s2). Now s∗ 1 is never best resp. in G iff for every σ2 ∈ Σ2 exists σ1 ∈ Σ1 : u1(σ1, σ2) − u1(s∗ 1 , σ2) > 0 iff for every σ2 ∈ Σ2 exists s1 ∈ S1 : u1(s1, σ2) − u1(s∗ 1 , σ2) > 0 iff minσ2∈Σ2 maxs1∈S1 u1 (s1, σ2) > 0 iff minσ2∈Σ2 maxσ1∈Σ1 u1 (σ1, σ2) > 0 iff maxσ1∈Σ1 minσ2∈Σ2 u1 (σ1, σ2) > 0 iff there is σ1 ∈ Σ1 such that for all σ2 ∈ Σ2 we have 0 < u1 (σ1, σ2) = u1(σ1, σ2) − u1(s∗ 1 , σ2) iff s∗ 1 is strictly dominated (by σ1) in G. 122 Lemke-Howson Algorithm – Notation Fix a strategic-form two-player game G = ({1, 2}, (S1, S2) , (u1, u2)). Assume that S1 = {1, . . . , m} S2 = {m + 1, . . . , m + n} (I.e., player 1 has m pure strategies 1, . . . , m and player 2 has n pure strategies m + 1, . . . , m + n. In particular, each pure strategy determines the player who can play it.) Assume that u1, u2 are positive, i.e., u1(k, ) > 0 and u2(k, ) > 0 for all (k, ) ∈ S1 × S2. This assumption is w.l.o.g. since any positive constant can be added to payoffs without altering the set of (mixed) Nash equilibria. Mixed strategies of player 1 : σ1 = (σ(1), . . . , σ(m)) ∈ [0, 1]m Mixed strategies of player 2 : σ2 = (σ(m + 1), . . . , σ(m + n)) ∈ [0, 1]n I.e. we omit the lower index of σ whenever it is determined by the argument. A strategy profile σ = (σ1, σ2) can be seen as a vector σ = (σ1, σ2) = (σ(1), . . . , σ(m + n)) ∈ [0, 1]m+n . 123 Running Example 3 4 1 3, 1 2, 2 2 2, 3 3, 1 Player 1 (row) plays σ1 = (σ(1), σ(2)) ∈ [0, 1]2 Player 2 (column) plays σ2 = (σ(3), σ(4)) ∈ [0, 1]2 A typical mixed strategy profile is (σ(1), σ(2), σ(3), σ(4)) For example: σ1 = (0.2, 0.8) and σ2 = (0.4, 0.6) give the profile (0.2, 0.8, 0.4, 0.6). 124 Characterizing Nash Equilibria Recall that by Lemma 42 the following holds: (σ1, σ2) = (σ(1), . . . , σ(m + n)) ∈ Σ is a Nash equilibrium iff For all = m + 1, . . . , m + n we have that u2(σ1, ) ≤ u2(σ1, σ2) and either σ( ) = 0, or u2(σ1, ) = u2(σ1, σ2) For all k = 1, . . . , m we have that u1(k, σ2) ≤ u1(σ1, σ2) and either σ(k) = 0, or u1(k, σ2) = u1(σ1, σ2) This is equivalent to the following: (σ1, σ2) = (σ(1), . . . , σ(m + n)) ∈ Σ is a Nash equilibrium iff For all = m + 1, . . . , m + n we have that either σ( ) = 0, or is a best response to σ1. For all k = 1, . . . , m we have that either σ(k) = 0, or k is a best response to σ2. 125 Characterizing Nash Equilibria Given a mixed strategy σ1 = (σ(1), . . . , σ(m)) of player 1 we define L(σ1) ⊆ {1, 2, . . . , m + n} to consist of all k ∈ {1, . . . , m} satisfying σ(k) = 0 all ∈ {m + 1, . . . , m + n} that are best responses to σ1 Given a mixed strategy σ2 = (σ(m + 1), . . . , σ(m + n)) of player 2 we define L(σ2) ⊆ {1, 2, . . . , m + n} to consist of all k ∈ {1, . . . , m} that are best responses to σ2 all ∈ {m + 1, . . . , m + n} satisfying σ( ) = 0 Proposition 3 σ = (σ1, σ2) is a Nash equilibrium iff L(σ1) ∪ L(σ2) = {1, . . . , m + n}. We also label the vector 0m := (0, . . . , 0) ∈ Rm with {1, . . . , m} and 0n := (0, . . . , 0) ∈ Rn with {m + 1, . . . , m + n}. We consider (0m , 0n ) as a special mixed strategy profile. How many labels could possibly be assigned to one strategy? 126 Running Example 3 4 1 3, 1 2, 2 2 2, 3 3, 1 A strategy σ1 = (2/3, 1/3) of player 1 is labeled by 3, 4 since both pure strategies 3, 4 of player 2 are best responses to σ1 (they result in the same payoff to player 2) A strategy σ2 = (1/2, 1/2) of player 2 is labeled by 1, 2 since both pure strategies 1, 2 of player 1 are best responses to σ2 (they result in the same payoff to player 1) A strategy σ1 = (0, 1) of player 1 is labeled by 1, 3 since the strategy 1 is played with zero probability in σ1 and 3 is the best response to σ1 A strategy σ1 = (1/10, 9/10) of player 1 is labeled by 3 since no pure strategy of player 1 is played with zero probability (and hence neither 1, nor 2 labels σ1) and 3 is the best response to σ1. 127 Non-degenerate Games Definition: G is non-degenerate if for every σ1 ∈ Σ1 we have that |supp(σ1)| is at least the number of pure best responses to σ1, and for every σ2 ∈ Σ2 we have that |supp(σ2)| is at least the number of pure best responses to σ2. "Most" games are non-degenerate, or can be made non-degenerate by a slight perturbation of payoffs We assume that the game G is non-degenerate. Non-degeneracy implies that L(σ1) ≤ m for every σ1 ∈ Σ1 and L(σ2) ≤ n for every σ2 ∈ Σ2. We say that a strategy σ1 of player 1 (or σ2 of player 2) is fully labeled if |L(σ1)| = m (or |L(σ2)| = n, respectively). Lemma 50 Non-degeneracy of G implies the following: If σi, σi ∈ Σi are fully labeled, then L(σi) L(σi ). There are at most (m+n m ) fully labeled strategies of player 1, (m+n n ) of player 2. For every fully labeled σi ∈ Σi and a label k ∈ L(σi) there is exactly one fully labeled σi ∈ Σi such that L(σi) ∩ L(σi ) = L(σi) {k}. 128 Examples An example of a degenerate game: 3 4 1 1, 1 1, 1 2 3, 3 4, 4 Note that there are two pure best responses to the strategy 1. Are there fully labeled strategies in the following game? 3 4 1 3, 1 2, 2 2 2, 3 3, 1 Yes, the strategy (2/3, 1/3) of player 1 is labeled by 3, 4 and the strategy (1/2, 1/2) of player 2 is labeled by 1, 2. Exercise: Find all fully labeled strategies in the above example. 129 Lemke-Howson (Idea) Define a graph H1 = (V1, E1) where V1 = {σ1 ∈ Σ1 | |L(σ1)| = m} ∪ {0m } and {σ1, σ1 } ∈ E1 iff L(σ1) ∩ L(σ1 ) = L(σ1) {k} for some label k. Note that σ1 is determined by σ1 and k, we say that σ1 is obtained from σ1 by dropping k. Define a graph H2 = (V2, E2) where V2 = {σ2 ∈ Σ2 | |L(σ2)| = n} ∪ {0n } and {σ2, σ2 } ∈ E2 iff L(σ2) ∩ L(σ2 ) = L(σ2) { } for some label . Note that σ2 is determined by σ2 and , we say that σ2 is obtained from σ2 by dropping . Given σi, σi ∈ Vi and k, ∈ {1, . . . , m + n}, we write σi k, ←→ σi if L(σi) ∩ L(σi ) = L(σi) {k} and L(σi) ∩ L(σi ) = L(σi ) { } 130 Running Example 3 4 1 3, 1 2, 2 2 2, 3 3, 1 H1: (0, 0) [1, 2] (1, 0) [2, 4] (0, 1) [1, 3] (2/3, 1/3) [3, 4] H2: (0, 0) [3, 4] (1, 0) [1, 4] (0, 1) [2, 3] (1/2, 1/2) [1, 2] (Here, the red labels of nodes are not parts of the graphs.) For example, (0, 0) 2,3 ←→ (0, 1) and (0, 1) 1,4 ←→ (2/3, 1/3) in H1. 131 Lemke-Howson (Idea) The algorithm basically searches through H1 × H2 = (V1 × V2, E) where (σ1, σ2), (σ1 , σ2 ) ∈ E iff either σ1, σ1 ∈ E1, or σ2, σ2 ∈ E2. Given i ∈ N, we write (σ1, σ2) k, −→ i (σ1, σ2) and say that k was dropped from L(σi) and added to L(σi) if σi k, ←→ σi and σ−i = σ−i . Observe that by Lemma 50, whenever a label k is dropped from L(σi), the resulting vertex of H1 × H2 is uniquely determined. Also, |V| = |V1||V2| ≤ (m+n m )(m+n n ). 132 Running Example 3 4 1 3, 1 2, 2 2 2, 3 3, 1 The graph H1 × H2 has 16 nodes. Let us follow a path in H1 × H2 starting in ((0, 0), (0, 0)): ((0, 0), (0, 0)) 2,3 −→ 1 ((0, 1), (0, 0)) 3,1 −→ 2 ((0, 1), (1, 0)) 1,4 −→ 1 ((2/3, 1/3), (1, 0)) 4,2 −→ 2 ((2/3, 1/3), (1/2, 1/2)) This is one of the paths followed by Lemke-Howson: First, choose which label to drop from L(σ1) (here we drop 2 from L(0, 0)), which adds exactly one new label (here 3) Then always drop the duplicit label, i.e. the one labeling both nodes, until no duplicit label is present (then we have a Nash equilibrium) 133 Lemke-Howson (Idea) Lemke-Howson algorithm works as follows: Start in (σ1, σ2) = (0m , 0n ). Pick a label k ∈ {1, . . . , m} and drop it from L(σ1). This adds a label, which then is the only element of L(σ1) ∩ L(σ2). loop If L(σ1) ∩ L(σ2) = ∅, then stop and return (σ1, σ2). Let { } = L(σ1) ∩ L(σ2), drop from L(σ2). This adds exactly one label to L(σ2). If L(σ1) ∩ L(σ2) = ∅, then stop and return (σ1, σ2). Let {k} = L(σ1) ∩ L(σ2), drop k from L(σ1). This adds exactly one label to L(σ1). Lemma 51 The algorithm proceeds through every vertex of H1 × H2 at most once. Indeed, if (σ1, σ2) is visited twice (with distinct predecessors), then either σ1, or σ2 would have (at least) two neighbors reachable by dropping the label k ∈ L(σ1) ∩ L(σ2), a contradiction with non-degeneracy. Hence the algorithm stops after at most (m+n m )(m+n n ) iterations. 134 Lemke-Howson Algorithm – Detailed Treatment The previous description of the LH algorithm does not specify how to compute the graphs H1 and H2 and how to implement the dropping of labels. In particular, it is not clear how to identify fully labeled strategies and "transitions" between them. The complete algorithm relies on a reformulation which allows us to unify fully labeled strategies (i.e. vertices of H1 and H2) with vertices of certain convex polytopes. The edges of H1 and H2 will correspond to edges of the polytopes. This also gives a fully algebraic procedure for dropping labels. 135 Convex Polytopes A convex combination of points o1, . . . , oi ∈ Rk is a point λ1o1 + · · · + λioi where λi ≥ 0 for each i and i j=1 λj = 1. A convex polytope determined by a set of points o1, . . . , oi is a set of all convex combinations of o1, . . . , oi. A hyperplane h is a supporting hyperplane of a polytope P if it has a non-empty intersection with P and one of the closed half-spaces determined by h contains P. A face of a polytope P is an intersection of P with one of its supporting hyperplanes. A vertex is a 0-dimensional face, an edge is a 1-dim. face. Two vertices are neighbors if they lie on the same edge (they are endpoints of the edge). A polyhedron is an intersection of finitely many closed half-spaces It is a set of solutions of a system of finitely many linear inequalities Fact: Each bounded polyhedron is a polytope, each polytope is a bounded polyhedron. 136 Characterizing Nash Equilibria Let us return back to Lemma 42: (σ1, σ2) = (σ(1), . . . , σ(m + n)) is a Nash equilibrium iff For all = m + 1, . . . , m + n : u2(σ1, ) ≤ u2(σ1, σ2) and either σ( ) = 0, or u2(σ1, ) = u2(σ1, σ2) For all k = 1, . . . , m : u1(k, σ2) ≤ u1(σ1, σ2) and either σ(k) = 0, or u1(k, σ2) = u1(σ1, σ2) Now using the fact that u2(σ1, ) = m k=1 σ(k)u2(k, ) and u1(k, σ2) = m+n =m+1 σ( )u1(k, ) we obtain ... 137 Reformulation (σ1, σ2) = (σ(1), . . . , σ(m + n)) is a Nash equilibrium iff For all = m + 1, . . . , m + n, m k=1 σ(k) · u2(k, ) ≤ u2(σ1, σ2) (3) and either σ( ) = 0, or the ineq. (3) holds with equality. For all k = 1, . . . , m, m+n =m+1 σ( ) · u1(k, ) ≤ u1(σ1, σ2) (4) and either σ(k) = 0, or the ineq. (4) holds with equality. Dividing (3) by u2(σ1, σ2) and (4) by u1(σ1, σ2) we get ... 138 Reformulation (σ1, σ2) = (σ(1), . . . , σ(m + n)) is a Nash equilibrium iff For all = m + 1, . . . , m + n, m k=1 σ(k) u2(σ1, σ2) u2(k, ) ≤ 1 (5) and either σ( ) = 0, or the ineq. (7) holds with equality. For all k = 1, . . . , m, m+n =m+1 σ( ) u1(σ1, σ2) u1(k, ) ≤ 1 (6) and either σ(k) = 0, or the ineq. (8) holds with equality. Considering each σ(k)/u2(σ1, σ2) as an unknown value x(k), and each σ( )/u1(σ1, σ2) as an unknown value y( ), we obtain ... 139 Reformulation ... constraints in variables x(1), . . . , x(m) and y(m + 1), . . . , y(m + n) : For all = m + 1, . . . , m + n, m k=1 x(k) · u2(k, ) ≤ 1 (7) and either y( ) = 0, or the ineq. (7) holds with equality. For all k = 1, . . . , m, m+n =m+1 y( ) · u1(k, ) ≤ 1 (8) and either x(k) = 0, or the ineq. (8) holds with equality. For all non-negative vectors x ≥ 0m and y ≥ 0n that satisfy the above contraints we have that (¯x, ¯y) is a Nash equilibrium. Here the strategy ¯x is defined by ¯x(k) := x(k)/ m i=1 x(i), the strategy ¯y is defined by ¯y( ) := y( )/ m+n j=m+1 y(j) Given a Nash equilibrium (σ1, σ2) = (σ(1), . . . , σ(m + n)), assigning x(k) := σ(k)/u1(σ1, σ2) for k ∈ S1, and y( ) := σ( )/u1(σ1, σ2) for ∈ S2 satisfies the above constraints. 140 Reformulation Let us extend the notion of expected payoff a bit. Given = m + 1, . . . , m + n and x = (x(1), . . . , x(m)) ∈ [0, ∞)m we define u2(x, ) = m k=1 x(k) · u2(k, ) Given k = 1, . . . , m and y = (y(m + 1), . . . , y(m + n)) ∈ [0, ∞)n we define u1(k, y) = m+n =m+1 y( ) · u1(k, ) So the previous system of constraints can be rewritten succinctly: For all = m + 1, . . . , m + n we have that u2(x, ) ≤ 1 and either y( ) = 0, or u2(x, ) = 1. For all k = 1, . . . , m we have that u1(k, y) ≤ 1, and either x(k) = 0, or u1(k, y) = 1 141 Geometric Formulation Define P := x ∈ Rm | (∀k ∈ S1 : x(k) ≥ 0) ∧ (∀ ∈ S2 : u2(x, ) ≤ 1) Q := y ∈ Rn | (∀k ∈ S1 : u1(k, y) ≤ 1) ∧ (∀ ∈ S2 : y( ) ≥ 0) P and Q are convex polytopes. As payoffs are positive and linear in their arguments, P and Q are bounded polyhedra, which means that they are convex hulls of "corners", i.e., they are polytopes. We label points of P and Q as follows: L(x) = {k ∈ S1 | x(k) = 0} ∪ { ∈ S2 | u2(x, ) = 1} L(y) = {k ∈ S1 | u1(k, y) = 1} ∪ { ∈ S2 | y( ) = 0} Proposition 4 For each point (x, y) ∈ P × Q {(0, 0)} such that L(x) ∪ L(y) = {1, . . . , m + n} we have that the corresponding strategy profile (¯x, ¯y) is a Nash equilibrium. Each Nash equilibrium is obtained this way. 142 Geometric Formulation Without proof: Non-degeneracy of G implies that For all x ∈ P we have L(x) ≤ m. x is a vertex of P iff |L(x)| = m (That is, vertices of P are exactly points incident on exactly m faces) For two distinct vertices x, x we have L(x) L(x ). Every vertex of P is incident on exactly m edges; in particular, for each k ∈ L(x) there is a unique (neighboring) vertex x such that L(x) ∩ L(x ) = L(x) {k}. Similar claims are true for Q (just substitute m with n and P with Q). Define a graph H1 = (V1, E1) where V1 is the set of all vertices x of P and {x, x } ∈ E1 iff L(x) ∩ L(x ) = L(x) k. Define a graph H2 = (V2, E2) where V2 is the set of all vertices y of Q and {y, y } ∈ E2 iff L(y) ∩ L(y ) = L(y) k. The notions of dropping and adding labels from and to, resp., remain the same as before. 143 Lemke-Howson (Algorithm) Lemke-Howson algorithm works as follows: Start in (x, y) := (0m , 0n ) ∈ P × Q. Pick a label k ∈ {1, . . . , m} and drop it from L(x). This adds a label, which then is the only element of L(x) ∩ L(y). loop If L(x) ∩ L(y) = ∅, then stop and return (x, y). Let { } = L(x) ∩ L(y), drop from L(y). This adds exactly one label to L(σ2). If L(x) ∩ L(y) = ∅, then stop and return (x, y). Let {k} = L(x) ∩ L(y), drop k from L(x). This adds exactly one label to L(x). Lemma 52 The algorithm proceeds through every vertex of H1 × H2 at most once. Hence the algorithm stops after at most (m+n m )(m+n n ) iterations. 144 The Algebraic Procedure How to effectively move between vertices of H1 × H2 ? That is how to compute the result of dropping a label? We employ so called tableau method with an appropriate pivoting. 145 Slack Variables Formulation Recall our succinct characterization of Nash equilibria: For all = m + 1, . . . , m + n we have that u2(x, ) ≤ 1 and either y( ) = 0, or u2(x, ) = 1. For all k = 1, . . . , m we have that u1(k, y) ≤ 1, and either x(k) = 0, or u1(k, y) = 1 We turn this into a system o equations in variables x(1), . . . , x(m), y(m + 1), . . . , y(m + n) and slack variables r(1), . . . , r(m), z(m + 1), . . . , z(m + n) : u2(x, ) + z( ) = 1 ∈ S2 u1(k, y) + r(k) = 1 k ∈ S1 x(k) ≥ 0 y( ) ≥ 0 k ∈ S1, ∈ S2 r(k) ≥ 0 z( ) ≥ 0 k ∈ S1, ∈ S2 x(k) · r(k) = 0 y( ) · z( ) = 0 k ∈ S1, ∈ S2 Solving this is called linear complementary problem (LCP). 146 Tableaux The LM algorithm represents the current vertex of H1 × H2 using a tableau defined as follows. Define two sets of variables: M := {x(1), . . . , x(m), z(m + 1), . . . , z(m + n)} N := {r(1), . . . , r(m), y(m + 1), . . . , y(m + n)} A basis is a pair of sets of variables M ⊆ M and N ⊆ N where |M| = n and |N| = m. Intuition: Labels correspond to variables that are not in the basis A tableau T for a given basis (M, N): P : v = cv − v ∈M M av · v v ∈ M Q : w = cw − w ∈N N aw · w w ∈ N Here each cv , cw ≥ 0 and av , aw ∈ R. Note that the first part of the tableau corresponds to the polytope P, the second one to the polytope Q. 147 Tableaux implementation of Lemke-Howson A basic solution of a tableau T is obtained by assigning zero to non-basic variables and computing the rest. During a computation of the LM algorithm, the basic solutions will correspond to vertices of the two polytopes P and Q. Initial tableau: M = {z(m + 1), . . . , z(m + n)} and N = {r(1), . . . , r(m)} P : z( ) = 1 − m k=1 x(k) · u2(k, ) ∈ S2 Q : r(k) = 1 − m+n =m+1 y( ) · u1(k, ) k ∈ S1 Note that assigning 0 to all non-basic variables we obtain x(k) = 0 for k = 1, . . . , m and y( ) = 0 for = m + 1, . . . , m + n. So this particular tableau corresponds to (0m , 0n ). Note that non-basic variables correspond precisely to labels of (0m , 0n ). 148 Lemke-Howson – Pivoting Given a tableau T during a computation: P : v = cv − v ∈M M av · v v ∈ M Q : w = cw − w ∈N N aw · w w ∈ N Dropping a label corresponding to a variable ¯v ∈ M M (i.e. dropping a label in P) is done by adding ¯v to the basis as follows: Find an equation v = cv − v ∈M M av · v , with minimum cv /a¯v . Here cv 0, and we assume that if a¯v = 0, then cv /a¯v = ∞ M := (M {v}) ∪ {¯v} Reorganize the equation so that ¯v is on the left-hand side: ¯v = cv a¯v − v ∈M M,v v av a¯v · v − v a¯v Substitute the new expression for v to all other equations. Dropping labels in Q works similarly. 149 Lemke-Howson – Tableaux The previous slide gives a procedure for computing one step of the LH algorithm. The computation ends when: For each complementary pair (x(k), r(k)) one of the variables is in the basis and the other one is not For each complementary pair (y( ), z( )) one of the variables is in the basis and the other one is not 150 Lemke-Howson – Example Initial tableau (M = {z(3), z(4)}, N = {r(1), r(2)}): z(3) = 1 − x(1) · 1 − x(2) · 3 (9) z(4) = 1 − x(1) · 2 − x(2) · 1 (10) r(1) = 1 − y(3) · 3 − y(4) · 2 (11) r(2) = 1 − y(3) · 2 − y(4) · 3 (12) Drop the label 2 from P: The minimum ratio 1/3 is in (9). x(2) = 1/3 − (1/3) · x(1) − (1/3) · z(3) (13) z(4) = 2/3 − (5/3) · x(1) − (1/3) · z(3) (14) r(1) = 1 − y(3) · 3 − y(4) · 2 (15) r(2) = 1 − y(3) · 2 − y(4) · 3 (16) Here M = {x(2), z(4)}, N = {r(1), r(2)}. Drop the label 3 from Q: The minimum ratio 1/3 is in (15). 151 Lemke-Howson – Example (Cont.) x(2) = 1/3 − (1/3) · x(1) − (1/3) · z(3) (17) z(4) = 2/3 − (5/3) · x(1) − (1/3) · z(3) (18) y(3) = 1/3 − (2/3) · y(4) − (1/3) · r(1) (19) r(2) = 1/3 − (5/3) · y(4) − (1/3) · r(1) (20) Here M = {x(2), z(4)}, N = {y(3), r(2)}. Drop the label 1: The minimum ratio (2/3)/(5/3) = 2/5 is in (18). x(2) = 1/5 − (4/15) · z(3) − (1/5) · z(4) (21) x(1) = 2/5 − (1/5) · z(3) − (3/5) · z(4) (22) y(3) = 1/3 − (2/3) · y(4) − (1/3) · r(1) (23) r(2) = 1/3 − (5/3) · y(4) − (1/3) · r(1) (24) Here M = {x(2), x(1)}, N = {y(3), r(2)}. Drop the label 4: The minimum ratio 1/5 is in (24). 152 Lemke-Howson – Example (Cont.) x(2) = 1/5 − (4/15) · z(3) − (1/5) · z(4) (25) x(1) = 2/5 − (1/5) · z(3) − (3/5) · z(4) (26) y(3) = 1/5 − (1/5) · r(1) − (6/15) · r(2) (27) y(4) = 1/5 − (1/5) · r(1) − (3/5) · r(2) (28) Here M = {x(2), x(1)}, N = {y(3), y(4)} and thus x(1) ∈ M but r(1) N x(2) ∈ M but r(2) N y(3) ∈ N but z(3) M y(4) ∈ N but z(4) M So the algorithm stops. Assign z(3) = z(4) = r(1) = r(2) = 0 and obtain the following Nash equilibrium: x(1) = 2/5, x(2) = 1/5, y(3) = 1/5, y(4) = 1/5 153 Strategic-Form Games – Conclusion We have considered static games of complete information, i.e., "one-shot" games where the players know exactly what game they are playing. We modeled such games using strategic-form games. We have considered both pure strategy setting and mixed strategy setting. In both cases, we considered four solution concepts: Strictly dominant strategies Iterative elimination of strictly dominated strategies Rationalizability (i.e., iterative elimination of strategies that are never best responses) Nash equilibria 154 Strategic-Form Games – Conclusion In pure strategy setting: 1. Strictly dominant strategy equilibrium survives IESDS, rationalizability and is the unique Nash equilibrium (if it exists) 2. In finite games, rationalizable equilibria survive IESDS, IESDS preserves the set of Nash equilibria 3. In finite games, rationalizability preserves Nash equilibria In mixed setting: 1. In finite two player games, IESDS and rationalizability coincide. 2. Strictly dominant strategy equilibrium survives IESDS (rationalizability) and is the unique Nash equilibrium (if it exists) 3. In finite games, IESDS (rationalizability) preserves Nash equilibria The proofs for 2. and 3. in the mixed setting are similar to corresponding proofs in the pure setting. 155 Algorithms Strictly dominant strategy equilibria coincide in pure and mixed settings, and can be computed in polynomial time. IESDS and rationalizability can be implemented in polynomial time in the pure setting as well as in the mixed setting In the mixed setting, linear programming is needed to implement one step of IESDS (rationalizability). Nash equilibria can be computed for two-player games in polynomial time for zero-sum games (using von Neumann’s theorem and linear programming) in exponential time using support enumeration in PPAD using Lemke-Howson 156 Complexity of Nash Eq. – FNP (Roughly) Let R be a binary relation on words (over some alphabet) that is polynomial-time computable and polynomially balanced. I.e., membership to R is decidable in polynomial time, and (x, y) ∈ R implies |y| ≤ |x|k where k is independent of x, y. A search problem associated with R is this: Given an input x, return a y such that (x, y) ∈ R if such y exists, and return "NO" otherwise. Note that the problem of computing NE can be seen as a search problem R where (x, y) ∈ R means that x is a strategic-form game and y is a Nash equilibrium of polynomial size. (We already know from support enumeration that there is a NE of polynomial size.) The class of all search problems is called FNP. A class FP ⊆ FNP contains all search problem that can be solved in polynomial time. A search problem determined by R is polynomially reducible to a search problem R iff there exist polynomially computable functions f, g such that if (x, y) ∈ R for some y, then (f(x), y ) ∈ R for some y if (f(x), y) ∈ R , then (x, g(y)) ∈ R if (f(x), y) R for all y, then (x, y) R for all y 157 Complexity of Nash Eq. – PPAD (Roughly) The class PPAD is defined by specifying one of its complete problems (w.r.t. the polynomial time reduction) known as End-Of-The-Line: Input: Two Boolean circuits (with basis ∧, ∨, ¬) S and P, each with m input bits and m output bits, such that P(0m ) = 0m S(0m ). Problem: Find an input x ∈ {0, 1}m such that P(S(x)) x or S(P(x)) x 0m . Intuition: End-Of-The-Line creates a directed graph HS,P with vertex set {0, 1}m and an edge from x to y whenever both y = S(x) ("successor") and x = P(y) ("predecessor"). All vertices of HS,P have indegree and outdegree at most one. There is at least one source (i.e., x satisfying P(x) = x, namely 0m ), so there is at least one sink (i.e., x satisfying S(x) = x). The goal is to find either a source or a sink different from 0m . Theorem 53 The problem of computing Nash equilibria is complete for PPAD. That is, Nash belongs to PPAD and End-Of-The-Line is polynomially reducible to Nash. 158 Loose Ends – Modes of Dominance Let σi, σi ∈ Σi. Then σi is strictly dominated by σi if ui(σi, σ−i) > ui(σi , σ−i) for all σ−i ∈ Σ−i. Let σi, σi ∈ Σi. Then σi is weakly dominated by σi if ui(σi, σ−i) ≥ ui(σi , σ−i) for all σ−i ∈ Σ−i and there is σ−i ∈ Σ−i such that ui(σi, σ−i ) > ui(σi , σ−i ). Let σi, σi ∈ Σi. Then σi is very weakly dominated by σi if ui(σi, σ−i) ≥ ui(σi , σ−i) for all σ−i ∈ Σ−i. A strategy is (strictly, weakly, very weakly) dominant in mixed strategies if it (strictly, weakly, very weakly) dominates any other mixed strategy. Claim 4 Any mixed strategy profile σ ∈ Σ such that each σi is very weakly dominant in mixed strategies is a mixed Nash equilibrium. The same claim can be proved in pure strategy setting. 159 Dynamic Games of Complete Information Extensive-Form Games Definition Sub-Game Perfect Equilibria 160 Dynamic Games of Perfect Information (Motivation) Static games (modeled using strategic-form games) cannot capture games that unfold over time. In particular, as all players move simultaneously, there is no way how to model situations in which order of moves is important. Imagine e.g. chess where players take turns, in every round a player knows all turns of the opponent before making his own turn. There are many examples of dynamic games: markets that change over time, political negotiations, models of computer systems, etc. We model dynamic games using extensive-form games, a tree like model that allows to express sequential nature of games. We start with perfect information games, where each player always knows results of all previous moves. Then generalize to imperfect information, where players may have only partial knowledge of these results (e.g. most card games). 161 Perfect-Info. Extensive-Form Games (Example) 1 h0 2 h1 (3, 1) K (1, 3) U L 2 h2 (2, 1) K (0, 0) U R Here h0, h1, h2 are non-terminal nodes, leaves are terminal nodes. Each non-terminal node is owned by a player who chooses an action. E.g. h1 is owned by player 2 who chooses either K or U Every action results in a transition to a new node. Choosing L in h0 results in a move to h1 When a play reaches a terminal node, players collect payoffs. E.g. the left most terminal node gives 3 to player 1 and 1 to player 2. 162 Perfect-Information Extensive-Form Games A perfect-information extensive-form game is a tuple G = (N, A, H, Z, χ, ρ, π, h0, u) where N = {1, . . . , n} is a set of n players, A is a (single) set of actions, H is a set of non-terminal (choice) nodes, Z is a set of terminal nodes (assume Z ∩ H = ∅), denote H = H ∪ Z, χ : H → 2A {∅} is the action function, which assigns to each choice node a non-empty set of enabled actions, ρ : H → N is the player function, which assigns to each non-terminal node a player i ∈ N who chooses an action there, we define Hi := {h ∈ H | ρ(h) = i}, π : H × A → H is the successor function, which maps a non-terminal node and an action to a new node, such that h0 is the only node that is not in the image of π (the root) for all h1, h2 ∈ H and for all a1 ∈ χ(h1) and all a2 ∈ χ(h2), if π(h1, a1) = π(h2, a2), then h1 = h2 and a1 = a2, u = (u1, . . . , un), where each ui : Z → R is a payoff function for player i in the terminal nodes of Z. 163 Some Notation A path from h ∈ H to h ∈ H is a sequence h1a2h2a3h3 · · · hk−1ak hk where h1 = h, hk = h and π(hj−1, aj) = hj for every 1 < j ≤ k. Note that, in particular, h is a path from h to h. Assumption: For every h ∈ H there is a unique path from h0 to h and there is no infinite path (i.e., a sequence h1a2h2a3h3 · · · such that π(hj−1, aj) = hj for every j > 1). Note that the assumption is satisfied when H is finite. Indeed, uniqueness follows immediately from the definition of π. Now let X be the set of all h from which there is a path to h. If h0 ∈ X we are done. Otherwise, let h be a node of X with the longest path to h. As h h0, there is h and a ∈ χ(h ) such that h = π(h , a). But then there is a path from h to h that is longer than the path from h , a contradiction. The above claim implies that every perfect-information extensive-form game can be seen as a game on a rooted tree (H, E, h0) where H ∪ Z is a set of nodes, E ⊆ H × H is a set of edges defined by (h, h ) ∈ E iff h ∈ H and there is a ∈ χ(h) such that π(h, a) = h , h0 is the root. 164 Some More Notation h is a child of h, and h is a parent of h if there is a ∈ χ(h) such that h = π(h, a). h ∈ H is reachable from h ∈ H if there is a path from h to h . If h is reachable from h we say that h is a descendant of h and h is an ancestor of h (note that, by definition, h is both a descendant and an ancestor of itself). 165 Example: Trust Game 1 h0 (5, 5) z1 D h1 2 (0, 20) z2 K (7.5, 12.5) z3 S T Two players, both start with 5$ Player 1 either distrusts (D) player 2 and keeps the money (payoffs (5, 5)), or trusts (T) player 2 and passes 5$ to player 2 If player 1 chooses to trust player 2, the money is tripled by the experimenter and sent to player 2. Player 2 may either keep (K) the additional 15$ (resulting in (0, 20)), or share (S) it with player 1 (resulting in (7.5, 12.5)) 166 Example: Trust Game (Cont.) 1 h0 (5, 5) z1 D h1 2 (0, 20) z2 K (7.5, 12.5) z3 S T N = {1, 2}, A = {D, T, K, S} H = {h0, h1}, Z = {z1, z2, z3} χ(h0) = {D, T}, χ(h1) = {K, S} ρ(h0) = 1, ρ(h1) = 2 π(h0, D) = z1, π(h0, T) = h1, π(h1, K) = z2, π(h1, S) = z3 u1(z1) = 5, u1(z2) = 0, u1(z3) = 7.5, u2(z1) = 5, u2(z2) = 20, u2(z3) = 12.5 167 Stackelberg Competition Very similar to Cournot duopoly ... Two identical firms, players 1 and 2, produce some good. Denote by q1 and q2 quantities produced by firms 1 and 2, resp. The total quantity of products in the market is q1 + q2. The price of each item is κ − q1 − q2 where κ > 0 is fixed. Firms have a common per item production cost c. Except that ... As opposed to Cournot duopoly, the firm 1 moves first, and chooses the quantity q1 ∈ [0, ∞). Afterwards, the firm 2 chooses q2 ∈ [0, ∞) (knowing q1) and then the firms get their payoffs. 168 Stackelberg Competition – Extensive-Form Model An extensive-form game model: N = {1, 2} A = [0, ∞) H = {h0, h q1 1 | q1 ∈ [0, ∞)} Z = {zq1,q2 | q1, q2 ∈ [0, ∞) χ(h0) = [0, ∞), χ(h q1 1 ) = [0, ∞) ρ(h0) = 1, ρ(h q1 1 ) = 2 π(h0, q1) = h q1 1 , π(h q1 1 , q2) = zq1,q2 The payoffs are u1(zq1,q2 ) = q1(κ − q1 − q2) − q1c u2(zq1,q2 ) = q2(κ − q1 − q2) − q2c 169 Example: Chess (a bit simplified) There are infinitely many representations of chess, this one is different from the one presented at the lecture. N = {1, 2} Denoting Boards the set of all (appropriately encoded) board positions, we define H = B × {1, 2} where B = {w ∈ Boards+ | no board repeats ≥ 3 times in w} (Here Boards+ is the set of all non-empty sequences of boards) Z consists of all nodes (wb, i) (here b ∈ Boards) where either b is checkmate for player i, or i does not have a move in b, or every move of i in b leads to a board with two occurrences in w χ(wb, i) is the set of all legal moves of player i in b ρ(wb, i) = i π is defined by π((wb, i), a) = (wbb , 2 − i + 1) where b is obtained from b according to the move a h0 = (b0, 1) where b0 is the initial board uj(wb, i) ∈ {1, 0, −1}, here 1 means "win", 0 means "draw", and −1 means "loss" for player j 170 Pure Strategies Let G = (N, A, H, Z, χ, ρ, π, h0, u) be a perfect-information extensive-form game. Definition 54 A pure strategy of player i in G is a function si : Hi → A such that for every h ∈ Hi we have that si(h) ∈ χ(h). We denote by Si the set of all pure strategies of player i in G. Denote by S = S1 × · · · × Sn the set of all pure strategy profiles. Note that each pure strategy profile s ∈ S determines a unique path ws = h0a1h1 · · · hk−1ak hk from h0 to a terminal node hk by aj = sρ(hj−1)(hj−1) ∀0 < j ≤ k Denote by O(s) the terminal node reached by ws. Abusing notation a bit, we denote by ui(s) the value ui(O(s)) of the payoff for player i when the terminal node O(s) is reached using strategies of s. 171 Example: Trust Game 1 h0 (5, 5) z1 D h1 2 (0, 20) z2 K (7.5, 12.5) z3 S T A pure strategy profile (s1, s2) where s1(h0) = T and s2(h1) = K is usually written as TK (BFS & left to right traversal) determines the path h0T h1K z2 The resulting payoffs: u1(s1, s2) = 0 and u2(s1, s2) = 20. 172 Extensive-Form vs Strategic-Form The extensive-form game G determines the corresponding strategic-form game ¯G = (N, (Si)i∈N , (ui)i∈N) Here note that the set of players N and the sets of pure strategies Si are the same in G and in the corresponding game. The payoff functions ui in ¯G are understood as functions on the pure strategy profiles of S = S1 × · · · × Sn. With this definition, we may apply all solution concepts and algorithms developed for strategic-form games to the extensive form games. We often consider the extensive-form to be only a different way of representing the corresponding strategic-form game and do not distuinguish between them. There are some issues, namely whether all notions from strategic-form area make sense in the extensive-form. Also, naive application of algorithms may result in unnecessarily high complexity. For now, let us consider pure strategies only! 173 Example: Trust Game 1 h0 (5, 5) z1 D h1 2 (0, 20) z2 K (7.5, 12.5) z3 S T Is any strategy strictly (weakly, very weakly) dominant? Is any strategy never best response? Is there a Nash equilibrium in pure strategies ? 174 Example 1 h0 2 h1 (3, 1) K (1, 3) U L 2 h2 (2, 1) K (0, 0) U R Find all pure strategies of both players. Is any strategy (strictly, weakly, very weakly) dominant? Is any strategy (strictly, weakly, very weakly) dominated? Is any strategy never best response? Are there Nash equilibria in pure strategies ? 175 Example 1 h0 2 h1 (3, 1) K (1, 3) U L 2 h2 (2, 1) K (0, 0) U R KK KU UK UU L 3, 1 3, 1 1, 3 1, 3 R 2, 1 0, 0 2, 1 0, 0 Find all pure strategies of both players. Is any strategy (strictly, weakly, very weakly) dominant? Is any strategy (strictly, weakly, very weakly) dominated? Is any strategy never best response? Are there Nash equilibria in pure strategies ? 176 Criticism of Nash Equilibria 1 h0 2 h1 (3, 1) K (1, 3) U L 2 h2 (2, 1) K (0, 0) U R KK KU UK UU L 3, 1 3, 1 1, 3 1, 3 R 2, 1 0, 0 2, 1 0, 0 Two Nash equilibria in pure strategies: (L, UU ) and (R, UK ) Examine (L, UU ): Player 2 threats to play U in h2, as a result, player 1 plays L, player 2 reacts to L by playing the best response, i.e., U. However, the threat is not credible, once a play reaches h2, a rational player 2 chooses K . 177 Criticism of Nash Equilibria 1 h0 2 h1 (3, 1) K (1, 3) U L 2 h2 (2, 1) K (0, 0) U R KK KU UK UU L 3, 1 3, 1 1, 3 1, 3 R 2, 1 0, 0 2, 1 0, 0 Two Nash equilibria in pure strategies: (L, UU ) and (R, UK ) Examine (R, UK ): This equilibrium is sensible in the following sense: Player 2 plays the best response in both h1 and h2 Player 1 plays the "best response" in h0 assuming that player 2 will play his best responses in the future. This equilibrium is called subgame perfect. 178 Subgame Perfect Equilibria Given h ∈ H, we denote by Hh the set of all nodes reachable from h. Definition 55 (Subgame) A subgame Gh of G rooted in h ∈ H is the restriction of G to nodes reachable from h in the game tree. More precisely, Gh = (N, A, Hh , Zh , χh , ρh , πh , h, uh ) where Hh = H ∩ Hh , Zh = Z ∩ Hh , χh and ρh are restrictions of χ and ρ to Hh , resp., (Given a function f : A → B and C ⊆ A, a restriction of f to C is a function g : C → B such that g(x) = f(x) for all x ∈ C.) πh is defined for h ∈ Hh and a ∈ χh (h ) by πh (h , a) = π(h , a) each uh i is a restriction of ui to Zh Definition 56 A subgame perfect equilibrium (SPE) in pure strategies is a pure strategy profile s ∈ S such that for any subgame Gh of G, the restriction of s to Hh is a Nash equilibrium in pure strategies in Gh . A restriction of s = (s1, . . . , sn) ∈ S to Hh is a strategy profile sh = (sh 1 , . . . , sh n ) where sh i (h ) = si(h ) for all i ∈ N and all h ∈ Hi ∩ Hh . 179 Stackelberg Competition – SPE N = {1, 2}, A = [0, ∞) H = {h0, hq1 1 | q1 ∈ [0, ∞)}, Z = {zq1,q2 | q1, q2 ∈ [0, ∞) χ(h0) = [0, ∞), χ(hq1 1 ) = [0, ∞), ρ(h0) = 1, ρ(hq1 1 ) = 2 π(h0, q1) = hq1 1 , π(hq1 1 , q2) = zq1,q2 The payoffs are u1(zq1,q2 ) = q1(κ − c − q1 − q2), u2(zq1,q2 ) = q2(κ − c − q1 − q2) Denote θ = κ − c Player 1 chooses q1, we know that the best response of player 2 is q2 = (θ − q1)/2 where θ = κ − c. Then u1(zq1,q2 ) = q1(θ − q1 − θ/2 − q1/2) = (θ/2)q1 − q2 1 /2 which is maximized by q1 = θ/2, giving q2 = θ/4. Then u1(zq1,q2 ) = θ2 /8 and u2(zq1,q2 ) = θ2 /16. Note that firm 1 has an advantage as a leader. 180 Existence of SPE From this moment on we consider only finite games! Theorem 57 Every finite perfect-information extensive-form game has a SPE in pure strategies. Proof: By induction on the number of nodes. Base case: If |H| = 1, the only node is terminal, and the trivial pure strategy profile is SPE. Induction step: Consider a game with more than one node. Let K = {h1, . . . , hk } be the set of all children of the root h0. By induction, for every h there is a SPE sh in Gh . For every i ∈ N, define a strategy si of player i in G as follows: for i = ρ(h0) we have si(h0) ∈ argmaxh ∈K uh i (sh ) for all i ∈ N and h ∈ H we have si(h) = sh i (h) where h ∈ Hh ∩ Hi We claim that s = (s1, . . . , sn) is a SPE in pure strategies. By definition, s is NE in all subgames except (possibly) the G itself. 181 Existence of SPE (Cont.) Let h = sρ(h0)(h0). Consider a possible deviation of player i. Let ¯s be another pure strategy profile in G obtained from s = (s1, . . . , sn) by changing si. First, assume that i ρ(h0). Then ui(s) = uh i (sh ) ≥ uh i (¯sh ) = ui(¯s) Here the first equality follows from h = sρ(h0)(h0) and that s behaves similarly as sh in Gh , the inequality follows from the fact that sh is a NE in Gh , and the second equality follows from h = sρ(h0)(h0) = ¯sρ(h0)(h0). Second, assume that i = ρ(h0). Let hr = ¯si(h0) = ¯sρ(h0)(h0). Then uh i (sh ) ≥ uhr i (shr ) because h maximizes the payoff of player i = ρ(h0) in the children of h0. But then ui(s) = uh i (sh ) ≥ uhr i (shr ) ≥ uhr i (¯shr ) = ui(¯s) 182 Chess Recall that in the model of chess, the payoffs were from {1, 0, −1} and u1 = −u2 (i.e. it is zero-sum). By Theorem 57, there is a SPE in pure strategies (s∗ 1 , s∗ 2 ). However, then one of the following holds: 1. White has a winning strategy If u1(s∗ 1 , s∗ 2 ) = 1 and thus u2(s∗ 1 , s∗ 2 ) = −1 2. Black has a winning strategy If u1(s∗ 1 , s∗ 2 ) = −1 and thus u2(s∗ 1 , s∗ 2 ) = 1 3. Both players have strategies to force a draw If u1(s∗ 1 , s∗ 2 ) = 0 and thus u2(s∗ 1 , s∗ 2 ) = 0 Question: Which one is the right answer? Answer: Nobody knows yet ... the tree is too big! Even with ∼ 200 depth & ∼ 5 moves per node: 5200 nodes! 183 Backward Induction The proof of Theorem 57 gives an efficient procedure for computing SPE for finite perfect-information extensive-form games. Backward Induction: We inductively "attach" to every node h a SPE sh in Gh , together with a vector of expected payoffs u(h) = (u1(h), . . . , un(h)). Initially: Attach to each terminal node z ∈ Z the empty profile sz = (∅, . . . , ∅) and the payoff vector u(z) = (u1(z), . . . , un(z)). While(there is an unattached node h with all children attached): 1. Let K be the set of all children of h 2. Let hmax ∈ argmax h ∈K uρ(h)(h ) 3. Attach to h a SPE sh where sh ρ(h) (h) = hmax for all i ∈ N and all h ∈ Hi define sh i (h ) = s ¯h i (h ) where h ∈ H ¯h ∩ Hi (in G ¯h , each sh i behaves as s ¯h i i.e. sh ¯h = s ¯h ) 4. Attach to h the expected payoffs ui(h) = ui(hmax) for i ∈ N. 184 Efficient Algorithms for Pure Nash Equilibria In the step 2. of the backward induction, the algorithm may choose an arbitrary hmax ∈ argmaxh ∈K uρ(h)(h ) and always obtain a SPE. In order to compute all SPE, the algorithm may systematically search through all possible choices of hmax throughout the induction. Backward induction is too inefficient (unnecessarily searches through the whole tree). There are better algorithms, such as α−β-prunning. For details, extensions etc. see e.g. PB016 Artificial Intelligence I Multi-player alpha-beta prunning, R. Korf, Artificial Intelligence 48, pages 99-111, 1991 Artificial Intelligence: A Modern Approach (3rd edition), S. Russell and P. Norvig, Prentice Hall, 2009 185 Example Centipede game: A A A A A D D D D D (1, 0) (0, 2) (3, 1) (2, 4) (4, 3) (3, 5)1 2 1 2 1 SPE in pure strategies: (DDD, DD) ... Isn’t it weird? There are serious issues here ... In laboratory setting, people usually play A for several steps. There is a theoretical problem: Imagine, that you are player 2. What would you do when player 1 chooses A in the first step? The SPE analysis says that you should go down, but the same analysis also says that the situation you are in cannot appear :-) 186 Dynamic Games of Complete Information Extensive-Form Games Mixed and Behavioral Strategies 187 Mixed and Behavioral Strategies Definition 58 A mixed strategy σi of player i in G is a mixed strategy of player i in the corresponding strategic-form game. I.e., a mixed strategy σi of player i in G is a probability distribution on Si (recall that Si is the set of all pure strategies, i.e., functions of the form si : Hi → A). As before, we denote by σi the set of all mixed strategies of player i and by Σ the set of all mixed strategy profiles Σ1 × · · · × Σn. Definition 59 A behavioral strategy of player i in G is a function βi : Hi → ∆(A) such that for every h ∈ Hi we have that supp(βi(h)) ⊆ χ(h). Given a profile β = (β1, . . . , βn) of behavioral strategies, we denote by Pβ(z) the probability of reaching z ∈ Z when β is used, i.e., Pβ(z) = k =1 βρ(h −1)(h )(a ) where h0a1h1a2h2 · · · ak hk is the unique path from h0 to hk = z. We define ui(β) := z∈Z Pβ(z) · ui(z). 188 Behavioral Strategies: Example 1 h0 2 h1 z1 B 1 h3 z2 C z3 ¯C ¯B A 2 h2 z4 D z5 ¯D ¯A Pure strategies of player 1: AC, A ¯C, ¯AC, ¯A ¯C An example of a mixed strategy σ1 of player 1: σ1(AC) = 1 3 , σ1(A ¯C) = 1 9 , σ1(¯AC) = 1 6 and σ1(¯A ¯C) = 11 18 189 Behavioral Strategies: Example 1 h0 2 h1 z1 B 1 h3 z2 C z3 ¯C ¯B A 2 h2 z4 D z5 ¯D ¯A An example of behavioral strategies of both players: player 1: β1(h0)(A) = 1 3 and β1(h3)(C) = 1 2 player 2: β2(h1)(B) = 1 4 and β2(h2)(D) = 1 5 P(β1,β2)(z2) = 1 3 1 − 1 4 1 2 = 1 8 190 Behavioral Strategies: Example 1 h0 2 h1 z1 (1, 0) B 1 h3 z2 (2, 3) C z3 (3, 2) ¯C ¯B A 2 h2 z4 (1, 1) D z5 (5, 4) ¯D ¯A β = (β1, β2) player 1: β1(h0)(A) = 1 3 and β1(h3)(C) = 1 2 player 2: β2(h1)(B) = 1 4 and β2(h2)(D) = 1 5 u1(β) = Pβ(z1) · 1 + Pβ(z2) · 2 + Pβ(z3) · 3 + Pβ(z4) · 1 + Pβ(z5) · 5 = 1 3 1 4 1 + 1 3 3 4 1 2 2 + 1 3 3 4 1 2 3 + 2 3 1 5 1 + 2 3 4 5 5 ≈ 3.508 191 Mixed/Behavioral Profiles Each pure strategy can be considered as a behavioral strategy. Definition 60 A mixed/behavioral strategy profile is a tuple α = (α1, . . . , αn) where each αi is either a mixed, or a behavioral strategy. Let α = (α1, . . . , αn) be a mixed/behavioral strategy profile, and let M = {i1, . . . , ik } ⊆ N be the set of all players ij ∈ N such that αij is a mixed strategy. We define ui(α) = si1 ∈Si1 · · · sik ∈Sik   k =1 αi (si )   · ui(α1, . . . , αn) where αj =    sj if j ∈ M, αj otherwise. Intuitively, ui(α) is the expected payoff of player i in the following play: First, each player i ∈ M chooses his pure strategy si randomly with the probability αi (si ), then these fixed pure strategies are played against the behavioral strategies of players from N M (who may still randomize along the play). 192 Equivalence of Mixed and Behavioral Strategies We show how to translate behavioral strategies to equivalent mixed ones (w.r.t. probabilities of reaching terminal nodes) and vice versa. Behavioral to mixed: We say that a mixed strategy σi is induced by a behavioral strategy βi if σi(si) = h∈Hi βi(h)(si(h)) for all si ∈ Si Mixed to behavioral: For this direction some notation is needed. Given h ∈ H, we denote by w[h] the unique path from h0 to h. Given h ∈ Hi, we denote by Sh i the set of all pure strategies si ∈ Si such that for every h ∈ Hi visited by w[h] we have that si(h ) is the action chosen in h on w[h]. Intuitively, Sh i consists of all pure strategies that on the unique path from h0 to h chose the appropriate actions to stay on the path. In other words, h can be reached using si (assuming that the opponents play appropriately) iff si ∈ Sh i . Given h ∈ Hi and a ∈ χ(h), we denote by Sh,a i ⊆ Sh i the set of all pure strategies si ∈ Sh i such that si(h) = a. I.e., strategies of Sh,a i may reach h and then choose a there. 193 Equivalence of Mixed and Behavioral Strategies (Cont.) We say that a behavioral strategy βi is induced by a mixed strategy σi if the following holds: For every h ∈ Hi and a ∈ χ(h) either si ∈Sh i σi(si) = 0 or βi(h)(a) = si ∈Sh,a i σi(si) si ∈Sh i σi(si) Intuitively, βi(h)(a) is the probability of selecting a in h assuming that h can be reached with a positive probability if the other players play appropriately. If the probability of reaching h using σi is zero (no matter of what the opponents are doing), then the βi(h) may be defined arbitrarily since h is reached with zero probability using β as well. 194 Equivalence of Mixed and Behavioral Strategies Theorem 61 Let α be a mixed/behavioral strategy profile and let α be any mixed/behavioral profile obtained from α by substituting some of the strategies in α with strategies they induce. Then ui(α) = ui(α ). In fact, any node of H is reached from h0 with the same probability for all such α . 195 Equivalence of Mixed and Behavioral Strategies 1 h0 2 h1 z1 (1, 0) B 1 h3 z2 (2, 3) C z3 (3, 2) ¯C ¯B A 2 h2 z4 (1, 1) D z5 (5, 4) ¯D ¯A Pure strategies of player 1: AC, A ¯C, ¯AC, ¯A ¯C Pure strategies of player 2: BD, B ¯D, ¯BD, ¯B ¯D Mixed strategies of player 1: σ1 = (pAC , pA ¯C , p¯A,C , p¯A ¯C ) (Here pXY = σ1(s) where s is a pure str. such that s(h0) = X, s(h3) = Y) Mixed strategies of player 2: σ2 = (pBD, pB ¯D, p¯BD, p¯B ¯D) 196 Equivalence of Mixed and Behavioral Strategies 1 h0 2 h1 z1 (1, 0) B 1 h3 z2 (2, 3) C z3 (3, 2) ¯C ¯B A 2 h2 z4 (1, 1) D z5 (5, 4) ¯D ¯A Behavioral strategies of player 1: β1 = (qA , qC ) were qA = β1(h0)(A) and qC = β1(h3)(C); Denote q¯A = 1 − qA and q¯C = 1 − qC Behavioral strategies of player 2: β2 = (qB , qD) and we use q¯B = 1 − qB and q¯D = 1 − qD 197 Equivalence of Mixed and Behavioral Strategies Behavioral to mixed: Given β1 = (qA , qC ) and β2 = (qB , qD) define σ1 = (pAC , pA ¯C , p¯A,C , p¯A ¯C ) := (qA qC , qA q¯C , q¯A qC , q¯A q¯C ) σ2 = (pBD, pB ¯D, p¯BD, p¯B ¯D) := (qB qD, qB q¯D, q¯B qD, q¯B q¯D) What is the probability of reaching z2 ? Using (β1, β2) : qA q¯B qC (i.e. multiply the probabilities assigned by β1, β2 along the path from h0 to z2) Using (σ1, σ2) : (qA qC )(q¯B qD + q¯B q¯D) = qA q¯B qC (i.e., player 1 needs to choose the pure strategy AC, player 2 needs to choose any pure strategy which selects ¯B) Using (σ1, β2) : (qA qC )q¯B = qA q¯B qC (i.e., first player 1 chooses a pure strategy, this needs to be AC, and then player 2 plays against this particular strategy by choosing ¯B) Using (β1, σ2) : (q¯B qD + q¯B q¯D)qA qC = qA q¯B qC (i.e., first player 2 chooses a pure strategy, needs to be one playing ¯B in h1, and then player 1 plays against this strategy by choosing A and C) 198 Equivalence of Mixed and Behavioral Strategies Mixed to behavioral: Given σ1 = (pAC, pA ¯C, p¯A,C, p¯A ¯C) and σ2 = (pBD, pB ¯D, p¯BD, p¯B ¯D) we have β1 = (qA , qC) where qA = pAC +pA ¯C qC =    pAC pAC +pA ¯C if pAC + pA ¯C > 0 x otherwise Here x is an arbitrary number between 0 and 1. β2 = (qB, qD) where qB = pBD + pB ¯D qD = pBD + p¯BD 199 Equivalence of Mixed and Behavioral Strategies First, consider qA = pAC + pA ¯C > 0. What is the probability of reaching z2 ? Using (σ1, σ2) : pAC · (p¯BD + p¯B ¯D) i.e., player 1 chooses AC and player 2 chooses a pure str. playing ¯B Using (β1, β2) : qA · q¯B · qC = (pAC + pA ¯C ) · q¯B · pAC pAC + pA ¯C = q¯B · pAC = pAC · (1 − qB ) = pAC · (1 − (pBD + pB ¯D)) = pAC · (p¯BD + p¯B ¯D) Using (β1, σ2) : (p¯BD + p¯B ¯D) · qA · qC = qA · q¯B · qC = pAC · (p¯BD + p¯B ¯D) i.e., first player 2 chooses a pure strategy playing ¯B in h1 and then player 1 plays the behavioral strategy β1 against it 200 Equivalence of Mixed and Behavioral Strategies Using (σ1, β2) : pAC · q¯B = pAC · (p¯BD + p¯B ¯D) i.e., first player 1 chooses the pure strategy AC and then player 2 plays the behavioral str. β2 against it Observe that all possible combinations of mixed and behavioral strategies give the same probability of reaching z2; this holds for all terminal nodes and hence all combinations give the same payoff. Now, assume qA = pAC + pA ¯C = 0 (which implies pAC = 0). What is the probability of reaching z2 ? Using (σ1, σ2) : pAC · (p¯BD + p¯B ¯D) = 0 Using (β1, β2) : qA · q¯B · qC = 0 Using (β1, σ2) : (p¯BD + p¯B ¯D) · qA · qC = 0 Using (σ1, β2) : pAC · q¯B = 0 201 Behavioral (Mixed) Strategy SPE Let us denote by Bi the set of all behavioral strategies of player i, and by B the set of all behavioral strategy profiles B1 × . . . × Bn. Definition 62 β = (β1, . . . , βn) ∈ B is a behavioral Nash equilibrium if ui(βi, β−i) ≥ ui(βi , β−i) for all i ∈ N and βi ∈ Bi Observe that due to Theorem 61 behavioral NE coincide with mixed NE. Definition 63 A subgame perfect equilibrium (SPE) in behavioral strategies is a behavioral strategy profile β ∈ B such that for any subgame Gh of G, the restriction of β to Hh is a behavioral Nash equilibrium. Here β = (β1, . . . , βn) and the restriction of β to Gh is a behavioral strategy profile βh = (βh 1 , . . . , βh n) where each βh i is a restriction of βi to Hh ∩ Hi. Theorem 64 There exists a pure strategy profile which is a SPE in behavioral strategies. The proof is similar to the proof of Theorem 57. 202 Comments on Algorithms Note that some SPE in behavioral strategies can be computed using the backward induction. Indeed, the algorithm computes a pure strategy profile where each player always maximizes his value; such a pure strategy profile is SPE in both pure and behavioral strategies. Even though there always exists a pure SPE, there may exist (a continuum of) SPE composed of "non-pure" behavioral strategies. However, the necessary and sufficient condition for existence of such SPE is that at some point of the backward induction one of the players (say i) has two or more alternatives with the same equilibrium payoff. The same payoff is only for the player i, the other players may have different payoffs depending on the choice of the player i. Then any convex combination of such alternatives can be made by the player i, still leading to SPE (of course, for each combination the resulting SPE may be different). For two players the backward induction can be extended to compute (a finite representation of) all SPE in behavioral strategies in polynomial time. 203 Dynamic Games of Complete Information Extensive-Form Games Imperfect-Information Games 204 Extensive-form of Matching Pennies Is it possible to model Matching pennies using extensive-form games? H T H 1, −1 −1, 1 T −1, 1 1, −1 1 h0 2 h1 (1, −1) H (−1, 1) T H 2 h2 (−1, 1) H (1, −1) T T The problem is that player 2 is "perfectly" informed about the choice of player 1. In particular, there are pure Nash equilibria (H, TH) and (T, TH) in the extensive-form game as opposed to the strategic-form. Reversing the order of players does not help. We need to extend the formalism to be able to hide some information about previous moves. 205 Extensive-form of Matching Pennies Matching pennies can be modeled using an imperfect-information extensive-form game: 1 h0 2 h1 (1, −1) H (−1, 1) T H 2 h2 (−1, 1) H (1, −1) T T Here h1 and h2 belong to the same information set of player 2. As a result, player 2 is not able to distinguish between h1 and h2. So even though players do not move simultaneously, the information player 2 has about the current situation is the same as in the simultaneous case. 206 Imperfect Information Games An imperfect-information extensive-form game is a tuple Gimp = (Gperf , I) where Gperf = (N, A, H, Z, χ, ρ, π, h0, u) is a perfect-information extensive-form game (called the underlying game), I = (I1, . . . , In) where for each i ∈ N = {1, . . . , n} Ii = {Ii,1, . . . , Ii,ki } is a collection of information sets for player i that satisfies ki j=1 Ii,j = Hi and Ii,j ∩ Ii,k = ∅ for j k (i.e., Ii is a partition of Hi) for all h, h ∈ Ii,j, we have ρ(h) = ρ(h ) and χ(h) = χ(h ) (i.e., nodes from the same information set are owned by the same player and have the same sets of enabled actions) Given h ∈ H, we denote by I(h) the information set Ii,j containing h. Given an information set Ii,j, we denote by χ(Ii,j) the set of all actions enabled in some (and hence all) nodes of Ii,j. 207 Imperfect Information Games – Strategies Now we define the set of pure, mixed, and behavioral strategies in Gimp as subsets of pure, mixed, and behavioral strategies, resp., in Gperf that respect the information sets. Let Gimp = (Gperf , I) be an imperfect-information extensive-form game where Gperf = (N, A, H, Z, χ, ρ, π, h0, u). Definition 65 A pure strategy of player i in Gimp is a pure strategy si in Gperf such that for all j = 1, . . . , ki and all h, h ∈ Ii,j holds si(h) = si(h ). Note that each si can also be seen as a function si : Ii → A such that for every Ii,j ∈ Ii we have that si(Ii,j) ∈ χ(Ii,j). As before, we denote by Si the set of all pure strategies of player i in Gimp, and by S = S1 × · · · × Sn the set of all pure strategy profiles. As in the perfect-information case we have a corresponding strategic-form game ¯Gimp = (N, (Si)i∈N , (ui)i∈N). 208 Matching Pennies 1 h0 2 h1 (1, −1) H (−1, 1) T H 2 h2 (−1, 1) H (1, −1) T T I1 = {I1,1} where I1,1 = {h0} I1 = {I2,1} where I2,1 = {h1, h2} Example of pure strategies: s1(I1,1) = H which describes the strategy s1(h0) = H s2(I2,1) = T which describes the strategy s2(h1) = s2(h2) = T (it is also sufficient to specify s2(h1) = T since then s2(h2) = T) So we really have strategies H, T for player 1 and H, T for player 2. 209 Weird Example 1 h0 2 h1 (1, 2) K (2, 1) L A 2 h2 (3, 5) K (7, 1) L B 1 h3 (2, 5) A (11, 0) B (−4, 10) C C Note that I1 = {I1,1} where I1,1 = {h0, h3} and that I2 = {I2,1} where I2,1 = {h1, h2} What pure strategies are in this example? 210 SPE with Imperfect Information 1 h0 2 h1 h3 1 z1 (4, 1) C z2 (1, 4) ¯C B 1 h4 z3 (1, 4) C z4 (4, 1) ¯C ¯B A 2 h2 z5 (1, 1) D z6 (4, 5) ¯D ¯A What we designate as subgames to allow the backward induction? Only subtrees rooted in h1, h2, and h0 (together with all subtrees rooted in terminal nodes) Note that subtrees rooted in h3 and h4 cannot be considered as "independent" subgames because their individual solutions cannot be combined to a single best response in the information set {h3, h4}. 211 SPE with Imperfect Information Let Gimp = (Gperf , I) be an imperfect-information extensive-form game where Gperf = (N, A, H, Z, χ, ρ, π, h0, u) is the underlying perfect-information extensive-form game. Let us denote by Hproper the set of all h ∈ H that satisfy the following: For every h reachable from h, we have that either all nodes of I(h ) are reachable from h, or no node of I(h ) is reachable from h. Intuitively, h ∈ Hproper iff every information set Ii,j is either completely contained in the subtree rooted in h, or no node of Ii,j is contained in the subtree. Definition 66 For every h ∈ Hproper we define a subgame Gh imp to be the imperfect information game (Gh perf , Ih ) where Ih is the restriction of I to Hh . Note that as subgames of Gimp we consider only subgames of Gperf that respect the information sets, i.e., are rooted in nodes of Hproper . Definition 67 A strategy profile s ∈ S is a subgame perfect equilibrium (SPE) if sh is a Nash equilibrium in every subgame Gh imp of Gimp (here h ∈ Hproper ). 212 Backward Induction with Imperfect Info The backward induction generalizes to imperfect-information extensive-form games along the following lines: 1. As in the perfect-information case, the goal is to label each node h ∈ Hproper ∪ Z with a SPE sh and a vector of payoffs u(h) = (u1(h), . . . , un(h)) for individual players according to sh . 2. Starting with terminal nodes, the labeling proceeds bottom up. Terminal nodes are labeled similarly as in the perfect-inf. case. 3. Consider h ∈ Hproper , let K be the set of all h ∈ Hproper ∪ Z {h} that are h’s closest descendants out of Hproper ∪ Z. I.e., h ∈ K iff h h is reachable from h and the unique path from h to h visits only nodes of H Hproper (except the first and the last node). For every h ∈ K we have already computed a SPE sh in Gh imp and the vector of corresponding payoffs u(h ). 4. Now consider all nodes of K as terminal nodes where each h ∈ K has payoffs u(h ). This gives a new game in which we compute an equilibrium ¯sh together with the vector u(h). The equilibrium sh is then obtained by "concatenating" ¯sh with all sh , here h ∈ K, in the subgames Gh imp of Gh imp . 213 Mutually Assured Destruction Analysis of Cuban missile crisis of 1962 (as described in Games for Business and Economics by R. Gardner) The crisis started with United States’ discovery of Soviet nuclear missiles in Cuba. The USSR then backed down, agreeing to remove the missiles from Cuba, which suggests that US had a credible threat "if you don’t back off we both pay dearly". Question: Could this indeed be a credible threat? 214 Mutually Assured Destruction (Cont.) Model as an extensive-form game: First, player 1 (US) chooses to either ignore the incident (I), resulting in maintenance of status quo (payoffs (0, 0)), or escalate the situation (E). Following escalation by player 1, player 2 can back down (B), causing it to lose face (payoffs (10, −10)), or it can choose to proceed to a nuclear confrontation (N). Upon this choice, the players play a simultaneous-move game in which they can either retreat (R), or choose doomsday (D). If both retreat, the payoffs are (−5, −5), a small loss due to a mobilization process. If either of them chooses doomsday, then the world destructs and payoffs are (−100, −100). Find SPE in pure strategies. 215 Mutually Assured Destruction (Cont.) 1 h0 2 h1 h2 1 h3 2 (−5, −5) z1 R (−100, −100) z2 D R h4 2 (−100, −100) z3 R (−100, −100) z4 D D N (10, −10) z5 B E (0, 0) z6 I Solve G h2 imp (a strategic-form game). Then G h1 imp by solving a game rooted in h1 with terminal nodes h2, z5 (payoffs in h2 correspond to an equilibrium in G h2 imp ). Finally solve Gimp by solving a game rooted in h0 with terminal nodes h1, z6 (payoffs in h1 have been computed in the previous step). 216 Dynamic Games of Complete Information Extensive-Form Games Imperfect-Information Mixed and Behavioral Strategies Games with Chance Nodes 217 Mixed and Behavioral Strategies Definition 68 A mixed strategy σi of player i in Gimp is a mixed strategy of player i in the corresponding strategic-form game ¯Gimp = (N, (Si)i∈N , ui). Do not forget that now si ∈ Si iff si is a pure strategy that assigns the same action to all nodes of every information set. Hence each si ∈ Si can be seen as a function si : Ii → A. As before, we denote by Σi the set of all mixed strategies of player i and by Σ the set of all mixed strategy profiles Σ1 × · · · × Σn. Definition 69 A behavioral strategy of player i in Gimp is a behavioral strategy βi in Gperf such that for all j = 1, . . . , ki and all h, h ∈ Ii,j : βi(h) = βi(h ). Each βi can be seen as a function βi : Ii → ∆(A) such that for all Ii,j ∈ Ii we have supp(βi(Ii,j)) ⊆ χ(Ii,j). Are they equivalent as in the perfect-information case? 218 Example: Absent Minded Driver 1 0 L 1 5 L 1 R R Only one player: A driver who has to take a turn at a particular junction. There are two identical junctions, the first one leads to a wrong neighborhood where the driver gets completely lost (payoff 0), the second one leads home (payoff 5). If the driver misses both, there is a longer way home (payoff 1). The problem is that after missing the first turn, the driver forgets that he missed the turn. Behavioral strategy: β1(I1,1)(L) = 1 2 has the expected payoff 3 2 . No mixed strategy gives a larger payoff than 1 since no pure strategy ever reaches the terminal node with payoff 5. 219 Kuhn’s Theorem Definition 70 Player i has perfect recall in Gimp if the following holds: Every information set of player i intersects every path from the root h0 to a terminal node at most once. Every two paths from the root that end in the same information set of player i pass through the same information sets of player i, and in the same order, and in every such information set the two paths choose the same action. In other words, along all paths ending in the same information set, player i sees the same sequence of information sets and makes the same decisions in his nodes (i.e. at the end knows exactly the sequence of visited information sets and all his own choices along the way). 220 Kuhn’s Theorem The notion of induced strategies can be straightforwardly generalized to imperfect information games: Behavioral to mixed: We say that a mixed strategy σi is induced by a behavioral strategy βi if σi(si) = Ii,j ∈Ii βi(Ii,j)(si(Ii,j)) for all si ∈ Si As before, for the opposite direction some notation is needed. Recall that given h ∈ H, we denote by w[h] the unique path from h0 to h. Given h ∈ Hi, we denote by Sh i the set of all pure strategies si ∈ Si such that for every h ∈ Hi visited by w[h] we have that si(h ) is the action chosen in h on w[h]. Given h ∈ Hi and a ∈ χ(h), we denote by Sh,a i the set of all pure strategies si ∈ Sh i such that si(h) = a. 221 Kuhn’s Theorem Mixed to behavioral: We say that a behavioral strategy βi is induced by a mixed strategy σi if the following holds: Let Ii,j be an information set of player i and let h ∈ Ii,j be (an arbitrary) node of Ii,j. We have that either si ∈Sh i σi(si) = 0 or for each action a ∈ χ(h) (= χ(Ii,j)) : βi(Ii,j)(a) = si ∈Sh,a i σi(si) si ∈Sh i σi(si) (Here the perfect recall implies that the definition of βi(Ii,j) does not depend on the choice of h.) Theorem 71 (Kuhn, 1953) Let α be a mixed/behavioral strategy profile and let α be any mixed/behavioral profile obtained from α by substituting some of the strategies in α with strategies they induce. Then ui(α) = ui(α ). The concepts of Nash equilibria and SPE in behavioral strategies are the same as in the perfect information case. 222 Complexity of Zero-Sum Games Recall that a behavioral strategy βi of player i is maxmin if βi ∈ argmax βi ∈Bi min β−i ∈B−i ui(βi , β−i) Similarly for pure and mixed strategies. Theorem 72 (Koller and Megiddo, 1990) Consider finite two-player zero-sum imperfect information games. For such games with perfect recall, the problem of computing a maxmin behavioral strategy is in PTIME. For games with possibly imperfect recall, the problem of computing a (pure, behavioral, or mixed) strategy that guarantees a given payoff is NP-hard. How to compute Nash equilibria in polynomial time? Existence of a poly. time algorithm for computing behavioral NE does not immediately follow from Thm 72 and von Neumann’s Thm 48. Indeed, Thm 48 has been proved only for mixed strategies. However, using Kuhn’s thm, von Neumann’s thm can be easily extended to behavioral strategies. 223 Complexity of Zero-Sum Games Proposition 5 Let (β1, β2) be a behavioral strategy profile. Then (β1, β2) is a NE iff both β1 and β2 are maxmin. Proof. Let (β1, β2) be a profile of behavioral strategies. Apply Kuhn’s theorem and obtain induced mixed strategies (σ1, σ2). Since we used only the Kuhn’s theorem to obtain (σ1, σ2) from (β1, β2), for both i ∈ {1, 2} holds: ui(β1, β2) = ui(σ1, σ2) and for every behavioral strategy β−i and an induced mixed strategy σ−i , we have that ui(βi, β−i ) = ui(σi, σ−i ), for every mixed strategy σ−i and an induced behavioral strategy β−i , we have that ui(σi, σ−i ) = ui(βi, β−i ). Now (β1, β2) is a Nash equilibrium iff (σ1, σ2) is a Nash equilibrium iff σ1 and σ2 are maxmin iff β1 and β2 are maxmin. Corollary 73 The complexity of computing Nash equilibria in behavioral strategies in two-player zero-sum imperfect information games with perfect recall is in PTIME. 224 Complexity of Non-Zero-Sum Games Computing NE (or SPE) in non-zero-sum imperfect-information extensive-form games is at least as hard as for strategic-form games. Backward induction helps in decomposing the game into "subgames" rooted in nodes of Hsingle but large games may still remain to be solved using other methods. Naively, any solution concept developed for strategic-form games can be applied to imperfect-information extensive-form games (with perfect recall) via the corresponding strategic-form game ¯Gimp. However, such solution is not efficient (the corresponding game is exponentially large and often degenerate). More efficient methods exist for two-player games of perfect recall, e.g., using sequence form representation of the game, where nodes of Gimp are represented by sequences of actions leading to the nodes, which leads to a linear complementarity problem of polynomial size, which in turn can be solved using a modified Lemke-Howson. For a detailed treatment of complexity see "The complexity of computing a (perfect) equilibrium for an n-player extensive form game of perfect recall" by Kousha Etessami. 225 Imperfect-information and Chance Nodes 0 h0 1 h1 (−2, 2) C (−1, 1) F 1 2 1 h2 (2, −2) C (−1, 1) F 1 2 A very simple card game: Player 1 chooses randomly a card from a large deck of cards, containing only an equal number of Kings and Aces. Then Player 1 may either call (C) or fold (F), no look at the card. If he folds, then pays $1 to player 2, otherwise call + King means that player 1 pays $2 to player 2 call + Ace means that player 2 pays $2 to player 1 226 Imperfect-information and Chance Nodes An imperfect-information extensive-form game with chance nodes is a tuple Gimp = (Gperf , I, β0) where The set of players N is equal to {0, 1, . . . , n} (i.e., there is a new player 0 called chance, or nature), We assume that for every h ∈ H0 the set of enabled actions χ(h) is the set of all children nodes of h, Each information set of player 0 is a singleton (i.e., the nature has a complete information), β0 is a fixed behavioral strategy for player 0. Player 0 always plays according to β0. Note that due to the above assumption, β0(h) is a distribution on all children of h As player 0 plays the same strategy always, we exclude this strategy from strategy profiles (i.e. pure strategy profiles remain to be elements of S1 × · · · × Sn) A game with chance nodes is a perfect information game if all information sets of I are singletons. 227 Example 0 h0 1 h1 (−2, 2) C (−1, 1) F 1 2 1 h2 (2, −2) C (−1, 1) F 1 2 Here β0(h0)(h1) = 1 2 and β0(h0)(h2) = 1 2 Player 1 has just one information set I1,1 = {h1, h2}. Consider a mixed strategy σ1 of player 1 defined by σ1(I1,1)(C) = 1 4 and σ1(I1,1)(F) = 3 4 . Then u1(σ1) = 1 2 1 4 (−2) + 1 2 3 4 (−1) + 1 2 1 4 2 + 1 2 3 4 (−1) = −3 4 228 Results All results for games without chance nodes presented so far remain valid for games with chance nodes. In particular, Theorem 57 and Theorem 64 remain valid for games of perfect information with chance nodes. Concretely: Theorem 74 Consider games of perfect information with chance nodes. There exists a pure strategy profile which is a SPE with respect to pure strategies. There exists a pure strategy profile which is a SPE with respect to behavioral strategies. Backward induction can be straightforwardly modified to deal with chance nodes (see next slide). 229 Backward induction with perfect info. & chance Backward Induction: We inductively "attach" to every node h a SPE sh in Gh , and expected payoffs u(h) = (u1(h), . . . , un(h)). Initially: Attach to each terminal node z ∈ Z the empty profile sz = (∅, . . . , ∅) and the payoff vector u(z) = (u1(z), . . . , un(z)). While(there is an unattached node h with all children attached): 1. Let K be the set of all children of h 2. If χ(h) 0 then let hmax ∈ argmaxh ∈K uρ(h)(h ) and attach to h a SPE sh where sh ρ(h) (h) = hmax and for i ∈ N {0} and all h ∈ Hi define sh i (h ) = s ¯h i (h ) where h ∈ H ¯h ∩ Hi (i.e. in subgames rooted in ¯h ∈ K, sh behaves as s ¯h .) attach to h expected payoffs ui(h) = ui(hmax) for i ∈ N {0} 3. If χ(h) = 0, then attach to h a SPE sh where for all i ∈ N {0} and all h ∈ Hi define sh i (h ) = s ¯h i (h ) where h ∈ H ¯h ∩ Hi (i.e. in subgames rooted in ¯h ∈ K, sh behaves as s ¯h .) attach to h the expected payoffs ui(h) = ¯h∈K β0(h)(¯h) ui(¯h) (i.e., the weighted average payoff in all children nodes) 230 Backward Induction for Imperfect Info & Chance The high-level description of backward induction for imperfect-information games given earlier remains valid also for imperfect-information games with chance nodes. We only have to notice that in the games newly created in step 4., player 0 participates with the strategy β0. 231 Dynamic Games of Complete Information Repeated Games Finitely Repeated Games 232 Example C S C −5, −5 0, −20 S −20, 0 −1, −1 Imagine that the criminals are being arrested repeatedly. Can they somewhat reflect upon their experience in order to play "better"? In what follows we consider strategic-form games played repeatedly for finitely many rounds, the final payoff of each player will be the average of payoffs from all rounds infinitely many rounds, here we consider a discounted sum of payoffs and the long-run average payoff We analyze Nash equilibria and sub-game perfect equilibria. We stick to pure strategies only! 233 Finitely Repeated Games Let G = ({1, 2}, (S1, S2) , (u1, u2)) be a finite strategic-form game of two players. A T-stage game GT-rep based on G proceeds in T stages so that in a stage t ≥ 1, players choose a strategy profile st = (st 1 , st 2 ). After T stages, both players collect the average payoff T t=1 ui(st ) / T. A history of length 0 ≤ t ≤ T is a sequence h = s1 · · · st ∈ St of t strategy profiles. Denote by H(t) the set of all histories of length t. A pure strategy for player i in a T-stage game GT-rep is a function τi : T−1 t=0 H(t) → Si which for every possible history chooses a next step for player i. Every strategy profile τ = (τ1, τ2) in GT-rep induces a sequence of pure strategy profiles wτ = s1 · · · sT in G so that st i = τi(s1 · · · st−1 ). Given a pure strategy profile τ in GT-rep such that wτ = s1 · · · sT , define the payoffs ui(τ) = T t=1 ui(st ) / T. 234 Example C S C −5, −5 0, −20 S −20, 0 −1, −1 Consider a 3-stage game. Examples of histories: , (C, S), (C, S)(S, S), (C, S)(S, S)(C, C) Here the last one is terminal, obtained using τ1, τ2 s.t.: τ1( ) = C, τ1((C, S)) = S, τ1((C, S)(S, S)) = C τ2( ) = S, τ2((C, S)) = S, τ2((C, S)(S, S)) = C Thus w(τ1,τ2) = (C, S)(S, S)(C, C) u1(τ1, τ2) = (0 + (−1) + (−5))/3 = −2 u2(τ1, τ2) = (−20 + (−1) + (−5))/3 = −26/3 235 Finitely Repeated Games in Extensive-Form Every T-stage game GT-rep can be defined as an imperfect information extensive-form game. Define an imperfect-information extensive-form game Grep imp = (Grep perf , I) such that Grep perf = ({1, 2}, A, H, Z, χ, ρ, π, h0, u) where A = S1 ∪ S2 H = (S1 × S2)≤T ∪ (S1 × S2) uδ 1(τh 1, τh 2) (29) Since δ < 1 and ui are bounded on S, we may safely choose ¯τ1 so that ¯τ1(h ) = τ1(h ) for all sufficiently long histories h . Indeed, since ui is bounded on pure strategies of G, the sum ∞ t= δt · ui(st+1 ) goes to 0 as goes to ∞; hence the strict inequality (29) remains valid even if ¯τ1 is arbitrarily modified in a very distant future. 250 One-Shot Deviation Principle Let h be a history of maximum length such that h is a prefix of h and ¯τ1(h ) τ1(h ). (Note that then ¯τ1(h h ) = τ1(h h ) for all h ε.) Let ¯τ11 be a strategy of player 1 obtained from ¯τ1 by changing ¯τ1(h ) to τ1(h ). Now note that the one-shot deviation property implies, that uδ 1(¯τh 11, τh 2 ) = uδ 1(τh 1 , τh 2 ) ≥ uδ 1(¯τh 1 , τh 2 ) and thus uδ 1 (¯τh 11 , τh 2 ) ≥ uδ 1 (¯τh 1 , τh 2 ) > uδ 1 (τh 1 , τh 2 ). Note that ¯τh 11 has a strictly smaller number of deviations from τh 1 than ¯τh 1 . Repeating the same argument with ¯τ11 in place of ¯τ1 we obtain ¯τ12 such that uδ 1 (¯τh 12 , τh 2 ) ≥ uδ 1 (¯τh 11 , τh 2 ) > uδ 1 (τh 1 , τh 2 ). Here ¯τh 12 has even less deviations from τh 1 than ¯τh 11 . Then repeating with ¯τ12 in place of ¯τ1 we obtain ¯τ13 such that uδ 1 (¯τh 13 , τh 2 ) ≥ uδ 1 (¯τh 12 , τh 2 ) > uδ 1 (τh 1 , τh 2 ), etc., still decreasing the number of deviations from τh 1 . Eventually, as ¯τh 1 has only finitely many deviations from τh 1 , we get ¯τh 1k = τh 1 for some k and thus uδ 1 (τh 1 , τh 2 ) = uδ 1 (¯τh 1k , τh 2 ) > uδ 1 (τh 1 , τh 2 ), a contradiction. 251 Example Consider the infinitely repeated game based on Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 The grim trigger profile (τ1, τ2) where τi(s1 · · · sT ) =    S T = 0 S s = (S, S) for all 1 ≤ ≤ T C otherwise is a SPE. 252 A Simple Version of Folk Theorem Let G = ({1, 2}, (S1, S2) , (u1, u2)) be a two-player strategic-form game where u1, u2 are bounded on S = S1 × S2 (but S may be infinite) and let s∗ be a Nash equilibrium in G. Let s be a strategy profile in G satisfying ui(s) > ui(s∗ ) for all i ∈ N. Consider the following grim trigger for s using s∗ strategy profile τ = (τ1, τ2) in Girep where τi(s1 · · · sT ) =    si T = 0 si s = s for all 1 ≤ ≤ T s∗ i otherwise Then for δ ≥ max i∈{1,2} maxsi ∈Si ui(si , s−i) − ui(s) maxsi ∈Si ui(si , s−i) − ui(s∗) we have that (τ1, τ2) is a SPE in Gδ irep and uδ i (τ) = ui(s). Proof: Consider a possible one-shot deviation ¯τ1 of player 1, i.e., there is exactly one h such that ¯τ1(h) τ1(h). We distinguish two cases depending on h. 253 Proof of Simple Folk Theorem (Cont.) Case 1: h s · · · s. Then there is a deviation from s in h and thus according to (τh 1 , τh 2 ) both players play s∗ forever : uδ 1(τh 1, τh 2) = (1 − δ) ∞ k=0 δk u1(s∗ ) = u1(s∗ )(1 − δ) ∞ k=0 δk = u1(s∗ ) Now (¯τh 1 , τh 2 ) gives a sequence w(¯τh 1 ,τh 2 ) = (s1 , s∗ 2 )s∗ s∗ · · · where s1 is a strategy of player 1 to which he deviates after h. Here player 2 plays s∗ 2 all the time after h because one of the players has already deviated in h. We obtain u1(¯τh 1, τh 2) = (1 − δ)  u1(s1, s∗ 2) + ∞ k=1 δk u1(s∗ )   ≤ (1 − δ)  u1(s∗ 1, s∗ 2) + ∞ k=1 δk u1(s∗ )   = u1(s∗ ) So this deviation cannot be beneficial no matter what δ is. 254 Proof of Simple Folk Theorem (Cont.) Case 2: h = s · · · s. Clearly, u1(τh 1 , τh 2 ) = u1(s). Now (¯τh 1 , τh 2 ) gives a sequence w(¯τh 1 ,τh 2 ) = (s1 , s2)s∗ s∗ · · · where s1 is a strategy of player 1 to which he deviates after h. As opposed to the previous case, here player 2 first plays s2 (since the deviation of player 1 to s1 is the first deviation in the history) and then both players react by playing s∗ forever. If u1(s1 , s2) < u1(s), then uδ 1(¯τh 1, τh 2) = (1 − δ)  u1(s1, s2) + ∞ k=1 δk u1(s∗ )   < (1 − δ)  u1(s1, s2) + ∞ k=1 δk u1(s∗ )   < (1 − δ)  u1(s) + ∞ k=1 δk u1(s)   = u1(s) = uδ 1(τh 1, τh 2) and thus this deviation is also not beneficial no matter what δ is. 255 Proof of Simple Folk Theorem (Cont.) Finally, if u1(s1 , s2) ≥ u1(s), then uδ 1(¯τh 1, τh 2) = (1 − δ)  u1(s1, s2) + ∞ k=1 δk u1(s∗ )   = (1 − δ)u1(s1, s2) + (1 − δ)u1(s∗ ) · δ ∞ k=0 δk = u1(s1, s2) − δ · u1(s1, s2) + δ · u1(s∗ ) Thus uδ 1 (¯τh 1 , τh 2 ) ≤ uδ 1 (τh 1 , τh 2 ) = u1(s) iff u1(s1 , s2) − δ · u1(s1 , s2) + δ · u1(s∗ ) ≤ u1(s) iff u1(s1 , s2) − u1(s) ≤ δ · (u1(s1 , s2) − u1(s∗ )) iff δ ≥ u1(s1 , s2) − u1(s) u1(s1 , s2) − u1(s∗) 256 Proof of Simple Folk Theorem (Cont.) Thus (τ1, τ2) satisfies the one-shot deviation property in Gδ irep w.r.t. player 1 if δ ≥ u1(s1 , s2) − u1(s) u1(s1 , s2) − u1(s∗) for all s1 ∈ S1 satisfying u1(s1, s2) ≥ u1(s) Note that the right-hand-side expression is maximized when u1(s1 , s2) is maximized and thus we get δ ≥ maxs1 ∈S1 u1(s1 , s2) − u1(s) maxs1 ∈S1 u1(s1 , s2) − u1(s∗) Proving the same for player 2 and putting the results together, we obtain that (τ1, τ2) satisfies the one-shot deviation property in Gδ irep if δ ≥ max i∈{1,2} maxsi ∈Si ui(si , s−i) − ui(s) maxsi ∈Si ui(si , s−i) − ui(s∗) (30) Thus by Theorem 79, (τ1, τ2) is a SPE in Gδ irep if δ satisfies ineq. (30). 257 Simple Folk Theorem – Example Consider the infinitely repeated game Girep based on the following game G: m f r M 4, 4 −1, 5 3, 0 F 5, −1 1, 1 0, 0 R 0, 3 0, 0 2, 2 NE in G : (F, f) Consider the grim trigger for (M, m) using (F, f), i.e., the profile (τ1, τ2) in Girep where τ1 : Plays M in a given stage if (M, m) was played in all previous stages, and plays F otherwise. τ2 : Plays m in a given stage if (M, m) was played in all previous stages, and plays f otherwise. This is a SPE in Gδ irep for all δ ≥ 1 4 . Also, ui(τ1, τ2) = 4 for i ∈ {1, 2}. Are there other SPE? Yes, a grim trigger for (R, r) using (F, f). This is a SPE in Gδ irep for δ ≥ 1 2 . 258 Tacit Collusion Consider the Cournot duopoly game model G = (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, κ] u1(q1, q2) = q1(κ − q1 − q2) − q1c1 = (κ − c1)q1 − q2 1 − q1q2 u2(q1, q2) = q2(κ − q2 − q1) − q2c2 = (κ − c2)q2 − q2 2 − q2q1 Assume for simplicity that c1 = c2 = c and denote θ = κ − c. If the firms sign a binding contract to produce only θ/4, their profit would be θ2 /8 which is higher than the profit θ2 /9 for playing the NE (θ/3, θ/3). However, such contracts are forbidden in many countries (including US). Is it still possible that the firms will behave selfishly (i.e. only maximizing their profits) and still obtain such payoffs? In other words, is there a SPE in the infinitely repeated game based on G (with a discount factor δ) which gives the payoffs θ2 /8 ? 259 Tacit Collusion Consider the Cournot duopoly game model G = (N, (Si)i∈N , (ui)i∈N) N = {1, 2} Si = [0, ∞) u1(q1, q2) = q1(κ − q1 − q2) − q1c1 = (κ − c1)q1 − q2 1 − q1q2 u2(q1, q2) = q2(κ − q2 − q1) − q2c2 = (κ − c2)q2 − q2 2 − q2q1 Assume for simplicity that c1 = c2 = c and denote θ = κ − c. Consider the grim trigger profile for (θ/4, θ/4) using (θ/3, θ/3) : Player i will produce qi = θ/4 whenever all profiles in the history are (θ/4, θ/4), whenever one of the players deviates, produce θ/3 from that moment on. Assuming that κ = 100 and c = 10 (which gives θ = 90), this is a SPE Gδ irep for δ ≥ 0.5294 · · · . It results in (θ/4, θ/4)(θ/4, θ/4) · · · with the discounted payoffs θ2 /8. 260 Dynamic Games of Complete Information Repeated Games Infinitely Repeated Games Long-Run Average Payoff and Folk Theorems 261 Infinitely Repeated Games & Average Payoff In what follows we assume that all payoffs in the game G are positive and that S is finite! Let τ = (τ1, τ2) be a strategy profile in the infinitely repeated game Girep such that wτ = s1 s2 · · · . Definition 80 We define a long-run average payoff for player i by u avg i (τ) = lim sup T→∞ 1 T T t=1 ui(st ) (Here lim sup is necessary because τi may cause non-existence of the limit.) The lon-run average payoff u avg i (τ) is well-defined if the limit u avg i (τ) = limT→∞ 1 T T t=1 ui(st ) exists. Given a strategic-form game G, we denote by G avg irep the infinitely repeated game based on G together with the long-run average payoff. 262 Infinitely Repeated Games & Average Payoff Definition 81 A strategy profile τ is a Nash equilibrium if u avg i (τ) is well-defined for all i ∈ N, and for every i and every τi we have that u avg i (τi, τ−i) ≥ u avg i (τi , τ−i) (Note that we demand existence of the defining limit of u avg i (τi, τ−i) but the limit does not have to exist for u avg i (τi , τ−i).) Moreover, τ = (τ1, τ2) is a SPE in G avg irep if for every history h we have that (τh 1 , τh 2 ) is a Nash equilibrium. 263 Example Consider the infinitely repeated game based on Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 The grim trigger profile (τ1, τ2) where τi(s1 · · · sT ) =    S T = 0 S s = (S, S) for all 1 ≤ ≤ T C otherwise is a SPE which gives the long-run average payoff −1 to each player. The intuition behind the grim trigger works as for the discounted payoff: Whenever a player i deviates, the player −i starts playing C for which the best response of player i is also C. So we obtain (S, S) · · · (S, S)(X, Y)(C, C)(C, C) · · · (here (X, Y) is either (C, S) or (S, C) depending on who deviates). Apparently, the long-run average payoff is −5 for both players, which is worse than −1. 264 Example Consider the infinitely repeated game based on Prisoner’s dilemma: C S C −5, −5 0, −20 S −20, 0 −1, −1 However, other payoffs can be supported by NE. Consider e.g. a strategy profile (τ1, τ2) such that Both players cyclically play as follows: 9 times (S, S) once (S, C) If one of the players deviates, then, from that moment on, both play (C, C) forever. Then (τ1, τ2) is also SPE. Apparently, u avg 1 (τ1, τ2) = 9 10 · (−1) + (−20)/10 = −29/10 and u avg 1 (τ1, τ2) = 9 10 (−1) = −9/10. Player 2 gets better payoff than from the Pareto optimal profile (S, S)! 265 Outline of the Folk Theorems The previous examples suggest that other (possibly all?) convex combinations of payoffs may be obtained by means of Nash equilibria. This observation forms a basis for a bunch of theorems, collectively called Folk Theorems. No author is listed since these theorems had been known in games community long before they were formalized. In what follows we prove several versions of Folk Theorem concerning achievable payoffs for repeated games. Ordered by increasing technical and conceptual difficulty, we consider the following variants: Long-run average payoffs & SPE Discounted payoffs & SPE Long-run average payoffs & Nash equilibria 266 Folk Theorems – Feasible Payoffs Definition 82 We say that a vector of payoffs v = (v1, v2) ∈ R2 is feasible if it is a convex combination of payoffs for pure strategy profiles in G with rational coefficients, i.e., if there are rational numbers βs, here s ∈ S, satisfying βs ≥ 0 and s∈S βs = 1 such that for both i ∈ {1, 2} holds vi = s∈S βs · ui(s) We assume that there is m ∈ N such that each βs can be written in the form βs = γs/m. The following theorems can be extended to a notion of feasible payoffs using arbitrary, possibly irrational, coefficients βs in the convex combination. Roughly speaking, this follows from the fact that each real number can be approximated with rational numbers up to an arbitrary error. However, the proofs are technically more involved. 267 Folk Theorems – Long-Run Average & SPE Theorem 83 Let s∗ be a pure strategy Nash equilibrium in G and let v = (v1, v2) be a feasible vector of payoffs satisfying vi ≥ ui(s∗ ) for both i ∈ {1, 2}. Then there is a strategy profile τ = (τ1, τ2) in Girep such that τ is a SPE in G avg irep u avg i (τ) = vi for i ∈ {1, 2} Proof: Consider a strategy profile τ = (τ1, τ2) in Girep which gives the following behavior: 1. Unless one of the players deviates, the players play cyclically all profiles s ∈ S so that each s is always played for γs rounds. 2. Whenever one of the players deviates, then, from that moment on, each player i plays s∗ i . It is easy to see that u avg i (τ) = vi. We verify that τ is SPE. 268 Folk Theorems – Long-Run Average & SPE Fix a history h, we show that τh = (τh 1 , τh 2 ) is a NE in G avg irep . If h does not contain any deviation from the cyclic behavior 1., then τh continues according to 1., thus u avg i (τh ) = vi. If h contains a deviation from 1., then wτh = s∗ s∗ · · · and thus u avg i (τh ) = ui(s∗ ). Now if a player i deviates to ¯τh i from τh i in G avg irep , then w(¯τh i ,τh −i ) = (s1 i , s−i)(s2 i , s∗ −i)(s3 i , s∗ −i) · · · where s1 i , s2 i , . . . are strategies of Si and s−i is a strat. of S−i. However, then u avg i (¯τh i , τh −i ) ≤ ui(s∗ ) ≤ vi since s∗ is a Nash equilibrium and thus ui(sk i , s∗ −i ) ≤ ui(s∗ ) for all k ≥ 1. Intuitively, player −i punishes player i by playing s∗ −i . 269 Folk Theorems – Discounted Payoffs & SPE Theorem 84 Let s∗ be a pure strategy Nash equilibrium in G and let v = (v1, v2) be a feasible payoff satisfying vi > ui(s∗ ) for both i ∈ {1, 2}. Then there is a strategy profile τ = (τ1, τ2) in Girep and δ < 1 such that τ is a SPE in Gδ irep for every δ ∈ [δ, 1) and limδ →1 uδ i (τ) = vi. Proof: The following claim allows us to reduce the discounted payoff to the long-run-average. Claim 5 Let τ be a well-defined strategy profile. Then lim δ→1− uδ i (τ) = u avg i (τ) Now to prove Theorem 84, consider the strategy profile τ = (τ1, τ2) in Girep from the proof of Theorem 83. We check the one-shot deviation property in Gδ irep for δ close to 1. 270 Folk Theorems – Discounted Payoffs & SPE Fix a history h and consider τh = (τh 1 , τh 2 ). If h does not contain any deviation from 1., then both players follow 1., and uδ i (τh ) is close to u avg i (τh ) = vi for δ close to 1. If h contains any deviation from 1., then wτh = s∗ s∗ · · · and uδ i (τh ) = ui(s∗ ). Now assume, w.l.o.g., that player 1 deviates exactly after h, which gives a strategy ¯τh 1 differing from τh 1 only on h. Thus w(¯τh 1 ,τh 2 ) = (s1 , s2 )s∗ s∗ · · · where s1 is a strategy of S1 and s2 is either the next step in the cyclic behavior described by 1. (if h follows 1.), or equal to s∗ 2 (h does not follow 1.) Note that for δ close to 1, we have that uδ i (¯τh i , τh −i ) is close to u avg i (¯τh i , τh −i ) = ui(s∗ ). If h follows 1., then uδ 1 (τh ) is close to v1 which is greater than u1(s∗ ) to which uδ 1 (¯τh 1 , τh 2 ) is close. If h does not follow 1., then s2 = s∗ 2 (players punish due to a deviation in h), and thus uδ 1 (¯τh 1 , τh 2 ) ≤ u1(s∗ ) = uδ 1 (τh ). 271 Folk Theorems – Individually Rational Payoffs Definition 85 v = (v1, v2) ∈ R2 is individually rational if for both i ∈ {1, 2} holds vi ≥ min s−i ∈S−i max si ∈Si ui(si, s−i) That is, vi is at least as large as the value that player i may secure by playing best responses to the most hostile behavior of player −i. Example: m f r M 4, 4 −1, 5 3, 0 F 5, −1 1, 1 0, 0 R 0, 3 0, 0 2, 2 Here any (v1, v2) such that v1 ≥ 2 and v2 ≥ 1 is individually rational. 272 Folk Theorems – Long-Run Average & NE Theorem 86 Let v = (v1, v2) be a feasible and individually rational vector of payoffs. Then there is a strategy profile τ = (τ1, τ2) in Girep such that τ is a Nash equilibrium in G avg irep u avg i (τ) = vi for i ∈ {1, 2} Proof: It suffices to use a slightly modified strategy profile τ = (τ1, τ2) in Girep from Theorem 83: Unless one of the players deviates, the players play cyclically all profiles s ∈ S so that each s is always played for γs rounds. Whenever a player i deviates, the opponent −i plays a strategy smin −i ∈ argmins−i ∈S−i maxsi ∈Si ui(si, s−i). It is easy to see that u avg i (τ) = vi. If a player i deviates, then his long-run average payoff cannot be higher than mins−i ∈S−i maxsi ∈Si ui(si, s−i) ≤ vi, so τ is a NE. 273 Folk Theorems – Long-Run Average & NE Theorem 87 If a strategy profile τ = (τ1, τ2) is a NE in G avg irep , then u avg 1 (τ), u avg 2 (τ) is individually rational. Proof: Suppose that u avg 1 (τ), u avg 2 (τ) is not individually rational. W.l.o.g. assume that u avg 1 (τ) < mins2∈S2 maxs1∈S1 u1(s1, s2). Now let us consider a new strategy ¯τ1 such that for an arbitrary history h the pure strategy ¯τ1(h) is a best response to τ2(h). But then, for every history h, we have u1(¯τ1(h), τ2(h)) ≥ min s2∈S2 max s1∈S1 u1(s1, s2) > u avg 1 (τ) So clearly u avg 1 (¯τ1, τ2) > u avg 1 (τ) which contradicts the fact that (τ1, τ2) is a NE. Note that if irrational convex combinations are allowed in the definition of feasibility, then vectors of payoffs for Nash equilibria in G avg irep are exactly feasible and individually rational vectors of payoffs. Indeed, the coefficients βs in the definition of feasibility are exactly frequencies with which the individual profiles of S are played in the NE. 274 Folk Theorems – Summary We have proved that "any reasonable" (i.e. feasible and individually rational) vector of payoffs can be justified as payoffs for a Nash equilibrium in G avg irep (where the future has "an infinite weight"). Concerning SPE, we have proved that any feasible vector of payoffs dominating a Nash equilibrium in G can be justified as payoffs for SPE in G avg irep . This result can be generalized to arbitrary feasible and strictly individually rational payoffs by means of a more demanding construction. For discounted payoffs, we have proved that an arbitrary feasible vector of payoffs strictly dominating a Nash equilibrium in G can be approximated using payoffs for SPE in Gδ irep as δ goes to 1. Even this result can be extended to feasible and strictly individually rational payoffs. For a very detailed discussion of Folk Theorems see "A Course in Game Theory" by M. J. Osborne and A. Rubinstein. 275 Summary of Extensive-Form Games We have considered extensive-form games (i.e., games on trees) with perfect information with imperfect information with chance nodes (and both perfect and imperfect information) We have considered pure, mixed and behavioral strategies. We have considered Nash equilibria (NE) and subgame perfect equilibria (SPE) in pure and behavioral strategies. 276 Summary of Extensive-Form Games (Cont.) For perfect information we have shown that mixed and behavioral strategies are equivalent there is a pure strategy SPE in both pure as well as behavioral strategies SPE can be computed using backward induction in polynomial time For imperfect information we have shown that mixed and behavioral strategies are not equivalent in general (but they are equivalent for games with perfect recall) backward induction can be used to propagate values through "perfect information nodes", but "imperfect information parts" have to be solved by different means solving imperfect information games is at least as hard as solving games in strategic-form; however, even in the zero-sum case, most decision problems are NP-hard (for details see the lecture). Chance nodes do not interfere with any of the above results. 277 Summary of Extensive-Form Games (Cont.) Finally, we discussed repeated games. We considered both, finitely as well as infinitely repeated games. For finitely repeated games we considered the average payoff and discussed existence of pure strategy NE and SPE with respect to existence of NE in the original strategic-form game. For infinitely repeated games we considered both discounted payoff: We have proved that one-shot deviation property is equivalent to SPE "grim trigger" strategy profiles can be used to implement any vector of payoffs strictly dominating payoffs for a Nash equilibrium in the original strategic-form game (Simple Folk Theorem) long-run average payoff: We have proved that all feasible and individually rational vectors of payoffs can be achieved by Nash equilibria (a variant of grim trigger) 278 Games of INcomplete Information Bayesian Games Auctions 279 Auctions The (General) problem: How to allocate (discrete) resources among selfish agents in a multi-agent system? Auctions provide a general solution to this problem. As such, auctions have been heavily used in real life, in consumer, corporate, as well as government settings: eBay, art auctions, wine auctions, etc. advertising (Google adWords) governments selling public resources: electromagnetic spectrum, oil leases, etc. · · · Auctions also provide a theoretical framework for understanding resource allocation problems among self-interested agents: Formally, an auction is any protocol that allows agents to indicate their interest in one or more resources and that uses these indications to determine both the resource allocation and payments of the agents. 280 Auctions: Taxonomy Auctions may be used in various settings depending on the complexity of the resource allocation problem: Single-item auctions: Here n bidders (players) compete for a single indivisible item that can be allocated to just one of them. Each bidder has his own private value of the item in case he wins (gets zero if he loses). Typically (but not always) the highest bid wins. How much should he pay? Multiunit auctions: Here a fixed number of identical units of a homogeneous commodity are sold. Each bidder submits both a number of units he demands and a unit price he is willing to pay. Here also the highest bidders typically win, but it is unclear how much they should pay (pay-as-bid vs uniform pricing) Combinatorial auctions: Here bidders compete for a set of distinct goods. Each player has a valuation function which assigns values to subsets of the set (some goods are useful only in groups etc.) Who wins and what he pays? (We mostly concentrate on the single-item auctions.) 281 Single Unit Auctions There are many single-item auctions, we consider the following well-known versions: open auctions: The English Auction: Often occurs in movies, bidders are sitting in a room (by computer or a phone) and the price of the item goes up as long as someone is willing to bid it higher. Once the last increase is no longer challenged, the last bidder to increase the price wins the auction and pays the price for the item. The Dutch Auction: Opposite of the English auction, the price starts at a prohibitively high value and the auctioneer gradually drops the price. Once a bidder shouts "buy", the auction ends and the bidder gets the item at the price. sealed-bid-auction: k-th price Sealed-Bid Auction: Each bidder writes down his bid and places it in an envelope; the envelopes are opened simultaneously. The highest bidder wins and then pays the k-th maximum bid. (In a reverse auction it is the k-the minimum.) The most prominent special cases are The First-Price Auction and The Second-Price Auction. 282 Single Unit Auctions (Cont.) Observe that the English auction is essentially equivalent to the second price auction if the increments in every round are very small. There exists a "continuous" version, called Japanese auction, where the price continuously increases. Each bidder may drop out at any time. The last one who stays gets the item for the current price (which is the dropping price of the "second highest bid"). similarly, the Dutch auction is equivalent to the first price auction. Note that the bidder with the highest bid stops the decrement of the price and buys at the current price which corresponds to his bid. Now the question is, which type of auction is better? 283 Objectives The goal of the bidders is clear: To get the item at as low price as possible (i.e., they maximize the difference between their private value and the price they pay) We consider self-interested non-communicating bidders that are rational and intelligent. There are at least two goals that may be pursued by the auctioneer (in various settings): Revenue maximization This may lead to auctions that do not always sell the item to the highest bid Incentive compatibility: We want the bidders to spontaneously bid their true value of the item This means, that such an auction cannot be strategically manipulated by lying. 284 Auctions vs Games Consider single-item sealed-bid auctions as strategic form games: G = (N, (Bi)i∈N , (ui)i∈N) where The set of players N is the set of bidders Bi = [0, ∞) where each bi ∈ Bi corresponds to the bid bi (We follow the standard notation and use bi to denote pure strategies (bids)) To define ui, we assume that each bidder has his own private value vi of the item, then given bids b = (b1, . . . , bn) : First Price: ui(b) =    vi − bi if bi > maxj i bj 0 otherwise Second Price: ui(b) =    vi − maxj i bj if bi > maxj i bj 0 otherwise Is this model realistic? Not really, usually, the bidders are not perfectly informed about the private values of the other bidders. Can we use (possibly imperfect information) extensive-form games? 285 Incomplete Information Games A (strict) incomplete information game is a tuple G = (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N) where N = {1, . . . , n} is a set of players, Each Ai is a set of actions available to player i, We denote by A = n i=1 Ai the set of all action profiles a = (a1, . . . , an). Each Ti is a set of possible types of player i, Denote by T = n i=1 Ti the set of all type profiles t = (t1, . . . , tn). ui is a type-dependent payoff function ui : A1 × · · · × An × Ti → R Given a profile of actions (a1, . . . , an) ∈ A and a type ti ∈ Ti, we write ui(a1, . . . , an; ti) to denote the corresponding payoff. A pure strategy of player i is a function si : Ti → Ai. As before, we denote by Si the set of all pure strategies of player i, and by S the set of all pure strategy profiles n i=1 Si. 286 Dominant Strategies A pure strategy si very weakly dominates si if for every ti ∈ Ti the following holds: For all a−i ∈ A−i we have ui(si(ti), a−i; ti) ≥ ui(si (ti), a−i; ti) A pure strategy si weakly dominates si if for every ti ∈ Ti the following holds: For all a−i ∈ A−i we have ui(si(ti), a−i; ti) ≥ ui(si (ti), a−i; ti) and the inequality is strict for at least one a−i (Such a−i may be different for different ti.) A pure strategy si strictly dominates si if for every ti ∈ Ti the following holds: For all a−i ∈ A−i we have ui(si(ti), a−i; ti) > ui(si (ti), a−i; ti) Definition 88 si is (very weakly, weakly, strictly) dominant if it (very weakly, weakly, strictly, resp.) dominates all other pure strategies. 287 Nash Equilibrium In order to generalize Nash equilibria to incomplete information games, we use the following notation: Given a pure strategy profile (s1, . . . , sn) ∈ S and a type profile (t1, . . . , tn) ∈ T, for every player i write s−i(t−i) = (s1(t1), . . . , si−1(ti−1), si+1(ti+1), . . . , sn(tn)) Definition 89 A strategy profile s = (s1, . . . , sn) ∈ S is an ex-post-Nash equilibrium if for every t1, . . . , tn we have that (s1(t1), . . . , sn(tn)) is a Nash equilibrium in the strategic-form game defined by the ti’s. Formally, s = (s1, . . . , sn) ∈ S is an ex-post-Nash equilibrium if for all i ∈ N and all t1, . . . , tn and all ai ∈ Ai : ui(s1(t1), . . . , sn(tn); ti) ≥ ui(ai, s−i(t−i); ti) 288 Example: Single-Item Sealed-Bid Auctions Consider single-item sealed-bid auctions as strict incomplete information games: G = (N, (Bi)i∈N , (Vi)i∈N , (ui)i∈N) where The set of players N is the set of bidders Bi = [0, ∞) where each action bi ∈ Bi corresponds to the bid bi Vi = [0, ∞) where each type vi ∈ Vi corresponds to the private value vi Let vi ∈ Vi be the type of player i (i.e. his private value), then given an action profile b = (b1, . . . , bn) (i.e. bids) we define First Price: ui(b; vi) =    vi − bi if bi > maxj i bj 0 otherwise. Second Price: ui(b; vi) =    vi − maxj i bj if bi > maxj i bj 0 otherwise. Note that if there is a tie (i.e., there are k such that bk = b = maxj bj), then all players get 0. Are there dominant strategies? Are there ex-post-Nash equilibria? 289 Second-Price Auction For every i, we denote by vi the pure strategy si for player i defined by si(vi) = vi. Intuitively, such a strategy is truth telling, which means that the player bids his own private value truthfully. Theorem 90 Assume the Second-Price Auction. Then for every player i we have that vi is a weakly dominant strategy. Also, v is the unique ex-post-Nash equilibrium. Proof. Let us fix a private value vi and a bid bi ∈ Bi such that bi vi. We show that for all bids of opponents b−i ∈ B−i : ui(vi, b−i; vi) ≥ ui(bi, b−i; vi) with the strict inequality for at least one b−i. Intuitively, assume that player i bids bi against b−i and compare his payoff with the payoff he obtains by playing vi against b−i. There are two cases to consider: bi < vi and bi > vi. 290 Second-Price Auction (Cont.) Case bi < vi : We distinguish three sub-cases depending on b−i. A. If bi > maxj i bj, then ui(bi, b−i; vi) = vi − max j i bj = ui(vi, b−i; vi) Intuitively, player i wins and pays the price maxj i bj < bi. However, then bidding vi, player i wins and pays maxj i bj as well. B. If there is k i such that bk > maxj k bj, then ui(bi, b−i; vi) = 0 ≤ ui(vi, b−i; vi) Moreover, if bi < bk < vi, then we get the strict inequality ui(bi, b−i; vi) = 0 < vi − bk = ui(vi, b−i; vi) Intuitively, if another player k wins, then player i gets 0 and increasing bi to vi does not hurt. Moreover, if bi < bk < vi, then increasing bi to vi strictly increases the payoff of player i. C. If there are k such that bk = b = maxj bj, then ui(bi, b−i; vi) = 0 ≤ ui(vi, b−i; vi) Intuitively, there is a tie in (bi, b−i) and hence all players get 0. 291 Second-Price Auction (Cont.) Case bi > vi : We distinguish four sub-cases depending on b−i. A. If bi > maxj i bj > vi, then ui(bi, b−i; vi) = vi − max j i bj < 0 = ui(vi, b−i; vi) So in this case the inequality is strict. B. If bi > vi ≥ maxj i bj, then ui(bi, b−i; vi) = vi − max j i bj = ui(vi, b−i; vi) Note that this case also covers vi = maxj i bj where decreasing bi to vi causes a tie with zero payoff for player i. C. If there is k i such that bk > maxj k bj > vi, then ui(bi, b−i; vi) = 0 = ui(vi, b−i; vi) D. If there are k k such that bk = bk = maxj bj > vi, then ui(bi, b−i; vi) = 0 = ui(vi, b−i; vi) 292 First-Price Auction Consider the First-Price Auction. Here the highest bidder wins and pays his bid. Let us impose a (reasonable) assumption that no player bids more than his private. Question: Are there any dominant strategies? Answer: No, to obtain a contradiction, assume that si is a very weakly dominant strategy. Intuitively, if player i wins against some bids of his opponents, then his bid is strictly higher than bids of all his opponents. Thus he may slightly decrement his bid and still win with a better payoff. Formally, assume that all opponents bid 0, i.e., bj = 0 for all j i, and consider vi > 0. If si(vi) > 0, then ui(si(vi), b−i; vi) = vi − si(vi) < vi − si(vi)/2 = ui(si(vi)/2, b−i; vi) If si(vi) = 0, then ui(si(vi), b−i; vi) = 0 < vi/2 = ui(vi/2, b−i; vi) Hence, si cannot be weakly dominant. 293 First-Price Auction (Cont.) Question: Is there a pure strategy Nash equilibrium? Answer: No, assume that (s1, . . . , sn) is a Nash equilibrium. If there are v1, . . . , vn such that some player i wins, i.e., his bid si(vi) satisfies si(vi) > maxj i sj(vj), then ui(si(vi), s−i(v−i); vi) = vi − si(vi) < vi − (si(vi) − ε) = ui(si(vi) − ε, s−i(v−i); vi) for ε > 0 small enough to satisfy si(vi) − ε > maxj i sj(vj) (i.e., player i may help himself by decreasing the bid a bit) Assume that for no v1, . . . , vn there is a winner (this itself is a bit weird). Consider 0 < v1 < · · · < vn. Since there is no winner, there are two players i, j such that i < j satisfying sj(vj) = si(vi) ≥ max s (v ) But then, due to our assumption, sj(vj) = si(vi) ≤ vi < vj and thus uj(sj(vj), s−j(v−j); vj) = 0 < vj − (sj(vj) + ε) = uj(sj(vj) + ε, s−j(v−j); vj) for ε > 0 small enough to satisfy sj(vj) + ε < vj. (i.e., player j can help himself by increasing his bid a bit) 294 Summary Second Price Auction: There is an ex-post Nash equilibrium in weakly dominant strategies It is incentive compatible (players are self-motivated to bid their private values) First Price Auction: There are neither dominant strategies, nor ex-post Nash equilibria Question: Can we modify the model in such a way that First Price Auction has a solution? Answer: Yes, give the players at least some information about private values of other players. 295 Bayesian Games A Bayesian Game G = (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N , P) where (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N) is a strict incomplete information game and P is a distribution on types, i.e., N = {1, . . . , n} is a set of players, Ai is a set of actions available to player i, Ti is a set of possible types of player i, Recall that T = n i=1 Ti is the set of type profiles, and that A = n i=1 Ai is the set of action profiles. ui is a type-dependent payoff function ui : A1 × · · · × An × Ti → R P is a (joint) probability distribution over T called common prior. Formally, P is a probability measure over an appropriate measurable space on T. However, I will not go into measure theory and consider only two special cases: finite T (in which case P : T → [0, 1] so that t∈T P(t) = 1) and Ti = R for all i (in which case I assume that P is determined by a (joint) density function p on Rn ). 296 Bayesian Games: Strategies & Payoffs A play proceeds as follows: First, a type profile (t1, . . . , tn) ∈ T is randomly chosen according to P. Then each player i learns his type ti. (It is a common knowledge that every player knows his own type but not the types of other players.) Each player i chooses his action based on ti. Each player receives his payoff ui(a1, . . . , an; ti). A pure strategy for player i is a function si : Ti → Ai. As before, we use S to denote the set of pure strategy profiles. 297 Properties We assume that ui depends only on ti and not on t−i. This is called private values model and can be used to model auctions. This model can be extended to common values by using ui(a1, . . . , an; t1, . . . , tn). We assume the common prior P. This means that all players have the same beliefs about the type profile. This assumption is rather strong. More general models allow each player to have his own individual beliefs about types ... his own beliefs about beliefs about types .... beliefs about beliefs about beliefs about types ..... (we get an infinite hierarchy) There is a generic result of Harsanyi saying that the hierarchy is not necessary: It is possible to extend the type space in such a way that each player’s "extended type" describes his original type as well as all his beliefs. (This does not mean that common prior suffices.) 298 Example: Battle of Sexes Assume that player 1 may suspect that player 2 is angry with him/her (the choice is yours) but cannot be sure. In other words, there are two types of player 2 giving two different games. Formally we have a Bayesian Game G = (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N , P) where N = {1, 2} A1 = A2 = {F, O} T1 = {t1} and T2 = {t1 2 , t2 2 } The payoffs are given by t1 2 t2 2 t1 : F O F 2, 1 0, 0 O 0, 0 1, 2 F O F 2, 0 0, 2 O 0, 1 1, 0 P(t1 2 ) = P(t2 2 ) = 1 2 299 Example: Single-Item Sealed-Bid Auctions Consider single-item sealed-bid auctions as Bayesian games: G = (N, (Bi)i∈N , (Vi)i∈N , (ui)i∈N , P) where The set of players N = {1, . . . , n} is the set of bidders Bi = [0, ∞) where each action bi ∈ Bi corresponds to the bid Vi = R where each type vi corresponds to the private value Let vi ∈ Vi be the type of player i (i.e. his private value), then given an action profile b = (b1, . . . , bn) (i.e. bids) we define First Price: ui(b; vi) =    vi − bi if bi > maxj i bj 0 otherwise. Second Price: ui(b; vi) =    vi − maxj i bj if bi > maxj i bj 0 otherwise. P is a probability distribution of the private values such that P(v ∈ [0, ∞)n ) = 1. For example, we may (and will) assume that each vi is chosen independently and uniformly from [0, vmax] where vmax is a given number. Then P is uniform on [0, vmax]n . 300 Finite-Type Bayesian Games: Payoffs For now, let us assume that each player has only finitely many types, i.e., T is finite. Given a type profile t = (t1, . . . , tn), we denote by P(t−i | ti) the conditional probability that the opponents of player i have the type profile t−i conditioned on player i having ti, i.e., P(t−i | ti) := P(ti, t−i) t−i P(ti, t−i ) Intuitively, P(t−i | ti) is the maximum information player i may squeeze out of P about possible types of other players once he learns his own type ti. Given a pure strategy profile s = (s1, . . . , sn) and a type ti ∈ Ti of player i the expected payoff for player i is ui(s; ti) = t−i∈T−i P(t−i | ti) · ui(s1(t1), . . . , sn(tn); ti) (this is the conditional expectation of ui assuming the type ti of player i) 301 Example: Battle of Sexes t1 2 t2 2 t1 : F O F 2, 1 0, 0 O 0, 0 1, 2 F O F 2, 0 0, 2 O 0, 1 1, 0 P(t1 2 ) = P(t2 2 ) = 1 2 Consider strategies s1 of player 1 and s2 of player 2 defined by s1(t1) = F s2(t1 2 ) = F and s2(t2 2 ) = O Then u1(s1, s2; t1) = 1 2 · 2 + 1 2 · 0 = 1 u2(s1, s2; t1 2 ) = 1 and u2(s1, s2; t2 2 ) = 2 302 Infinite-Type Bayesian Games: Payoffs Now assume that for each player i we have Ti = R and thus that T = Rn . The concrete type is randomly chosen according to P, denote by t = (t1, . . . , tn) the corresponding random vector with distribution P (each ti is a random variable giving a type of player i). Assume that the type t is absolutely continuous which means that there is a (joint) density function p such that for all rectangles R = [a1, b1] × · · · × [an, bn] P[t ∈ R] = b1 a1 · · · bn an p(t1, . . . , tn)dtn · · · dt1 Let pi be the marginal density function of ti, i.e., pi(ti) = T−i p(ti, t−i)dt−i The conditional density of t−i = (t1, . . . , ti−1, ti+1, . . . , tn) conditioned on ti = ti where pi(ti) > 0 is p(t−i | ti) = p(t)/pi(ti) (Here t = (t1, . . . , tn) is a type profile.) 303 Infinite-Type Bayesian Games: Payoffs Given a pure strategy profile s = (s1, . . . , sn) and a type ti ∈ Ti of player i, the expected payoff for player i is ui(s; ti) = T−i ui(s1(t1), . . . , sn(tn); ti) p(t−i | ti) dt−i Example: First-Price Auction Consider the first-price auction as a Bayesian game where the types of players are chosen uniformly and independently from [0, vmax]. Consider a pure strategy profile v = (v1/2, . . . , vn/2) (i.e., each player i plays vi/2). What is ui(v; vi) ? ui(v; vi) = P(player i wins) · vi/2 + P(player i loses) · 0 = P(all players except i bid less than vi/2) · vi/2 = vi 2vmax n−1 · vi/2 = vn i 2nvn−1 max 304 Risk Aversion We assume that players maximize their expected payoff. Such players are called risk neutral. In general, there are three kinds of players that can be described using the following experiment. A player can choose between two possibilities: Either get $50 surely, or get $100 with probability 1 2 and 0 with probability 1 2 . risk neutral person has no preference risk averse person prefers the first alternative risk seeking person prefers the second one 305 Dominance and Nash Equilibria A pure strategy si weakly dominates si if for every ti ∈ Ti the following holds: For all s−i ∈ S−i we have ui(si, s−i; ti) ≥ ui(si , s−i; ti) and the inequality is strict for at least one s−i. The other modes of dominance are defined analogously. Dominant strategies are defined as usual. Definition 91 A pure strategy profile s = (s1, . . . , sn) ∈ S in the Bayesian game is a pure strategy Bayesian Nash equilibrium if for each player i and each type ti ∈ Ti of player i and every strategy si ∈ Si we have that ui(si, s−i; ti) ≥ ui(si , s−i; ti) 306 Example: Battle of Sexes t1 2 t2 2 t1 : F O F 2, 1 0, 0 O 0, 0 1, 2 F O F 2, 0 0, 2 O 0, 1 1, 0 P(t1 2 ) = P(t2 2 ) = 1 2 Use the following notation: (X, (Y, Z)) means that player 1 plays X ∈ {F, O}, and player 2 plays Y ∈ {F, O} if his/her type is t1 2 and Z ∈ {F, O} otherwise. Are there pure strategy Bayesian Nash equilibria? (F, (F, O)) is a Bayesian NE. Even though O is preferred by player 2, the outcome (O, O) cannot occur with a positive probability in any BNE. To ever meet at the opera, player 1 needs to play O. The unique best response of player 2 to O is (O, F) But (O, (O, F)) is not a BNE: The expected payoff of player 1 at (O, (O, F)) is 1 2 The expected payoff of player 1 at (F, (O, F)) is 1 307 Second Price Auction Consider the second-price sealed-bid auction as a Bayesian game where the types of players are chosen according to an arbitrary distribution. Proposition 7 In a second-price sealed-bid auction, with any probability distribution P, the truth revealing profile of bids, i.e., v = (v1, . . . , vn), is a weakly dominant strategy profile. Proof. The exact same proof as for the strict incomplete information games. Indeed, we do not need to assume that the players have a common prior for this! 308 First Price Auction Consider the first-price sealed-bid auction as a Bayesian game with some prior distribution P. Note that bidding truthfully does not have to be a dominant strategy. For example, if player i knows that (with high probability) his value vi is much larger than maxj i vj, he will not waste money and bid less than vi. So is there a pure strategy Bayesian Nash equilibrium? Proposition 8 Assume that for all players i the type of player i is chosen independently and uniformly from [0, vmax]. Consider a pure strategy profile s = (s1, . . . , sn) where si(vi) = n−1 n vi for every player i and every value vi. Then s is a Bayesian Nash equilibrium. Proof. We show that si(vi) = n−1 n vi is the best response to s−i for all i. Let us fix i and consider a pure strategy si of player i. Fix vi and define bi = si (vi). We show (see the greenboard) that bi = n−1 n vi maximizes ui(bi, s−i; vi). This holds for all vi, and thus si = si is the best response to s−i. 309 First Price Auction (Cont.) More generally, assume only that the private values vi are identically and independently distributed on [vmin, vmax] (this is called independent private values model). Let F(x) be the cumulative distribution function of the private value (for each player). Let us restrict to strictly increasing strategies. Note that this restriction is quite reasonable, intuitively it means, that the higher the private value, the higher is the bid. Then one may show that there is a symmetric Bayesian Nash equilibrium (s1, . . . , sn) where each si is defined by si(vi) = vi − vmax vmin [F(vi)]n−1 dx [F(vi)]n−1 That is, in particular, the bid is always smaller than the private value. 310 Expected Revenue Consider the first and second price sealed-bid auctions. For simplicity, assume that the type of each player is chosen independently and uniformly from [0, 1]. What is the expected revenue of the auctioneer from these two auctions when the players play the corresponding Bayesian NE? In the first-price auction, players bid n−1 n vi. Thus the probability distribution of the revenue is F(x) = P(max j n − 1 n vj ≤ x) = P(max j vj ≤ nx n − 1 ) = nx n − 1 n It is straightforward to show that then the expected maximum bid in the first-price auction (i.e., the revenue) is n−1 n+1 . In the second-price auction, players bid vi. However, the revenue is the expected second largest value. Thus the distribution of the revenue is F(x) = P(max j vj ≤ x) + n i=1 P(vi > x and for all j i, vj ≤ x) Amazingly, this also gives the expectation n−1 n+1 . 311 Revenue Equivalence (Cont.) The result from the previous slide is a special case of a rather general revenue equivalence theorem, first proved by Vickrey (1961) and then generalized by Myerson (1981). Both Vickrey and Myerson were awarded Nobel Prize in economics for their contribution to the auction theory. Theorem 92 (Revenue Equivalence) Assume that each of n risk-neutral players has independent private values drawn from a common cumulative distribution function F(x) which is continuous and strictly increasing on an interval [vmin, vmax] (the probability of vi [vmin, vmax] is zero). Then any efficient auction mechanism in which any player with value vmin has an expected payoff zero yields the same expected revenue. Here efficient means that the auction has a symmetric and increasing Bayesian Nash equilibrium and always allocates the item to the player with the highest bid. 312 Bayesian Games – Nature & Common Values A Bayesian Game (with nature and common values) consists of a set of players N = {1, . . . , n}, a set of states of nature Ω, a set of actions Ai available to player i, a set of possible types Ti of player i, a type function τi : Ω → Ti assigning a type of player i to every state of nature, a payoff function ui for every player i ui : A1 × · · · × An × Ω → R a probability distribution P over Ω called common prior. As before, a pure strategy for player i is a function si : Ti → Ai. 313 Bayesian Games – Nature & Common Values Given a pure strategy si of player i and a state of nature ω ∈ Ω, we denote by si(ω) the action si(τi(ω)) chosen by player i when the state is ω. We denote by s(ω) the action profile (s1(τ1(ω)), . . . , sn(τn(ω))). Given a set A ⊆ Ω of states of nature and a type ti ∈ Ti of player i, we denote by P(A | ti) the conditional probability of A conditioned on the event that player i has type ti. We define the expected payoff for player i by ui(s1, . . . , sn; ti) = Eω∼P [ui(s(ω); ω) | τi(ω) = ti] Here the right hand side is the expected payoff of player i with respect to the probability distribution P conditioned on his type ti. Definition 93 A pure strategy profile s = (s1, . . . , sn) ∈ S in the Bayesian game is a pure strategy Bayesian Nash equilibrium if for each player i and each type ti ∈ Ti and every pure strategy si of player i we have that ui(si, s−i; ti) ≥ ui(si , s−i; ti) 314 Adverse Selection A firm C is taking over a firm D. The true value d of D is not known to C, assume that it is uniformly distributed on [0, 1]. This is of course a bit artificial, more precise analysis can be done with a different distribution. It is known that D’s value will flourish under C’s ownership: it will rise to λd where λ > 1. All of the above is a common knowledge. Let us model the situation as a Bayesian game (with common values). 315 Adverse Selection (Cont.) N = {C, D}, Ω = [0, 1] where d ∈ Ω expresses the true value of D, AC = [0, 1] where c ∈ AC expresses how much is the firm C willing to pay for the firm D, AD = {yes, no} (sell or not to sell), TC = {t1} (a trivial type) and TD = Ω = [0, 1], τC (d) = t1 and τD(d) = d for all d ∈ Ω, uC (c, yes; d) = λd − c and uC (c, no; d) = 0 uD(c, yes; d) = c and uD(c, no; d) = d, P is the uniform distribution on [0, 1]. Is there a BNE? 316 Adverse Selection (Cont.) What is the best response of firm D to an action c ∈ [0, 1] of firm C? Such a best response must satisfy: say yes if d < c say no if d > c So the expected value of the firm D (in the eyes of C) assuming that D says yes is c/2. Indeed, assuming that the firm D says yes, the value d is uniformly distributed between 0 and c, so the average is c/2. Therefore, the expected payoff of C is λ(c/2) − c = c λ 2 − 1 which is negative for λ ≤ 2. So it is not profitable (on average) for the firm C to buy unless the target D more than doubles in value after the takeover! 317 Committe Voting Consider a very simple model of a jury made up of two players (jurors) who must collectively decide whether to acquit (A), or to convict (C) a defendant who can be either guilty (G) or innocent (I). Each player casts a sealed vote (A or C), and the defendant is convicted if and only if both vote C. A prior probability that the defendant is guilty is q > 1 2 (i.e., P(G) = q) and is common knowledge. Assume that each player gets payoff 1 for a right decision and 0 for incorrect decision. We consider risk neutral players who maximize their expected payoff. We may model this situation using a strategic-form game: A C A 1 − q, 1 − q 1 − q, 1 − q C 1 − q, 1 − q q, q Is there a dominant strategy? 318 Committee Voting (Cont.) Let’s make things a bit more complicated. Assume that each juror has a different expertise and, when observing the evidence, gets a private signal ti ∈ {θG, θI} that contains a valuable piece of information. That is if the defendant is guilty, θG is more probable, if innocent, θI is more probable. For i ∈ {1, 2} : P(ti = θG | G) = P(ti = θI | I) = p > 1 2 P(ti = θG | I) = P(ti = θI | G) = 1 − p < 1 2 We also assume that the players get their signals independently conditional on the defendants condition: P(t1 = θX ∧ t2 = θY | Z) = P(t1 = θX | Z) · P(t2 = θY | Z) for all X, Y, Z ∈ {G, I}. 319 Committe Voting (Cont.) We obtain a Bayesian game: N = {1, 2} A1 = A2 = {A, C} Ω = {(Z, θX , θY ) | Z, X, Y ∈ {G, I}} T1 = T2 = {θG, θI} τ1(Z, θX , θY ) = θX and τ2(Z, θX , θY ) = θY For arbitrary U, V ∈ {A, C} and X, Y ∈ {G, I} we have that ui(U, V; (G, θX , θY )) =    1 if U = V = C, 0 otherwise. ui(U, V; (I, θX , θY )) =    0 if U = V = C, 1 otherwise. P(Z, θX , θY ) = P(Z)P(t1 = θX | Z)P(t2 = θY | Z) I.e., P(Z, θX , θY ) is the probability of choosing (Z, θX , θY ) as follows: First, Z ∈ {G, I} is randomly chosen (Z = G has probability q). Then, conditioned on Z, θX and θY are independently chosen. 320 Committee Voting (Cont.) Now consider just one player i. If the player i would be able to decide by himself, how does his decision depend on his type ti ∈ {θG, θI}? If ti = θG, then how probable is that the defendant is guilty? P(G | ti = θG) = P(ti = θG | G)P(G) P(ti = θG) = pq qp + (1 − q)(1 − p) > q so that the posterior probability of G is even higher. If θI is received, then how probable is that the defendant is guilty? P(G | ti = θI) = P(ti = θI | G)P(G) P(ti = θI) = (1 − p)q q(1 − p) + (1 − q)p < q which means, clearly, that the player is less sure about G. In particular, player i chooses I instead of G if P(G | ti = θI) = q(1 − p) q(1 − p) + (1 − q)p < 1 2 which holds iff p > q. 321 Committee Voting (Cont.) So if p > q each player would choose to vote according to his signal. Denote by XY the strategy of player i in which he chooses X if ti = θG and Y if ti = θI. Question: Is (CA, CA) BNE assuming that p > q ? u1(CA, CA; θI) = P(I | t1 = θI) = P(I | t1 = θI ∧ t2 = θG)P(t2 = θG | t1 = θI) + P(I | t1 = θI ∧ t2 = θI)P(t2 = θI | t1 = θI) u1(CC, CA; θI) = P(G ∧ t2 = θG | t1 = θI) + P(I ∧ t2 = θI | t1 = θI) = P(G | t1 = θI ∧ t2 = θG)P(t2 = θG | t1 = θI) + P(I | t1 = θI ∧ t2 = θI)P(t2 = θI | t1 = θI) Note that the blue expressions are equal, so the payoff depends only on the red ones, where player 2 is assumed to consider the defendant guilty. Intuitively, if player 2 chooses A, then the decision of player 1 does not have any impact. On the other hand, if player 2 chooses C, then the decision is, in fact, up to player 1 (we say that he is pivotal). 322 Committee Voting (Cont.) So what is the probability that the defendant is guilty assuming that the vote of player 1 counts? That is, assuming t2 = θG and t1 = θI ? P(G | t1 = θI ∧ t2 = θG) = P(t1 = θI ∧ t2 = θG | G)P(G) P(t1 = θI ∧ t2 = θG) = (1 − p)pq p(1 − p) = q > 1 2 > (1 − q) = P(I | t1 = θI ∧ t2 = θG) which means that player 1 is more convinced that the defendant is guilty contrary to the signal! This means that even though individual decision would be "innocent", taking into account that the vote should have some value gives "guilty". Hence u1(CA, CA; θI) < u1(CC, CA; θI) and thus playing CC is a better response to CA. By the way, is (CC, CA) a BNE? 323 Winner’s Curse An auction for a new oil field (of unknown size), assume only two firms competing (two players). The field is either small (worth $10 million), medium (worth $20 million), large (worth $30 million). That is, the real value v of the field satisfies v ∈ {10, 20, 30}. Assume some prior information about the size of the filed: P(v = 10) = P(v = 30) = 1 4 P(v = 20) = 1 2 The government is selling the field in the second-price sealed-bid auction, so that in the case of a tie, the winner is chosen randomly (and pays his bid). That is, in effect, in case of a tie, the payoff of each player is (v − b)/2 where v is the value, b the (common) bid. Using the same argument as for the "ordinary" second-price auction with private values one may show that playing the true private value weakly dominates all other bids. 324 Winner’s Curse (Cont.) Each of the firms performs a (free) exploration that will provide the type ti ∈ {L, H} (low or high), correlated with the size as follows: If v = 10, then t1 = t2 = L If v = 30, then t1 = t2 = H If v = 20, then for i ∈ {1, 2}, conditioned on v = 20, the exploration results are uniformly distributed: There are four possible results, (L, L), (L, H), (H, L), (H, H), each with probability 1 4 . Given the signal ti, player i may estimate the true value of the field: P(v = 10 | ti = L) = 1 2 P(v = 10 | ti = H) = 0 P(v = 20 | ti = L) = 1 2 P(v = 20 | ti = H) = 1 2 P(v = 30 | ti = L) = 0 P(v = 30 | ti = H) = 1 2 Thus E(v | ti = L) = 1 2 10 + 1 2 20 = 15. and E(v | ti = H) = 1 2 20 + 1 2 30 = 25 325 Winner’s Curse (Cont.) Is it a good idea to bid the expected value? Define a strategy si for player i by si(L) = E(v | ti = L) si(H) = E(v | ti = H) Is (s1, s2) a Nash equilibrium? Consider t1 = L. Then player 1 bids 15. What is his expected payoff? Note that if t2 = H, then player 2 bids 25 and wins, which means that player 1 gets payoff 0. So player 1 can get a non-zero value only if t2 = L. This implies that u1(s1, s2; L) = P(v = 20 ∧ t2 = L | t1 = L) · (20 − 15)/2 + P(v = 10 ∧ t2 = L | t1 = L) · (10 − 15)/2 = P(v = 20 ∧ t2 = L | t1 = L) · 5/2 + P(v = 10 ∧ t2 = L | t1 = L) · (−5)/2 326 Winner’s Curse (Cont.) In what follows we show that P(v = 20 ∧ t2 = L | t1 = L) = 1 4 (31) P(v = 10 ∧ t2 = L | t1 = L) = 1 2 (32) which means that u1(s1, s2; L) = P(v = 20 ∧ t2 = L | t1 = L) · 5/2 + P(v = 10 ∧ t2 = L | t1 = L) · (−5)/2 = 1 4 5 2 + 1 2 (−5) 2 = −5 8 < 0 and player 1 would be better off by bidding 0 and always losing!! Intuition: Player 1 wins only if the signal of player 2 is L, which in effect means, that assuming win, the effective expected value of the field is lower than the predicted expected value. In the rest of the proof we heavily use the Bayes’ theorem and the law of total probability. 327 Winner’s Curse (Cont.) : Proof of Equation (31) P(v = 20 ∧ t2 = L | t1 = L) = = P(v = 20 ∧ t2 = L | t1 = L ∧ t2 = L) · P(t2 = L | t1 = L) + P(v = 20 ∧ t2 = L | t1 = L ∧ t2 = H) · P(t2 = H | t1 = L) = P(v = 20 | t1 = L ∧ t2 = L) · P(t2 = L | t1 = L) Here P(t2 = L | t1 = L) = = P(t2 = L | t1 = L ∧ v = 10) · P(v = 10 | t1 = L) + P(t2 = L | t1 = L ∧ v = 20) · P(v = 20 | t1 = L) = 1 · 1 2 + 1 2 · 1 2 = 3 4 (Here we used the fact that t1 and t2 are independent assuming a fixed v) We show (see next slide) that P(v = 20 | t1 = L ∧ t2 = L) = 1 3 and thus P(v = 20 ∧ t2 = L | t1 = L) = 1 3 · 3 4 = 1 4 328 Winner’s Curse (Cont.) : Proof of Equation (31) First, note that P(t1 = L ∧ t2 = L | v = 10) = 1 P(t1 = L ∧ t2 = L | v = 20) = 1 4 Now by Bayes’ theorem P(v = 20 | t1 = L ∧ t2 = L) = = [ P(t1 = L ∧ t2 = L | v = 20) · P(v = 20) ] / P(t1 = L ∧ t2 = L) = = 1 4 · 1 2 P(t1 = L ∧ t2 = L) = 1 8 · P(t1 = L ∧ t2 = L) But by the law of total probability P(t1 = L ∧ t2 = L) = = P(t1 = L ∧ t2 = L | v = 10)P(v = 10)+ + P(t1 = L ∧ t2 = L | v = 20)P(v = 20) = 1 · 1 4 + 1 4 · 1 2 = 3 8 which gives P(v = 20 | t1 = L ∧ t2 = L) = 1 3 . 329 Winner’s Curse (Cont.) : Proof of Equation (32) Finally, similarly as for (31), P(v = 10 ∧ t2 = L | t1 = L) = = P(v = 10 ∧ t2 = L | t1 = L ∧ t2 = L) · P(t2 = L | t1 = L) + P(v = 10 ∧ t1 = L | t1 = L ∧ t2 = H) · P(t2 = H | t1 = L) = P(v = 10 | t1 = L ∧ t2 = L) · P(t2 = L | t1 = L) = 2 3 · 3 4 = 1 2 Here P(v = 10 | t1 = L ∧ t2 = L) = 2 3 follows from P(v = 20 | t1 = L ∧ t2 = L) = 1 3 and P(v = 30 | t1 = L ∧ t2 = L) = 0. 330 Selfish Routing Congestion Games Potential Games 331 Selfish Routing – Motivation Many agents want to use shared resources Each of them is selfish and rational (i.e. maximizes his profit) Examples: Users of a computer network, drivers on roads How they are going to behave? How much is lost by letting agents behave selfishly on their own? 332 Example: Routing in Computer Networks Imagine a computer network, i.e., computers connected by links. There are several users, each user wants to route packets from a source computer zi to a target computer ti. For this, each user i needs to choose a path in the network from zi to ti. We assume that the more agents try to route their messages through the same link, the more the link gets congested and the more costly the transmission is. Now assume that the users are selfish and try to minimize their cost (typically transmission time). How would they behave? 333 Atomic Routing Games The network routing can be formalized using an atomic routing game that consists of a directed multi-graph G = (V, E, δ), Here V is a set of vertices, E is a set of edges, δ : E → V × V so that if δ(e) = (u, v) then e leads from u to v. The multigraph G models the network. n pairs of source-target vertices (z1, t1), . . . , (zn, tn) where z1, . . . , zn, t1, . . . , tn ∈ V, (Each pair (zi, ti) corresponds to a user who wants to route from zi to ti) for each e ∈ E a cost function ce : N → R such that ce(m) is the cost of routing through the link e if the amount of traffic through e is m. Each user i chooses a simple path from zi to ti and pays the sum of the costs of the links on the path. An atomic routing game is symmetric if z1 = · · · = zn and t1 = · · · = tn. 334 Atomic Routing Games Here we assume at most three users. Each edge is labeled by the cost if one, two, or all three users route through the edge, respectively. Here we consider a symmetric case with three users, each has the source z and target t. 335 Atomic Routing Games Here, e.g., the red user pays 3 + 2 = 5 : 3 for the first step from z (he shares the edge with the blue one) 2 for the second step to t (he is the only user of the edge) Atomic routing games are usually studied as a special case of so called (atomic) congestion games. 336 Congestion Games A congestion game is a tuple G = (N, R, (Si)i∈N , (cr )r∈R) where N = {1, . . . , n} is a set of players, R is a set of resources, each Si ⊆ 2R {∅} is a set of pure strategies for player i, each cr : N → R is a cost function for a resource r ∈ R. Notation: S = S1 × · · · × Sn and c = (c1, . . . , cn). Intuition: Each player allocates a set of resources by playing a pure strategy si ⊆ R. Then each player "pays" for every allocated resource r ∈ si based on cr and the number of other players who demand the same resource r : If players use the resource r, then each of them pays cr ( ) for this particular resource r. 337 Congestion Games: Payoffs and Nash Equilibria Let # : R × S → N be a function defined for r ∈ R and s = (s1, . . . , sn) ∈ S by #(r, s) = | {i ∈ N | r ∈ si} |. I.e., #(r, s) is the number of players using the resource r in the strategy profile s. We define the payoff for player i by ui(s) = − r∈si cr (#(r, s)) (33) Intuitively, the more congested a resource r ∈ si is, the more player i has to pay for it. Definition 94 Nash equilibria are defined as usual, a pure strategy profile (s1, . . . , sn) ∈ S is a Nash equilibrium if for every player i and every si ∈ Si we have ui(si, s−i) ≥ ui(si , s−i). 338 Atomic Routing Games and Congestion Games Given an atomic routing game we may model it as a congestion game (N, R, (Si)i∈N , (cr )r∈R) : Players N = {1, . . . , n} correspond to the pairs of source-target vertices (z1, t1), . . . , (zn, tn), resources are edges in the multigraph G, i.e, R = E, the set of pure strategies Si of player i consists of all simple paths (i.e., sets of edges) in the multigraph G from his source zi to his target ti, the cost function ce of each edge e ∈ E has to be determined according to the properties of the network. Often (but not always) a linear (affine) function ce(x) = aex + be is used (here x is the number of players using the edge e). Now each Nash equilibrium in G corresponds to a stable situation where no user wants to change his behavior. 339 Solving Congestion Games We consider the following questions: Are there pure strategy Nash equilibria? Can the agents "learn" to use the network? How difficult is to compute an equilibrium? 340 Learning: Myopic Best-Response Given a pure strategy profile s = (s1, . . . , sn), suppose that some player i has an alternative strategy si such that ui(si , s−i) > ui(si, s−i). Player i can switch (unilaterally) from si to si improving thus his payoff. Iterating such improvement steps, we obtain the following: Myopic best response procedure: Start with an arbitrary pure strategy profile s = (s1, . . . , sn). While there exists a player i for whom si is not a best response to s−i do si := a best-response by player i to s−i s := (si , s−i) return s By definition, if the myopic best response terminates, the resulting strategy profile s is a Nash equilibrium. It may not terminate in general (see the green board). Theorem 95 For every congestion game, the myopic best response terminates in a Nash equilibrium for an arbitrary starting pure strategy profile. 341 Potential Games We prove Theorem 95 by reduction to the following potential games. Definition 96 A strategic form game G = (N, (Si)i∈N , (ui)i∈N) is a potential game if there exists a function P : S1 × · · · × Sn → R such that for all i ∈ N, all s−i ∈ S−i and all si, si ∈ Si we have that ui(si, s−i) − ui(si , s−i) = P(si, s−i) − P(si , s−i) Theorem 97 For every finite potential game, the myopic best-response terminates in a Nash equilibrium for an arbitrary starting pure strategy profile. Proof. Note that every iteration of the myopic best-response procedure strictly increases ui(s) for some i, which in effect strictly increases P(s) by the same amount. As there are only finitely many strategy profiles, the procedure must terminate. The resulting profile is clearly a Nash equilibrium. 342 Congestion Games as Potential Games Theorem 98 Let G = (N, R, (Si)i∈N , (cr )r∈R ) be a congestion game and for each i ∈ N, let ui be the payoff of player i in G defined by the equation (33). Then (N, (Si)i∈N , (ui)i∈N) is a potential game. Recall that ui(s) = − r∈si cr (#(r, s)) where #(r, s) is the number of players using the resource r in the strategy profile s. Note that we obtain Theorem 95 as a corollary of Theorem 98 and Theorem 97. Proof of Theorem 98. Given s ∈ S = S1 × · · · × Sn, define P(s) = − r∈R #(r,s) j=1 cr (j) We show that P is a potential function, i.e., prove that for any two strategy profiles (si, s−i) and (si , s−i) we have P(si, s−i) − P(si , s−i) = ui(si, s−i) − ui(si , s−i) 343 Illustration of the potential Intuitively, the potential corresponds to the total cost paid by players when they choose their strategies one after the other. Consider two players: First, player 1 chooses a strategy s1 and pays r∈s1 cr (1) Then, player 2 chooses a strategy s2 and pays r∈s2 s1 cr (1) + r∈s2∩s1 cr (2) Summing we get r∈s1 cr (1) + r∈s2 s1 cr (1) + r∈s2∩s1 cr (2) = r∈s1 s2 cr (1) + r∈s2∩s1 cr (1) + r∈s2 s1 cr (1) + r∈s2∩s1 cr (2) = r∈s1 s2 cr (1) + r∈s2 s1 cr (1) + r∈s2∩s1 cr (1) + cr (2) = r∈R #(r,(s1,s2)) j=1 cr (j) 344 Illustration of Potential Let us compute the potential P. 345 Illustration of Potential First, add the red player ... 346 Illustration of Potential The red player pays 2 + 2 = 4. Second, add the green player ... 347 Illustration of Potential The green player pays 2 + 4 = 6. Third, add the blue player ... 348 Illustration of Potential The blue player pays 3 + 1 + 6 = 10. In total, they pay 4 + 6 + 10 = 20. We get the same number by using the expression for P : (2 + 3) + 2 + 1 + 2 + (4 + 6) = 20 The potential is thus P = −20. 349 Illustration of Potential ⇒ The blue player changes his strategy. What is the change in the potential? Recall that on the left hand side, the blue player paid 10 which gave the potential −20. Now he pays 3 + 3 = 6 on the right hand side. So the potential on the right hand side is −16. The difference between potentials is −20 − (−16) = −4. The difference between payoffs for the blue player is −10 − (−6) = −4. (the right hand side is cheaper and thus better for the blue player) 350 Illustration of Potential ⇒ The crucial observation is that we may consider players coming in an arbitrary order. In particular, to prove P(si, s−i) − P(si , s−i) = ui(si, s−i) − ui(si , s−i) we may assume that player i came last. 351 Proof of Theorem 98 (Cont.) Let (si, s−i) and (si , s−i) be two strategy profiles. Recall that we need to prove P(si, s−i) − P(si , s−i) = ui(si, s−i) − ui(si , s−i) By definition, P(si, s−i) − P(si , s−i) =   r∈R #(r,(si ,s−i )) j=1 cr (j)   −   r∈R #(r,(si ,s−i )) j=1 cr (j)   Note that #(r, (si, s−i)) =    #(r, s−i) + 1 if r ∈ si #(r, s−i) if r si We obtain ... 352 Proof of Theorem 98 (Cont.) −P(si, s−i) = r∈R #(r,(si ,s−i )) j=1 cr (j) = r∈R si #(r,(si ,s−i )) j=1 cr (j) + r∈si #(r,(si ,s−i )) j=1 cr (j) = r∈R si #(r,s−i ) j=1 cr (j) + r∈si #(r,s−i )+1 j=1 cr (j) = r∈R si #(r,s−i ) j=1 cr (j) + r∈si #(r,s−i ) j=1 cr (j) + r∈si cr (#(r, s−i) + 1) = r∈R #(r,s−i ) j=1 cr (j) + r∈si cr (#(r, s−i) + 1) Similarly, −P(si , s−i) = r∈R #(r,s−i ) j=1 cr (j) + r∈si cr (#(r, s−i) + 1) 353 Proof of Theorem 98 (Cont.) Now we can easily finish the proof of Theorem 98 P(si, s−i) − P(si , s−i) = =   r∈R #(r,s−i ) j=1 cr (j) + r∈si cr (#(r, s−i) + 1)   −   r∈R #(r,s−i ) j=1 cr (j) + r∈si cr (#(r, s−i) + 1)   = r∈si cr (#(r, s−i) + 1)) − r∈si cr (#(r, s−i) + 1) = r∈si cr (#(r, (si , s−i))) − r∈si cr (#(r, (si, s−i))) = ui(si, s−i) − ui(si , s−i) 354 Complexity of Congestion Games For concreteness, assume cr (j) = ar · j + br where ar , br are some non-negative constants. Myopic best response can be used to compute Nash equilibria but how many steps it makes? A naive bound would be the number of strategy profiles which is exponential in the number of players. Assume that the cost functions have values in N. Then every step of the myopic best response increases P by at least one, which means that the procedure starting in s stops after at most −P(s) = r∈R #(r,s) j=1 cr (j) steps. This gives a pseudo-polynomial time procedure. How many steps are really needed? On some instances any sequence of improvement steps to NE is of exponential length. In fact, the problem of computing NE in congestion games is PLS-complete. PLS class (Polynomial Local Search) models the difficulty of finding a locally optimal solution to an optimization problem (e.g. travelling salesman is PLS-complete). 355 Complexity of Atomic Routing Games Finding Nash equilibria in Atomic Routing Games is PLS-complete even if the cost functions are linear. There is a polynomial time algorithm for symmetric atomic routing games with non-decreasing cost functions based on a reduction to the minimum-cost flow problem. Here symmetric means that all players have the same source z and the same target t. Hence they also choose among the same simple paths. ⇒ For every edge in the routing graph G (left), there are n = 3 edges of capacity one in the minimum-cost flow network (right), each with one of the possible costs of the edge in G. 356 Non-Atomic Selfish Routing So far we have considered situations where each player (user, driver) has enough "weight" to explicitly influence payoffs of others (so a deviation of one player causes changes in payoffs of other players). In many applications, especially in the case of highway traffic problems, individual drivers have negligible influence on each other. What matters is a "distribution" of drivers on highways. To model such situations we use non-atomic routing games that can be seen as a limiting case of atomic selfish routing with the number of players going to ∞. 357 Non-Atomic Routing Games A Non-Atomic Routing Game consists of a directed multigraph G = (V, E, δ), n source-target pairs (z1, t1), . . . , (zn, tn), for each i = 1, . . . , n, the amount of traffic µi ∈ R≥0 from zi to ti, for each e ∈ E a cost function ce : R≥0 → R such that ce(x) is the cost of routing through the link e if the amount of traffic on e is x ∈ R≥0. For i = 1, . . . , n, let Pi be the set of all simple paths from zi to ti. Intuitively, there are uncountably many players, represented by [0, µi], going from zi to ti, each player chooses his path so that his total cost is minimized. Assume that Pi ∩ Pj = ∅ for i j. (This also implies that for all i j we have that either zi zj, or ti tj.) Denote by P the set of all "relevant" simple paths n i=1 Pi. Question: What is a "stable" distribution of the traffic among paths of P ? 358 Non-Atomic Routing Games A traffic distribution d is a function d : P → R≥0 such that p∈Pi d(p) = µi. Denote by D the set of all traffic distributions. Let us fix a traffic distribution d ∈ D. Given an edge e ∈ E, we denote by g(d, e) the amount of congestion on the edge e : g(d, e) = p∈P : e∈p d(p) Given p ∈ P, the payoff for players routing through p ∈ P is defined by u(d, p) = − e∈p ce(g(d, e)) Definition 99 A traffic distribution d ∈ D is a Nash equilibrium if for every i = 1, . . . , n and every path p ∈ Pi such that d(p) > 0 the following holds: u(d, p) ≥ u(d, p ) for all p ∈ Pi 359 Price of Anarchy Theorem 100 Every non-atomic routing game has a Nash equilibrium. We define a social cost of a traffic distribution d by C(d) = p∈P d(p) · (−u(d, p)) = p∈P d(p) · e∈p ce(g(d, e)) Theorem 101 All Nash equilibria in non-atomic routing games have the same social cost. A price of anarchy is defined by PoA = C(d∗) mind C(d) where d∗ is a (arbitrary) Nash equilibrium Intuitively, PoA is the proportion of additional social cost that is incurred because of agents’ self-interested behavior. 360 Price of Anarchy Theorem 102 (Roughgarden-Tardos’2000) For all non-atomic routing games with linear cost functions holds PoA ≤ 4 3 and this bound is tight (e.g. the Pigou’s example). The price of anarchy can be defined also for atomic routing games: PoAnon−atom := maxs∗ is NE n i=1(−ui(s∗ )) mins∈S n i=1(−ui(s)) (Intuitively, n i=1(−ui(s)) is the total amount paid by all players playing the strategy profile s.) Theorem 103 (Christodoulou-Koutsoupias’2005) For all atomic routing games with linear cost functions holds PoAnon−atom ≤ 5 2 (which is again tight, just like 4 3 for non-atomic routing.) 361 Braess’s Paradox For an example see the green board. Real-world occurences (Wikipedia): In Seoul, South Korea, a speeding-up in traffic around the city was seen when a motorway was removed as part of the Cheonggyecheon restoration project. In Stuttgart, Germany after investments into the road network in 1969, the traffic situation did not improve until a section of newly built road was closed for traffic again. In 1990 the closing of 42nd street in New York City reduced the amount of congestion in the area. In 2012, scientists at the Max Planck Institute for Dynamics and Self-Organization demonstrated through computational modeling the potential for this phenomenon to occur in power transmission networks where power generation is decentralized. In 2012, a team of researchers published in Physical Review Letters a paper showing that Braess paradox may occur in mesoscopic electron systems. They showed that adding a path for electrons in a nanoscopic network paradoxically reduced its conductance. 362 IA168 Algorithmic Game Theory Tomáš Brázdil 363 Evaluation Oral exam Homework (occasionally) Strictly dominated strategies for the exam: No preparation (skim-through) Learn only a strict subset THE strictly dominant strategy: Learn all definitions, algorithms, theorems and proofs. 364 What we did ... Types of games: strategic-form games extensive-form games (strict) incomplete information games & Bayesian games Types of strategies: pure mixed 365 What we did ... strategic-form games Solution concepts: strictly dominant strategy equilibrium iterated elimination of strictly dominated strategies rationalizability Nash equilibria We studied all these concepts in both pure and mixed strategies. We studied computational complexity of solving strategic-form games w.r.t. all above concepts. In particular, we considered classical algorithms for computing mixed Nash equilibria for two-player games: support enumeration Lemke-Howson For zero-sum two-player games a polynomial time algorithm based on von Neumann’s theorem was presented. 366 What we did ... extensive-form games We considered three levels of expressiveness: perfect-information extensive-form games imperfect-information extensive-form games perfect and imperfect-information extensive-form games with chance nodes In all cases we considered the following types of strategies: pure mixed behavioral Solution concepts: Nash equilibria subgame perfect equilibria (SPE) 367 What we did ... extensive-form games ... results For finite perfect-information extensive-form games: there always exists a pure strategy SPE (in pure as well as behavioral strategies) backward induction for computing SPE (can be used also for perfect-information games with chance nodes) equivalence of mixed and behavioral strategies For finite imperfect-information extensive-form games: there always exists a behavioral strategy Nash equilibrium backward induction on "perfect information" nodes mixed and behavioral strategies are not equivalent in general, they are equivalent for games with perfect recall 368 What we did ... repeated games Strategic-form games played repeatedly for either finitely many, or infinitely many rounds. Behavior of players may depend (arbitrarily) on the history of the play. They are a special case of imperfect-information extensive-form games. Solution concepts: For finitely repeated: average payoff (sum of payoffs) For infinitely repeated: discounted payoff long-run average payoff We have considered only pure strategies. 369 What we did ... repeated games ... results For finitely repeated: Unique SPE if the strategic-form game has a unique pure str. NE SPE obtained by iterating a NE from the strategic-form game other SPE (punishing equilibria) For infinitely repeated: discounted payoff: one-shot deviation property iff SPE grim trigger strategy profiles & simple Folk theorem for SPE (for bounded payoff functions) an approximate version of general Folk theorem for SPE (repeated finite strategic-form games only) (feasible payoffs) long-run average payoff: (almost) general Folk theorems for SPE and NE (repeated finite strategic-form games only) (feasible and individually rational payoffs) 370 What we did ... incomplete information games strict incomplete information games solution concepts: weak dominance, ex-post-Nash equilibrium Bayesian games solution concepts: weak dominance, Bayesian Nash equilibrium Only pure strategies. Auctions: Second-price auction: truth telling strategies are weakly dominant in both strict imperfect information as well as Bayesian model First-price auction: Bayesian games needed to obtain a solution, solved for uniform common prior Revenue equivalence. 371