Strategic-Form Games – Conclusion

We have considered static games of complete information, i.e., "one-shot" games where the players know exactly what game they are playing. We modeled such games using strategic-form games. We have considered both the pure strategy setting and the mixed strategy setting. In both cases, we considered four solution concepts:
• Strictly dominant strategies
• Iterated elimination of strictly dominated strategies (IESDS)
• Rationalizability (i.e., iterated elimination of strategies that are never best responses)
• Nash equilibria

Strategic-Form Games – Conclusion

In the pure strategy setting:
1. A strictly dominant strategy equilibrium (if it exists) survives IESDS and rationalizability and is the unique Nash equilibrium.
2. In finite games, strategies that survive rationalizability also survive IESDS, and IESDS preserves the set of Nash equilibria.
3. In finite games, rationalizability preserves Nash equilibria.

In the mixed setting:
1. In finite two-player games, IESDS and rationalizability coincide.
2. A strictly dominant strategy equilibrium (if it exists) survives IESDS (rationalizability) and is the unique Nash equilibrium.
3. In finite games, IESDS (rationalizability) preserves Nash equilibria.

The proofs of 2. and 3. in the mixed setting are similar to the corresponding proofs in the pure setting.

Algorithms

• Strictly dominant strategy equilibria coincide in the pure and mixed settings, and can be computed in polynomial time.
• IESDS and rationalizability can be implemented in polynomial time in the pure setting as well as in the mixed setting. In the mixed setting, linear programming is needed to implement one step of IESDS (rationalizability).
• Nash equilibria can be computed for two-player games
  - in polynomial time for zero-sum games (using von Neumann's theorem and linear programming; a short LP sketch is given after the FNP slide below),
  - in exponential time using support enumeration,
  - in PPAD using the Lemke-Howson algorithm.

Complexity of Nash Eq. – FNP (Roughly)

Let R be a binary relation on words (over some alphabet) that is polynomial-time computable and polynomially balanced. That is, membership in R is decidable in polynomial time, and (x, y) ∈ R implies |y| ≤ |x|^k where k is independent of x, y.

The search problem associated with R is: given an input x, return a y such that (x, y) ∈ R if such a y exists, and return "NO" otherwise.

Note that the problem of computing NE can be seen as a search problem R where (x, y) ∈ R means that x is a strategic-form game and y is a Nash equilibrium of x of polynomial size. (We already know from support enumeration that there is a NE of polynomial size.)

The class of all such search problems is called FNP. The class FP ⊆ FNP contains all search problems that can be solved in polynomial time.

A search problem determined by R is polynomially reducible to a search problem R' iff there exist polynomial-time computable functions f, g such that
• if (x, y) ∈ R for some y, then (f(x), y') ∈ R' for some y',
• if (f(x), y) ∈ R', then (x, g(y)) ∈ R,
• if (f(x), y) ∉ R' for all y, then (x, y) ∉ R for all y.
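To make the zero-sum case from the Algorithms slide concrete, here is a minimal sketch (not part of the lecture) of the maximin linear program for a two-player zero-sum game, solved with scipy.optimize.linprog; the function name solve_zero_sum and the matching-pennies example are illustrative choices of mine.

```python
# A minimal sketch of solving a finite two-player zero-sum game by linear
# programming (von Neumann's minimax theorem).  The matrix A lists the row
# player's payoffs; requires numpy and scipy.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Return (value, row player's optimal mixed strategy)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x_1..x_m (row player's mixed strategy) and v (game value).
    # Maximize v  subject to  sum_i x_i * A[i][j] >= v  for every column j,
    #                         sum_i x_i = 1,  x_i >= 0.
    c = np.concatenate([np.zeros(m), [-1.0]])      # linprog minimizes, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # row j: -sum_i A[i][j]*x_i + v <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:m]

# Matching pennies: value 0, optimal row strategy (1/2, 1/2).
value, x = solve_zero_sum([[1, -1], [-1, 1]])
print(value, x)
```

The optimal mixed strategy and the value of the game come out of a single LP with m + 1 variables, which is the polynomial-time computation referred to above.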
Complexity of Nash Eq. – PPAD (Roughly)

The class PPAD is defined by specifying one of its complete problems (w.r.t. polynomial-time reductions), known as End-Of-The-Line:
• Input: Two Boolean circuits (over the basis ∧, ∨, ¬) S and P, each with m input bits and m output bits, such that P(0^m) = 0^m ≠ S(0^m).
• Problem: Find an input x ∈ {0, 1}^m such that P(S(x)) ≠ x, or S(P(x)) ≠ x ≠ 0^m.

Intuition: End-Of-The-Line defines a directed graph H_{S,P} with vertex set {0, 1}^m and an edge from x to y whenever both y = S(x) ("successor") and x = P(y) ("predecessor"). All vertices of H_{S,P} have indegree and outdegree at most one. There is at least one source (i.e., an x satisfying P(x) = x, namely 0^m), so there is at least one sink (i.e., an x satisfying S(x) = x). The goal is to find either a sink, or a source different from 0^m.

Theorem 53
The problem of computing Nash equilibria is complete for PPAD. That is, Nash belongs to PPAD, and End-Of-The-Line is polynomially reducible to Nash.

Loose Ends – Modes of Dominance

Let σi, σ'i ∈ Σi. Then σ'i is strictly dominated by σi if ui(σi, σ−i) > ui(σ'i, σ−i) for all σ−i ∈ Σ−i.

Let σi, σ'i ∈ Σi. Then σ'i is weakly dominated by σi if ui(σi, σ−i) ≥ ui(σ'i, σ−i) for all σ−i ∈ Σ−i, and there is σ'−i ∈ Σ−i such that ui(σi, σ'−i) > ui(σ'i, σ'−i).

Let σi, σ'i ∈ Σi. Then σ'i is very weakly dominated by σi if ui(σi, σ−i) ≥ ui(σ'i, σ−i) for all σ−i ∈ Σ−i.

A strategy is (strictly, weakly, very weakly) dominant in mixed strategies if it (strictly, weakly, very weakly) dominates every other mixed strategy.

Claim 4
Any mixed strategy profile σ ∈ Σ such that each σi is very weakly dominant in mixed strategies is a mixed Nash equilibrium.

The same claim can be proved in the pure strategy setting.

Dynamic Games of Complete Information

Extensive-Form Games
• Definition
• Sub-Game Perfect Equilibria

Dynamic Games of Perfect Information (Motivation)

Static games (modeled using strategic-form games) cannot capture games that unfold over time. In particular, as all players move simultaneously, there is no way to model situations in which the order of moves is important. Imagine, e.g., chess, where players take turns and in every round a player knows all moves of the opponent before making his own move.

There are many examples of dynamic games: markets that change over time, political negotiations, models of computer systems, etc.

We model dynamic games using extensive-form games, a tree-like model that allows us to express the sequential nature of games.

We start with perfect-information games, where each player always knows the results of all previous moves. Then we generalize to imperfect information, where players may have only partial knowledge of these results (e.g., most card games).

Perfect-Info. Extensive-Form Games (Example)

[Game tree: player 1 at h0 chooses L or R; after L, player 2 at h1 chooses K → (3, 1) or U → (1, 3); after R, player 2 at h2 chooses K → (2, 1) or U → (0, 0).]

Here h0, h1, h2 are non-terminal nodes; the leaves are terminal nodes. Each non-terminal node is owned by a player who chooses an action there, e.g., h1 is owned by player 2, who chooses either K or U. Every action results in a transition to a new node; choosing L in h0 results in a move to h1. When a play reaches a terminal node, the players collect their payoffs, e.g., the leftmost terminal node gives 3 to player 1 and 1 to player 2.
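Before giving the formal definition, here is one possible way (an illustrative sketch of mine, not lecture code) to encode the example game above: each choice node records its owner, each (node, action) pair records its successor, and each terminal node records a payoff vector.

```python
# A minimal sketch encoding the example game tree above; the dictionary keys
# and the helper name `play` are my own choices, not notation from the lecture.
owner = {"h0": 1, "h1": 2, "h2": 2}           # player who moves at each choice node
successor = {                                  # successor of each (node, action) pair
    ("h0", "L"): "h1", ("h0", "R"): "h2",
    ("h1", "K"): "z1", ("h1", "U"): "z2",
    ("h2", "K"): "z3", ("h2", "U"): "z4",
}
payoff = {"z1": (3, 1), "z2": (1, 3), "z3": (2, 1), "z4": (0, 0)}

def play(action_sequence):
    """Follow a sequence of actions from the root h0 and return the payoffs."""
    node = "h0"
    for a in action_sequence:
        node = successor[(node, a)]
    return payoff[node]

print(play(["L", "K"]))   # (3, 1): player 1 gets 3, player 2 gets 1
print(play(["R", "U"]))   # (0, 0)
```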
Perfect-Information Extensive-Form Games

A perfect-information extensive-form game is a tuple G = (N, A, H, Z, χ, ρ, π, h0, u) where
• N = {1, . . . , n} is a set of n players, A is a (single) set of actions,
• H is a set of non-terminal (choice) nodes and Z is a set of terminal nodes (assume Z ∩ H = ∅); denote H̄ = H ∪ Z,
• χ : H → 2^A \ {∅} is the action function, which assigns to each choice node a non-empty set of enabled actions,
• ρ : H → N is the player function, which assigns to each non-terminal node a player i ∈ N who chooses an action there; we define Hi := {h ∈ H | ρ(h) = i},
• π : H × A → H̄ is the successor function, which maps a non-terminal node and an action to a new node, such that
  - h0 is the only node that is not in the image of π (the root),
  - for all h1, h2 ∈ H and for all a1 ∈ χ(h1) and all a2 ∈ χ(h2), if π(h1, a1) = π(h2, a2), then h1 = h2 and a1 = a2,
• u = (u1, . . . , un), where each ui : Z → R is a payoff function for player i on the terminal nodes of Z.

Some Notation

A path from h ∈ H̄ to h' ∈ H̄ is a sequence h1 a2 h2 a3 h3 · · · hk−1 ak hk where h1 = h, hk = h' and π(hj−1, aj) = hj for every 1 < j ≤ k. Note that, in particular, h is a path from h to h.

Assumption: For every h ∈ H̄ there is a unique path from h0 to h, and there is no infinite path (i.e., no sequence h1 a2 h2 a3 h3 · · · such that π(hj−1, aj) = hj for every j > 1).

Note that the assumption is satisfied when H̄ is finite. Indeed, uniqueness follows immediately from the definition of π. Now let X be the set of all h' from which there is a path to h. If h0 ∈ X, we are done. Otherwise, let h' be a node of X with the longest path to h. As h' ≠ h0, there are h'' and a ∈ χ(h'') such that h' = π(h'', a). But then there is a path from h'' to h that is longer than the path from h', a contradiction.

The above claim implies that every perfect-information extensive-form game can be seen as a game on a rooted tree (H̄, E, h0) where
• H̄ = H ∪ Z is the set of nodes,
• E ⊆ H̄ × H̄ is the set of edges defined by (h, h') ∈ E iff h ∈ H and there is a ∈ χ(h) such that π(h, a) = h',
• h0 is the root.

Some More Notation

h' is a child of h, and h is the parent of h', if there is a ∈ χ(h) such that h' = π(h, a).

h' ∈ H̄ is reachable from h ∈ H̄ if there is a path from h to h'. If h' is reachable from h, we say that h' is a descendant of h and h is an ancestor of h' (note that, by definition, h is both a descendant and an ancestor of itself).

Example: Trust Game

[Game tree: player 1 at h0 chooses D → z1 with payoffs (5, 5), or T → h1; player 2 at h1 chooses K → z2 with payoffs (0, 20), or S → z3 with payoffs (7.5, 12.5).]

• Two players, both start with $5.
• Player 1 either distrusts (D) player 2 and keeps the money (payoffs (5, 5)), or trusts (T) player 2 and passes the $5 to player 2.
• If player 1 chooses to trust player 2, the money is tripled by the experimenter and sent to player 2.
• Player 2 may either keep (K) the additional $15 (resulting in (0, 20)), or share (S) it with player 1 (resulting in (7.5, 12.5)).

Example: Trust Game (Cont.)

[Game tree as above.]

• N = {1, 2}, A = {D, T, K, S}
• H = {h0, h1}, Z = {z1, z2, z3}
• χ(h0) = {D, T}, χ(h1) = {K, S}
• ρ(h0) = 1, ρ(h1) = 2
• π(h0, D) = z1, π(h0, T) = h1, π(h1, K) = z2, π(h1, S) = z3
• u1(z1) = 5, u1(z2) = 0, u1(z3) = 7.5, u2(z1) = 5, u2(z2) = 20, u2(z3) = 12.5
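As a side note (a small sketch of mine, not part of the lecture), the finite components listed above translate directly into Python dictionaries:

```python
# A minimal sketch of the Trust Game components chi, rho, pi, u as Python
# dictionaries, mirroring the listing above; variable names are my own.
N = (1, 2)
A = {"D", "T", "K", "S"}
H = {"h0", "h1"}                      # choice nodes
Z = {"z1", "z2", "z3"}                # terminal nodes
chi = {"h0": {"D", "T"}, "h1": {"K", "S"}}
rho = {"h0": 1, "h1": 2}
pi = {("h0", "D"): "z1", ("h0", "T"): "h1",
      ("h1", "K"): "z2", ("h1", "S"): "z3"}
u = {"z1": (5, 5), "z2": (0, 20), "z3": (7.5, 12.5)}   # (u1, u2)

# Sanity check: every enabled action leads to a node of H ∪ Z.
assert all(pi[(h, a)] in H | Z for h in H for a in chi[h])
```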
Stackelberg Competition

Very similar to the Cournot duopoly ...
• Two identical firms, players 1 and 2, produce some good. Denote by q1 and q2 the quantities produced by firms 1 and 2, respectively.
• The total quantity of products in the market is q1 + q2.
• The price of each item is κ − q1 − q2, where κ > 0 is fixed.
• The firms have a common per-item production cost c.
Except that ...
• As opposed to the Cournot duopoly, firm 1 moves first and chooses the quantity q1 ∈ [0, ∞).
• Afterwards, firm 2 chooses q2 ∈ [0, ∞) (knowing q1), and then the firms get their payoffs.

Stackelberg Competition – Extensive-Form Model

An extensive-form game model:
• N = {1, 2}
• A = [0, ∞)
• H = {h0} ∪ {h1^q1 | q1 ∈ [0, ∞)}
• Z = {zq1,q2 | q1, q2 ∈ [0, ∞)}
• χ(h0) = [0, ∞), χ(h1^q1) = [0, ∞)
• ρ(h0) = 1, ρ(h1^q1) = 2
• π(h0, q1) = h1^q1, π(h1^q1, q2) = zq1,q2
• The payoffs are
  u1(zq1,q2) = q1(κ − q1 − q2) − q1c
  u2(zq1,q2) = q2(κ − q1 − q2) − q2c

Example: Chess (a bit simplified)

There are infinitely many representations of chess; this one is different from the one presented in the lecture.
• N = {1, 2}
• Denoting by Boards the set of all (appropriately encoded) board positions, we define H̄ = B × {1, 2} where B = {w ∈ Boards^+ | no board repeats ≥ 3 times in w}. (Here Boards^+ is the set of all non-empty sequences of boards.)
• Z consists of all nodes (wb, i) (here b ∈ Boards) where either b is checkmate for player i, or i does not have a move in b, or every move of i in b leads to a board with two occurrences in w.
• χ(wb, i) is the set of all legal moves of player i in b.
• ρ(wb, i) = i
• π is defined by π((wb, i), a) = (wbb', 3 − i), where b' is obtained from b according to the move a and 3 − i is the other player.
• h0 = (b0, 1) where b0 is the initial board.
• uj(wb, i) ∈ {1, 0, −1}, where 1 means "win", 0 means "draw", and −1 means "loss" for player j.

Pure Strategies

Let G = (N, A, H, Z, χ, ρ, π, h0, u) be a perfect-information extensive-form game.

Definition 54
A pure strategy of player i in G is a function si : Hi → A such that for every h ∈ Hi we have si(h) ∈ χ(h).

We denote by Si the set of all pure strategies of player i in G, and by S = S1 × · · · × Sn the set of all pure strategy profiles.

Note that each pure strategy profile s ∈ S determines a unique path ws = h0 a1 h1 · · · hk−1 ak hk from h0 to a terminal node hk, given by aj = sρ(hj−1)(hj−1) for all 0 < j ≤ k.

Denote by O(s) the terminal node reached by ws. Abusing notation a bit, we denote by ui(s) the value ui(O(s)) of the payoff for player i when the terminal node O(s) is reached using the strategies of s.

Example: Trust Game

[Game tree of the Trust Game as above.]

The pure strategy profile (s1, s2) where s1(h0) = T and s2(h1) = K is usually written as TK (listing the choice nodes in BFS order, left to right). It determines the path h0 T h1 K z2.
The resulting payoffs: u1(s1, s2) = 0 and u2(s1, s2) = 20.

Extensive-Form vs Strategic-Form

The extensive-form game G determines the corresponding strategic-form game Ḡ = (N, (Si)i∈N, (ui)i∈N).

Here note that the set of players N and the sets of pure strategies Si are the same in G and in the corresponding game. The payoff functions ui in Ḡ are understood as functions on the pure strategy profiles of S = S1 × · · · × Sn.

With this definition, we may apply all solution concepts and algorithms developed for strategic-form games to extensive-form games. We often consider the extensive form to be only a different way of representing the corresponding strategic-form game and do not distinguish between them.

There are some issues, namely whether all notions from the strategic-form setting make sense in the extensive form. Also, a naive application of the algorithms may result in unnecessarily high complexity.

For now, let us consider pure strategies only!
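To illustrate the correspondence, here is a minimal sketch (my own encoding, not lecture code) that enumerates the pure strategies si : Hi → A of each player and tabulates ui(O(s)) for every profile. For the small example game with nodes h0, h1, h2 introduced earlier (writing K', U' for player 2's actions in h2), it prints a 2 × 4 payoff table, i.e., the induced strategic-form game used in the following examples.

```python
# A minimal sketch: enumerate pure strategies of a finite perfect-information
# extensive-form game and print the induced strategic-form payoff table.
# The dictionary-based game encoding is my own convention, not lecture code.
from itertools import product

owner = {"h0": 1, "h1": 2, "h2": 2}                                  # rho
actions = {"h0": ["L", "R"], "h1": ["K", "U"], "h2": ["K'", "U'"]}   # chi
succ = {("h0", "L"): "h1", ("h0", "R"): "h2",                        # pi
        ("h1", "K"): "z1", ("h1", "U"): "z2",
        ("h2", "K'"): "z3", ("h2", "U'"): "z4"}
payoff = {"z1": (3, 1), "z2": (1, 3), "z3": (2, 1), "z4": (0, 0)}    # u

def pure_strategies(player):
    """All functions s_i : H_i -> A with s_i(h) in chi(h), as dictionaries."""
    nodes = [h for h, p in owner.items() if p == player]
    return [dict(zip(nodes, choice))
            for choice in product(*(actions[h] for h in nodes))]

def outcome(profile):
    """Follow the unique path w_s determined by the profile; return u(O(s))."""
    node = "h0"
    while node in owner:                        # until a terminal node is hit
        node = succ[(node, profile[owner[node]][node])]
    return payoff[node]

cols = ["".join(s2.values()) for s2 in pure_strategies(2)]
print("     ", cols)
for s1 in pure_strategies(1):
    row = [outcome({1: s1, 2: s2}) for s2 in pure_strategies(2)]
    print("".join(s1.values()), row)
```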
Example: Trust Game

[Game tree of the Trust Game as above.]

Is any strategy strictly (weakly, very weakly) dominant? Is any strategy never a best response? Is there a Nash equilibrium in pure strategies?

Example

[Game tree: player 1 at h0 chooses L or R; after L, player 2 at h1 chooses K → (3, 1) or U → (1, 3); after R, player 2 at h2 chooses K' → (2, 1) or U' → (0, 0).]

Find all pure strategies of both players. Is any strategy (strictly, weakly, very weakly) dominant? Is any strategy (strictly, weakly, very weakly) dominated? Is any strategy never a best response? Are there Nash equilibria in pure strategies?

Example

[Game tree as above.] The induced strategic-form game:

       KK'    KU'    UK'    UU'
  L    3, 1   3, 1   1, 3   1, 3
  R    2, 1   0, 0   2, 1   0, 0

Find all pure strategies of both players. Is any strategy (strictly, weakly, very weakly) dominant? Is any strategy (strictly, weakly, very weakly) dominated? Is any strategy never a best response? Are there Nash equilibria in pure strategies?

Criticism of Nash Equilibria

[Game tree and induced strategic form as above.]

Two Nash equilibria in pure strategies: (L, UU') and (R, UK').

Examine (L, UU'):
• Player 2 threatens to play U' in h2;
• as a result, player 1 plays L;
• player 2 reacts to L by playing the best response, i.e., U.
However, the threat is not credible: once a play reaches h2, a rational player 2 chooses K'.

Criticism of Nash Equilibria

[Game tree and induced strategic form as above.]

Two Nash equilibria in pure strategies: (L, UU') and (R, UK').

Examine (R, UK'): This equilibrium is sensible in the following sense:
• Player 2 plays a best response in both h1 and h2.
• Player 1 plays the "best response" in h0, assuming that player 2 will play his best responses in the future.
This equilibrium is called subgame perfect.

Subgame Perfect Equilibria

Given h ∈ H, we denote by H̄^h the set of all nodes reachable from h.

Definition 55 (Subgame)
A subgame G^h of G rooted at h ∈ H is the restriction of G to the nodes reachable from h in the game tree. More precisely, G^h = (N, A, H^h, Z^h, χ^h, ρ^h, π^h, h, u^h) where
• H^h = H ∩ H̄^h, Z^h = Z ∩ H̄^h, and χ^h and ρ^h are the restrictions of χ and ρ to H^h, respectively
  (given a function f : A → B and C ⊆ A, the restriction of f to C is the function g : C → B such that g(x) = f(x) for all x ∈ C),
• π^h is defined for h' ∈ H^h and a ∈ χ^h(h') by π^h(h', a) = π(h', a),
• each u^h_i is the restriction of ui to Z^h.

Definition 56
A subgame perfect equilibrium (SPE) in pure strategies is a pure strategy profile s ∈ S such that for every subgame G^h of G, the restriction of s to H^h is a Nash equilibrium in pure strategies in G^h.

Here the restriction of s = (s1, . . . , sn) ∈ S to H^h is the strategy profile s^h = (s^h_1, . . . , s^h_n) where s^h_i(h') = si(h') for all i ∈ N and all h' ∈ Hi ∩ H^h.

Stackelberg Competition – SPE

• N = {1, 2}, A = [0, ∞)
• H = {h0} ∪ {h1^q1 | q1 ∈ [0, ∞)}, Z = {zq1,q2 | q1, q2 ∈ [0, ∞)}
• χ(h0) = [0, ∞), χ(h1^q1) = [0, ∞), ρ(h0) = 1, ρ(h1^q1) = 2
• π(h0, q1) = h1^q1, π(h1^q1, q2) = zq1,q2
• The payoffs are u1(zq1,q2) = q1(κ − c − q1 − q2) and u2(zq1,q2) = q2(κ − c − q1 − q2).

Denote θ = κ − c.

Player 1 chooses q1; we know that the best response of player 2 is q2 = (θ − q1)/2. Then

u1(zq1,q2) = q1(θ − q1 − (θ − q1)/2) = (θ/2)q1 − q1^2/2,

which is maximized by q1 = θ/2, giving q2 = θ/4. Then u1(zq1,q2) = θ^2/8 and u2(zq1,q2) = θ^2/16.

Note that firm 1 has an advantage as the leader.
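As a quick numerical sanity check of the formulas above (my own sketch, with κ and c chosen arbitrarily), one can grid-search the leader's quantity against the follower's best response and compare with the closed-form solution q1 = θ/2, u1 = θ^2/8.

```python
# A minimal sketch verifying the Stackelberg SPE formulas numerically.
# kappa and c are arbitrary example values (my assumption); theta = kappa - c.
kappa, c = 10.0, 2.0
theta = kappa - c

def follower_best_response(q1):
    """Maximizer of q2 * (theta - q1 - q2), i.e. q2 = (theta - q1) / 2."""
    return max(0.0, (theta - q1) / 2)

def leader_payoff(q1):
    q2 = follower_best_response(q1)
    return q1 * (theta - q1 - q2)

# Grid search over the leader's quantity q1 in [0, theta].
grid = [i * theta / 10000 for i in range(10001)]
q1_star = max(grid, key=leader_payoff)

print(q1_star, theta / 2)                      # both close to theta/2 = 4.0
print(leader_payoff(q1_star), theta**2 / 8)    # both close to theta^2/8 = 8.0
```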
Existence of SPE

From this moment on we consider only finite games!

Theorem 57
Every finite perfect-information extensive-form game has an SPE in pure strategies.

Proof: By induction on the number of nodes.

Base case: If |H̄| = 1, the only node is terminal, and the trivial pure strategy profile is an SPE.

Induction step: Consider a game with more than one node. Let K = {h1, . . . , hk} be the set of all children of the root h0. By the induction hypothesis, for every h' ∈ K there is an SPE s^{h'} in G^{h'}. For every i ∈ N, define a strategy si of player i in G as follows:
• for i = ρ(h0), we set si(h0) ∈ argmax_{h'∈K} u^{h'}_i(s^{h'}) (here we identify an action a ∈ χ(h0) with the child π(h0, a) it leads to),
• for all i ∈ N and all h ∈ Hi with h ≠ h0, we set si(h) = s^{h'}_i(h), where h' ∈ K is the child with h ∈ H^{h'} ∩ Hi.
We claim that s = (s1, . . . , sn) is an SPE in pure strategies. By the definition of s, it is a NE in all subgames except (possibly) G itself.

Existence of SPE (Cont.)

Let h' = sρ(h0)(h0). Consider a possible deviation of player i: let s̄ be another pure strategy profile in G obtained from s = (s1, . . . , sn) by changing si.

First, assume that i ≠ ρ(h0). Then

u_i(s) = u^{h'}_i(s^{h'}) ≥ u^{h'}_i(s̄^{h'}) = u_i(s̄).

Here the first equality follows from h' = sρ(h0)(h0) and the fact that s behaves as s^{h'} in G^{h'}, the inequality follows from the fact that s^{h'} is a NE in G^{h'}, and the second equality follows from h' = sρ(h0)(h0) = s̄ρ(h0)(h0).

Second, assume that i = ρ(h0). Let hr = s̄i(h0) = s̄ρ(h0)(h0). Then u^{h'}_i(s^{h'}) ≥ u^{hr}_i(s^{hr}) because h' maximizes the payoff of player i = ρ(h0) among the children of h0. But then

u_i(s) = u^{h'}_i(s^{h'}) ≥ u^{hr}_i(s^{hr}) ≥ u^{hr}_i(s̄^{hr}) = u_i(s̄).

Backward Induction

The proof of Theorem 57 gives an efficient procedure for computing an SPE of a finite perfect-information extensive-form game (a short code sketch is given at the end of the section).

Backward induction: We inductively "attach" to every node h an SPE s^h in G^h, together with a vector of expected payoffs u(h) = (u1(h), . . . , un(h)).
• Initially: Attach to each terminal node z ∈ Z the empty profile s^z = (∅, . . . , ∅) and the payoff vector u(z) = (u1(z), . . . , un(z)).
• While there is an unattached node h with all children attached:
  1. Let K be the set of all children of h.
  2. Let hmax ∈ argmax_{h'∈K} uρ(h)(h').
  3. Attach to h an SPE s^h where
     - s^h_ρ(h)(h) = hmax (identifying, as before, the action at h with the child it leads to),
     - for all i ∈ N and all h' ∈ H^h ∩ Hi with h' ≠ h, define s^h_i(h') = s^{h̄}_i(h'), where h̄ ∈ K is the child with h' ∈ H^{h̄} ∩ Hi (i.e., in each G^{h̄}, the profile s^h behaves as s^{h̄}: the restriction of s^h to G^{h̄} equals s^{h̄}).
  4. Attach to h the expected payoffs ui(h) = ui(hmax) for all i ∈ N.

Chess

Recall that in the model of chess, the payoffs are from {1, 0, −1} and u1 = −u2 (i.e., the game is zero-sum). By Theorem 57, there is an SPE in pure strategies (s*1, s*2). But then one of the following holds:
1. White has a winning strategy: if u1(s*1, s*2) = 1 and thus u2(s*1, s*2) = −1.
2. Black has a winning strategy: if u1(s*1, s*2) = −1 and thus u2(s*1, s*2) = 1.
3. Both players have strategies to force a draw: if u1(s*1, s*2) = 0 and thus u2(s*1, s*2) = 0.

Question: Which one is the right answer?
Answer: Nobody knows yet ... the tree is too big! Even with depth ∼ 200 and ∼ 5 moves per node, there are about 5^200 nodes!
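To close the section, here is a minimal sketch of the backward-induction procedure described above (my own dictionary-based encoding, not lecture code), run on the small example game with nodes h0, h1, h2; it returns the subgame perfect equilibrium (R, UK') with payoffs (2, 1).

```python
# A minimal sketch of backward induction on a finite perfect-information
# extensive-form game.  The dictionary-based game encoding is my own.
owner = {"h0": 1, "h1": 2, "h2": 2}
actions = {"h0": ["L", "R"], "h1": ["K", "U"], "h2": ["K'", "U'"]}
succ = {("h0", "L"): "h1", ("h0", "R"): "h2",
        ("h1", "K"): "z1", ("h1", "U"): "z2",
        ("h2", "K'"): "z3", ("h2", "U'"): "z4"}
payoff = {"z1": (3, 1), "z2": (1, 3), "z3": (2, 1), "z4": (0, 0)}

def backward_induction(node, strategy):
    """Attach an SPE action to every choice node below `node`;
    return the resulting payoff vector u(node)."""
    if node in payoff:                          # terminal node: just its payoffs
        return payoff[node]
    player = owner[node]
    best_action, best_payoffs = None, None
    for a in actions[node]:
        payoffs = backward_induction(succ[(node, a)], strategy)
        if best_payoffs is None or payoffs[player - 1] > best_payoffs[player - 1]:
            best_action, best_payoffs = a, payoffs
    strategy[node] = best_action                # SPE choice at this node
    return best_payoffs

spe = {}
print(backward_induction("h0", spe))            # (2, 1)
print(spe)                                      # {'h1': 'U', 'h2': "K'", 'h0': 'R'}
```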