IA168 Algorithmic Game Theory
Tomáš Brázdil
Organization of This Course
Sources:
Lectures (slides, notes)
based on several sources
slides are prepared for lectures, some stuff on greenboard
(⇒ attend the lectures)
Books:
Nisan/Roughgarden/Tardos/Vazirani, Algorithmic Game
Theory, Cambridge University Press, 2007.
Available online for free:
http://www.cambridge.org/journals/nisan/downloads/Nisan_Non-printable.pdf
Tadelis, Game Theory: An Introduction, Princeton
University Press, 2013
(I use various resources, so please, attend the lectures)
Evaluation
Oral exam
Homework
What is Algorithmic Game Theory?
First, what is game theory?
According to the Oxford dictionary it is "the branch of mathematics
concerned with the analysis of strategies for dealing with competitive
situations where the outcome of a participant’s choice of action
depends critically on the actions of other participants"
According to Myerson it is "the study of
mathematical models of conﬂict and cooperation
between intelligent rational decision-makers"
What does "algorithmic" mean?
It means that we are "concerned with the computational
questions that arise in game theory, and that enlighten game
theory. In particular, questions about finding efficient algorithms
to ‘solve’ games."
Let’s have a look at some examples ....
Prisoner’s Dilemma
Two suspects of a serious crime are
arrested and imprisoned.
The police have enough evidence of only
petty theft, and to nail the suspects for
the serious crime they need testimony
from at least one of them.
The suspects are interrogated
separately without any possibility of
communication.
Each of the suspects is offered a deal:
If he confesses (C) to the crime, he is
free to go. The alternative is not to
confess, that is remain silent (S).
Sentence depends on the behavior of both suspects.
The problem: What would the suspects do?
Prisoner’s Dilemma – Solution(?)
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
Rational "row" suspect (or his adviser) may reason as follows:
If my colleague chooses C, then playing C gives me −5 and
playing S gives −20.
If my colleague chooses S, then playing C gives me 0 and
playing S gives −1.
In both cases C is clearly better (it strictly dominates the other
strategy). If the other suspect’s reasoning is the same, both choose C
and each gets a 5-year sentence.
Where is the dilemma? There is a solution (S, S) which is better for
both players but needs some “central” authority to control the players.
Are there always “dominant” strategies?
Nash equilibria – Battle of Sexes
A couple agreed to meet this evening, but cannot
recall if they will be attending the opera or a football
match.
The husband would like to go to the football game.
The wife would like to go to the opera. Both would
prefer to go to the same place rather than different
ones.
If they cannot communicate, where should they go?
Battle of Sexes can be modeled as a game of two players (Wife,
Husband) with the following payoffs:
      O         F
O     2, 1      0, 0
F     0, 0      1, 2
Apparently, no strategy of any player is dominant. A “solution”?
Note that whenever both players play O, then neither of them wants
to unilaterally deviate from his strategy!
(O, O) is an example of a Nash equilibrium (as is (F, F))
Mixed Equilibria – Rock-Paper-Scissors
      R         P         S
R     0, 0      −1, 1     1, −1
P     1, −1     0, 0      −1, 1
S     −1, 1     1, −1     0, 0
This is an example of a zero-sum game: whatever one of the
players wins, the other one loses.
What is an optimal behavior here? Is there a Nash equilibrium?
Use mixed strategies: Each player plays each pure strategy with
probability 1/3. The expected payoff of each player is 0 (even if
one of the players changes his strategy, he still gets 0!).
How to algorithmically solve games in mixed strategies? (we
shall use probability theory and linear programming)
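The claim about the uniform mix can be checked numerically. The following is a minimal sketch, assuming the payoff table above; the names `U1`, `expected_payoff`, and `deviation_payoffs` are illustrative and not part of the lecture.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ["R", "P", "S"]

# Row player's payoffs from the table above; the column player gets the negative.
U1 = {
    ("R", "R"): 0, ("R", "P"): -1, ("R", "S"): 1,
    ("P", "R"): 1, ("P", "P"): 0, ("P", "S"): -1,
    ("S", "R"): -1, ("S", "P"): 1, ("S", "S"): 0,
}

def expected_payoff(sigma1, sigma2):
    """Expected payoff of the row player when both players mix independently."""
    return sum(sigma1[a] * sigma2[b] * U1[a, b]
               for a, b in product(ACTIONS, ACTIONS))

uniform = {a: Fraction(1, 3) for a in ACTIONS}
value_at_uniform = expected_payoff(uniform, uniform)

# Payoff of every pure deviation against an opponent who mixes uniformly.
deviation_payoffs = {
    a: expected_payoff({b: Fraction(int(a == b)) for b in ACTIONS}, uniform)
    for a in ACTIONS
}
```

Every entry of `deviation_payoffs` equals `value_at_uniform`, so no unilateral pure deviation improves on the uniform mix.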
Philosophical Issues in Games
Dynamic Games
So far we have seen games in strategic form that are unable to
capture games that unfold over time (such as chess).
For such purpose we need to use extensive form games:
P1
  A → P2
        C → (1, 2)
        D → (1, −1)
        E → (0, 2)
  B → P2
        F → (2, 2)
        G → (1, 3)
How to "solve" such games?
What is their relationship to the strategic form games?
Chance and Imperfect Information
Some decisions in the game tree may be by chance and controlled by
neither player (e.g. Poker, Backgammon, etc.)
Sometimes a player may not be able to distinguish between several
“positions” because he does not know all the information in them
(Think a card game with opponent’s cards hidden).
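[Figure: game tree with chance and imperfect information. Nodes belong to P1, P2, and Nature; edges are labeled A, B, D, E, F, G, H, I, J; the leaves carry payoffs (a, b), (c, d), (e, f), (g, h), (i, j), (k, ℓ), (m, n).]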
Again, how to solve such games?
Games of Incomplete Information
In all previous games the players knew all details of the game
they played, and this fact was “common knowledge”. This is
not always the case.
Example: Sealed Bid Auction
Two bidders are trying to purchase the same item.
The bidders simultaneously submit bids b1 and b2 and the item
is sold to the highest bidder at his bid price (ﬁrst price auction)
The payoff of the player 1 (and similarly for player 2) is
calculated by
$$u_1(b_1, b_2) = \begin{cases} v_1 - b_1 & b_1 > b_2 \\ \frac{1}{2}(v_1 - b_1) & b_1 = b_2 \\ 0 & b_1 < b_2 \end{cases}$$
Here v1 is the private value that player 1 assigns to the item, and
so player 2 does not know u1.
How to deal with such a game? Assume the “worst” private value?
What if we have partial knowledge about the private values?
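The payoff of bidder 1 can be written as a small function. This is only a sketch of the formula above; the function name `u1` and the argument order are illustrative choices, and the private value `v1` is passed explicitly even though player 2 would not know it.

```python
def u1(b1, b2, v1):
    """Payoff of bidder 1 in the two-bidder first-price sealed-bid auction."""
    if b1 > b2:
        return v1 - b1        # bidder 1 wins and pays his own bid
    if b1 == b2:
        return (v1 - b1) / 2  # tie: half of the surplus, as in the formula
    return 0                  # bidder 1 loses and pays nothing
```

For example, with `v1 = 10`, bidding 6 against 4 yields 4, while a tie at 5 yields 2.5.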
Inefﬁciency of Equilibria
In Prisoner’s Dilemma, the selfish behavior
of suspects (the Nash equilibrium) results in
a somewhat worse than ideal situation.
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
Deﬁning a welfare function W which to every pair of strategies
assigns the sum of payoffs, we get W(C, C) = −10 but
W(S, S) = −2.
The ratio $W(C, C) / W(S, S) = 5$ measures the inefficiency of "selfish-behavior"
(C, C) w.r.t. the optimal “centralized” solution.
Price of Anarchy is the maximum ratio between values of equilibria
and the value of an optimal solution.
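The welfare comparison for the Prisoner’s Dilemma can be reproduced in a few lines. This is a sketch under the definitions above; `PAYOFFS`, `welfare`, and `ratio` are illustrative names.

```python
# Payoff table of the Prisoner's Dilemma: profile -> (payoff of 1, payoff of 2).
PAYOFFS = {
    ("C", "C"): (-5, -5), ("C", "S"): (0, -20),
    ("S", "C"): (-20, 0), ("S", "S"): (-1, -1),
}

def welfare(profile):
    """Welfare W of a strategy profile: the sum of both players' payoffs."""
    return sum(PAYOFFS[profile])

# Profile with the highest welfare (the "centralized" optimum).
best_profile = max(PAYOFFS, key=welfare)

# Ratio between the welfare of the equilibrium (C, C) and of the optimum.
ratio = welfare(("C", "C")) / welfare(best_profile)
```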
Inefﬁciency of Equilibria – Selﬁsh Routing
Consider a transportation system where many
agents are trying to get from some initial location to
a destination. Consider the welfare to be the
average time for an agent to reach the destination.
There are two versions:
“Centralized”: A central authority tells each agent where to go.
“Decentralized”: Each agent selﬁshly minimizes his travel time.
Price of Anarchy measures the ratio between average travel time in
these two cases.
Problem: Bound the price of anarchy over all routing games?
Games in Computer Science
Game theory is a core foundation of mathematical economics. But
what does it have to do with CS?
Games in AI: modeling of “rational” agents and their interactions.
Games in Algorithms: several game theoretic problems have
a very interesting algorithmic status and are solved by
interesting algorithms
Games in modeling and analysis of reactive systems: program
inputs viewed “adversarially”, bisimulation games, etc.
Games in computational complexity: Many complexity classes
are deﬁnable in terms of games: PSPACE, polynomial hierarchy,
etc.
Games in Logic: modal and temporal logics,
Ehrenfeucht-Fraisse games, etc.
Games, the Internet and E-commerce: An extremely active
research area at the intersection of CS and Economics
Basic idea: “The internet is a HUGE experiment in interaction
between agents (both human and automated)”
How do we set up the rules of this game to harness “socially
optimal” results?
Summary and Brief Overview
This is a theoretical course aimed at some fundamental results of
game theory, often related to computer science
We start with strategic form games (such as the Prisoner’s
dilemma), investigate several solution concepts (dominance,
equilibria) and related algorithms (in particular, Lemke-Howson
algorithm for computing Nash Eq.)
Then we consider repeated games which allow players to learn
from history and/or to react to deviations of the other players.
Subsequently, we move on to incomplete information games and
auctions
Finally, we consider (in)efﬁciency of equilibria (such as the Price
of Anarchy) and its properties on important classes of routing
and network formation games.
Remaining time will be devoted to selected topics from extensive
form games, games on graphs etc.
Static Games of Complete Information
Strategic-Form Games
Solution concepts
Static Games of Complete Information – Intuition
Proceed in two steps:
1. Each player simultaneously and independently chooses
a strategy. This means that players play without observing
strategies chosen by other players.
2. Conditional on the players’ strategies, payoffs are distributed to
all players.
Complete information means that the following is common knowledge
among players:
all possible strategies of all players,
what payoff is assigned to each combination of strategies.
Deﬁnition 1
A fact E is common knowledge among players {1, . . . , n} if for every
sequence i1, . . . , ik ∈ {1, . . . , n} we have that i1 knows that i2 knows
that ... ik−1 knows that ik knows E.
The goal of each player is to maximize his payoff (and this fact is
common knowledge).
Strategic-Form Games
To formally represent static games of complete information we deﬁne
strategic-form games.
Deﬁnition 2
A game in strategic-form (or normal-form) is an ordered triple
G = (N, (Si)i∈N , (ui)i∈N), in which:
N = {1, 2, . . . , n} is a ﬁnite set of players.
Si is a set of (pure) strategies of player i, for every i ∈ N.
A strategy proﬁle is a vector of strategies of all players
(s1, . . . , sn) ∈ S1 × · · · × Sn.
We denote the set of all strategy proﬁles by S = S1 × · · · × Sn.
ui : S → R is a function associating each strategy proﬁle
s = (s1, . . . , sn) ∈ S with the payoff ui(s) to player i, for every
player i ∈ N.
Deﬁnition 3
A zero-sum game G is one in which for all s = (s1, . . . , sn) ∈ S we
have u1(s) + u2(s) + · · · + un(s) = 0.
Example: Prisoner’s Dilemma
N = {1, 2}
S1 = S2 = {S, C}
u1, u2 are deﬁned as follows:
u1(C, C) = −5, u1(C, S) = 0, u1(S, C) = −20,
u1(S, S) = −1
u2(C, C) = −5, u2(C, S) = −20, u2(S, C) = 0,
u2(S, S) = −1
(Is it zero sum?)
We usually write payoffs in the following form:
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
or as two matrices (payoffs of player 1 on the left, payoffs of player 2 on the right):
      C      S              C      S
C     −5     0        C     −5     −20
S     −20    −1       S     0      −1
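The question "Is it zero sum?" can be answered mechanically from Definition 3. The sketch below stores a two-player game as a dictionary from strategy profiles to payoff pairs; this representation and the names `PD` and `is_zero_sum` are assumptions made for illustration.

```python
from itertools import product

# Prisoner's Dilemma: N = {1, 2}, S1 = S2 = {S, C}.
STRATEGY_SETS = [["C", "S"], ["C", "S"]]
PD = {
    ("C", "C"): (-5, -5), ("C", "S"): (0, -20),
    ("S", "C"): (-20, 0), ("S", "S"): (-1, -1),
}

def is_zero_sum(strategy_sets, payoffs):
    """Check Definition 3: payoffs sum to zero in every strategy profile."""
    return all(sum(payoffs[s]) == 0 for s in product(*strategy_sets))

pd_is_zero_sum = is_zero_sum(STRATEGY_SETS, PD)
```

Here `pd_is_zero_sum` is `False`, since for instance the payoffs in (C, C) sum to −10.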
Example: Cournot Duopoly
Two identical ﬁrms, players 1 and 2, produce some good.
Denote by q1 and q2 quantities produced by ﬁrms 1 and 2, resp.
The total quantity of products in the market is q1 + q2.
The price of each item is κ − q1 − q2 (here κ is a positive
constant)
Firms 1 and 2 have per item production costs c1 and c2, resp.
Question: How are these firms going to behave?
We may model the situation using a strategic-form game.
Strategic-form game model (N, (Si)i∈N , (ui)i∈N)
N = {1, 2}
Si = [0, ∞)
u1(q1, q2) = q1(κ − q1 − q2) − q1c1
u2(q1, q2) = q2(κ − q1 − q2) − q2c2
Solution Concepts
A solution concept is a method of analyzing games with the objective
of restricting the set of all possible outcomes to those that are more
reasonable than others.
We will use the term equilibrium for any one of the strategy profiles that
emerges as one of the solution concepts’ predictions.
(I follow the approach of Steven Tadelis here; it is not completely standard.)
Example 4
Nash equilibrium is a solution concept. That is, we “solve” games by
ﬁnding Nash equilibria and declare them to be reasonable outcomes.
Assumptions
Throughout the lecture we assume that:
1. Players are rational: a rational player is one who chooses his
strategy to maximize his payoff.
2. Players are intelligent: An intelligent player knows everything
about the game (actions and payoffs) and can make any
inferences about the situation that we can make
3. Common knowledge: The fact that players are rational and
intelligent is common knowledge among them.
4. Self-enforcement: Any prediction (or equilibrium) of a solution
concept must be self-enforcing.
Here 4. implies non-cooperative game theory: Each player is in
control of his actions, and he will stick to an action only if he ﬁnds it to
be in his best interest.
Evaluating Solution Concepts
In order to evaluate our theory as a methodological tool we use the
following criteria:
1. Existence (i.e. How often does it apply?): Solution concept
should apply to a wide variety of games.
E.g. We prove that mixed Nash equilibria exist in all two player ﬁnite
strategic-form games.
2. Uniqueness (How much does it restrict behavior?): We demand
our solution concept to restrict the behavior as much as possible.
E.g. So called strictly dominant strategy equilibria are always unique as
opposed to Nash eq.
The basic notion for evaluating "social outcome" is the following:
Definition 5
A strategy profile s ∈ S Pareto dominates a strategy profile s′ ∈ S if
ui(s) ≥ ui(s′) for all i ∈ N, and ui(s) > ui(s′) for at least one i ∈ N.
A strategy proﬁle s ∈ S is Pareto optimal if it is not Pareto dominated
by any other strategy proﬁle.
We will see more measures of social outcome later.
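Pareto dominance and Pareto optimality translate directly into code. The following sketch applies Definition 5 to the Prisoner’s Dilemma; the function names and the dictionary representation of the game are illustrative assumptions.

```python
PD = {
    ("C", "C"): (-5, -5), ("C", "S"): (0, -20),
    ("S", "C"): (-20, 0), ("S", "S"): (-1, -1),
}

def pareto_dominates(u, v):
    """True if payoff vector u is at least v for everyone and better for someone."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_optimal(payoffs):
    """All profiles that are not Pareto dominated by any other profile."""
    return [s for s in payoffs
            if not any(pareto_dominates(payoffs[t], payoffs[s]) for t in payoffs)]

optimal_profiles = pareto_optimal(PD)
```

In this game every profile except (C, C) is Pareto optimal; (C, C) is Pareto dominated by (S, S).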
Solution Concepts – Pure Strategies
We will consider the following solution concepts:
strictly dominant strategy equilibrium
iterated elimination of strictly dominated strategies (IESDS)
rationalizability
Nash equilibria
For now, let us concentrate on
pure strategies only!
I.e., no mixed strategies are allowed. We will generalize to
mixed setting later.
Notation
Let N = {1, . . . , n} be a finite set and for each i ∈ N let Xi be
a set. Let $X := \prod_{i \in N} X_i = \{(x_1, \ldots, x_n) \mid x_j \in X_j,\ j \in N\}$.
For i ∈ N we define $X_{-i} := \prod_{j \ne i} X_j$, i.e.,
$X_{-i} = \{(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \mid x_j \in X_j,\ \forall j \ne i\}$
An element of X−i will be denoted by
x−i = (x1, . . . , xi−1, xi+1, . . . , xn)
We slightly abuse notation and write (xi, x−i) to denote
(x1, . . . , xi, . . . , xn) ∈ X.
Strict Dominance in Pure Strategies
Deﬁnition 6
Let $s_i, s_i' \in S_i$ be strategies of player i. Then $s_i'$ is strictly
dominated by $s_i$ (write $s_i \succ s_i'$) if for any possible combination of
the other players’ strategies, $s_{-i} \in S_{-i}$, we have
$$u_i(s_i, s_{-i}) > u_i(s_i', s_{-i}) \quad \text{for all } s_{-i} \in S_{-i}$$
Claim 1
An intelligent and rational player will never play a strictly
dominated strategy.
Clearly, intelligence implies that the player should recognize dominated
strategies, rationality implies that the player will avoid playing them.
Strictly Dominant Strategy Equilibrium in Pure Str.
Deﬁnition 7
si ∈ Si is strictly dominant if every other pure strategy of player i is
strictly dominated by si.
Observe that every player has at most one strictly dominant strategy,
and that strictly dominant strategies do not have to exist.
Claim 2
Any rational player will play the strictly dominant strategy (if it exists).
Deﬁnition 8
A strategy proﬁle s ∈ S is a strictly dominant strategy equilibrium if
si ∈ Si is strictly dominant for all i ∈ N.
Corollary 9
If the strictly dominant strategy equilibrium exists, it is unique and
rational players will play it.
Is the strictly dominant strategy equilibrium always Pareto optimal?
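Definitions 6 and 7 can be checked by brute force in small two-player games. The sketch below assumes a game stored as a dictionary from (row, column) profiles to payoff pairs, with players indexed 0 and 1; all names are illustrative.

```python
PD = {
    ("C", "C"): (-5, -5), ("C", "S"): (0, -20),
    ("S", "C"): (-20, 0), ("S", "S"): (-1, -1),
}
BOS = {
    ("O", "O"): (2, 1), ("O", "F"): (0, 0),
    ("F", "O"): (0, 0), ("F", "F"): (1, 2),
}

def payoff(game, player, own, other):
    """Payoff of `player` (0 = row, 1 = column) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return game[profile][player]

def strictly_dominates(game, player, s, t, opp_strategies):
    """Definition 6: s is strictly better than t against every opponent strategy."""
    return all(payoff(game, player, s, o) > payoff(game, player, t, o)
               for o in opp_strategies)

def strictly_dominant(game, player, own_strategies, opp_strategies):
    """Definition 7: the strategy dominating all others, or None if there is none."""
    for s in own_strategies:
        if all(strictly_dominates(game, player, s, t, opp_strategies)
               for t in own_strategies if t != s):
            return s
    return None

pd_dominant = [strictly_dominant(PD, p, ["C", "S"], ["C", "S"]) for p in (0, 1)]
bos_dominant = [strictly_dominant(BOS, p, ["O", "F"], ["O", "F"]) for p in (0, 1)]
```

For the Prisoner’s Dilemma both players have the strictly dominant strategy C; in the Battle of Sexes neither player has one.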
Examples
In the Prisoner’s dilemma:
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
(C, C) is the strictly dominant strategy equilibrium (the only
proﬁle that is not Pareto optimal!).
In the Battle of Sexes:
      O         F
O     2, 1      0, 0
F     0, 0      1, 2
no strictly dominant strategies exist.
Indiana Jones and the Last Crusade
(Taken from Dixit & Nalebuff’s "The Art of Strategy" and a lecture of Robert
Marks)
Indiana Jones, his father, and the Nazis have all converged at the site
of the Holy Grail. The two Joneses refuse to help the Nazis reach the
last step. So the Nazis shoot Indiana’s dad. Only the healing power of
the Holy Grail can save the senior Dr. Jones from his mortal wound.
Suitably motivated, Indiana leads the way to the Holy Grail. But there
is one ﬁnal challenge. He must choose between literally scores of
chalices, only one of which is the cup of Christ. While the right cup
brings eternal life, the wrong choice is fatal. The Nazi leader
impatiently chooses a beautiful gold chalice, drinks the holy water,
and dies from the sudden death that follows from the wrong choice.
Indiana picks a wooden chalice, the cup of a carpenter. Exclaiming
"There’s only one way to ﬁnd out" he dips the chalice into the font and
drinks what he hopes is the cup of life. Upon discovering that he has
chosen wisely, Indiana brings the cup to his father and the water
heals the mortal wound.
Indiana Jones and the Last Crusade (cont.)
Indy Goofed
Although this scene adds excitement, it is somewhat
embarrassing that such a distinguished professor as Dr. Indiana
Jones would overlook his dominant strategy.
He should have given the water to his father without testing it
ﬁrst.
If Indiana has chosen the right cup, his father is still saved.
If Indiana has chosen the wrong cup, then his father dies
but Indiana is spared.
Testing the cup before giving it to his father doesn’t help, since if
Indiana has made the wrong choice, there is no second chance
– Indiana dies from the water and his father dies from the wound.
Iterated Strict Dominance in Pure Strategies
We know that no rational player ever plays strictly dominated
strategies.
As each player knows that each player is rational, each player knows
that his opponents will not play strictly dominated strategies and thus
all opponents know that effectively they are facing a "smaller" game.
As rationality is common knowledge, everyone knows that everyone
knows that the game is effectively smaller.
Thus everyone knows, that nobody will play strictly dominated
strategies in the smaller game (and such strategies may indeed
exist).
Because it is common knowledge that all players will perform this
kind of reasoning again, the process can continue until no more
strictly dominated strategies can be eliminated.
IESDS
The previous reasoning yields the Iterated Elimination of Strictly
Dominated Strategies (IESDS):
Define a sequence $D_i^0, D_i^1, D_i^2, \ldots$ of strategy sets of player i.
(Denote by $G_{DS}^k$ the game obtained from G by restricting to $D_i^k$, i ∈ N.)
1. Initialize k = 0 and $D_i^0 = S_i$ for each i ∈ N.
2. For all players i ∈ N: Let $D_i^{k+1}$ be the set of all pure strategies of
$D_i^k$ that are not strictly dominated in $G_{DS}^k$.
3. Let k := k + 1 and go to 2.
We say that $s_i \in S_i$ survives IESDS if $s_i \in D_i^k$ for all k = 0, 1, 2, . . .
Deﬁnition 10
A strategy proﬁle s = (s1, . . . , sn) ∈ S is an IESDS equilibrium if each
si survives IESDS.
A game is IESDS solvable if it has a unique IESDS equilibrium.
Remark: If all Si are ﬁnite, then in 2. we may remove only some of the strictly
dominated strategies (not necessarily all). The result is not affected by the
order of elimination since strictly dominated strategies remain strictly
dominated even after removing some other strictly dominated strategies.
IESDS Examples
In the Prisoner’s dilemma:
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
(C, C) is the only one surviving the ﬁrst round of IESDS.
In the Battle of Sexes:
      O         F
O     2, 1      0, 0
F     0, 0      1, 2
all strategies survive all rounds (i.e. IESDS ≡ anything may
happen, sorry)
A Bit More Interesting Example
      L         C         R
L     4, 3      5, 1      6, 2
C     2, 1      8, 4      3, 6
R     3, 0      9, 6      2, 8
IESDS on greenboard!
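For readers without access to the greenboard, the elimination can be replayed with a short program. This is a sketch of IESDS for finite two-player games, assuming the game is stored as a dictionary from (row, column) profiles to payoff pairs; the function name `iesds` is illustrative. In each round it removes all strictly dominated strategies of both players simultaneously, as in step 2 of the procedure.

```python
GAME = {
    ("L", "L"): (4, 3), ("L", "C"): (5, 1), ("L", "R"): (6, 2),
    ("C", "L"): (2, 1), ("C", "C"): (8, 4), ("C", "R"): (3, 6),
    ("R", "L"): (3, 0), ("R", "C"): (9, 6), ("R", "R"): (2, 8),
}

def iesds(game, rows, cols):
    """Iteratively remove strictly dominated pure strategies of both players."""
    rows, cols = list(rows), list(cols)
    while True:
        # Row strategies not strictly dominated by another row in the current game.
        new_rows = [r for r in rows
                    if not any(all(game[r2, c][0] > game[r, c][0] for c in cols)
                               for r2 in rows)]
        # Column strategies not strictly dominated by another column.
        new_cols = [c for c in cols
                    if not any(all(game[r, c2][1] > game[r, c][1] for r in rows)
                               for c2 in cols)]
        if new_rows == rows and new_cols == cols:
            return rows, cols
        rows, cols = new_rows, new_cols

surviving_rows, surviving_cols = iesds(GAME, ["L", "C", "R"], ["L", "C", "R"])
```

The column strategy C is removed first (it is strictly dominated by R), then the row strategies C and R, and finally the column strategy R, leaving only the profile (L, L).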
Political Science Example: Median Voter Theorem
Hotelling (1929) and Downs (1957)
N = {1, 2}
Si = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} (political and ideological spectrum)
10 voters belong to each position
(Here 10 means ten percent in the real-world)
Voters vote for the closest candidate. If there is a tie, then 1/2 go
to each candidate
Payoff: The number of voters for the candidate, each candidate
(selﬁshly) strives to maximize this number
1 and 10 are the (only) strictly dominated strategies ⇒
$D_1^1 = D_2^1 = \{2, \ldots, 9\}$
in $G_{DS}^1$, 2 and 9 are the (only) strictly dominated strategies ⇒
$D_1^2 = D_2^2 = \{3, \ldots, 8\}$
. . .
only 5, 6 survive IESDS
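The rounds of elimination in this example can be recomputed from the payoff rule. The sketch below assumes the model stated above (positions 1 to 10, ten voters per position, ties split evenly); because the game is symmetric, one strategy set is tracked for both candidates. The names `votes`, `undominated`, and `rounds` are illustrative.

```python
POSITIONS = range(1, 11)
VOTERS_PER_POSITION = 10

def votes(a, b):
    """Number of voters for the candidate at position a against one at b."""
    total = 0.0
    for v in POSITIONS:
        da, db = abs(v - a), abs(v - b)
        if da < db:
            total += VOTERS_PER_POSITION       # voters at v are closer to a
        elif da == db:
            total += VOTERS_PER_POSITION / 2   # tie: half go to each candidate
    return total

def undominated(strategies):
    """Strategies not strictly dominated within the current (symmetric) game."""
    return [s for s in strategies
            if not any(all(votes(t, o) > votes(s, o) for o in strategies)
                       for t in strategies)]

# rounds[k] is the common strategy set of both candidates after k rounds.
rounds = [list(POSITIONS)]
while True:
    nxt = undominated(rounds[-1])
    if nxt == rounds[-1]:
        break
    rounds.append(nxt)
```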
Belief & Best Response
IESDS eliminated apparently unreasonable behavior (leaving
"reasonable" behavior implicitly untouched).
What if we rather want to actively preserve reasonable behavior?
What is reasonable? .... what we believe is reasonable :-).
Intuition:
Imagine that your colleague did something stupid
What would you ask him? Usually something like "What were
you thinking?"
The colleague may respond with a reasonable description of his
belief in which his action was (one of) the best he could do
(You may of course question reasonableness of the belief)
Let us formalize this type of reasoning ....
Deﬁnition 11
A belief of player i is a pure strategy proﬁle s−i ∈ S−i of his opponents.
Deﬁnition 12
A strategy $s_i \in S_i$ of player i is a best response to a belief $s_{-i} \in S_{-i}$ if
$u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$ for all $s_i' \in S_i$
Claim 3
A rational player who believes that his opponents will play s−i ∈ S−i
always chooses a best response to s−i ∈ S−i.
Deﬁnition 13
A strategy si ∈ Si is never best response if it is not a best response to
any belief s−i ∈ S−i.
A rational player never plays any strategy that is never best response.
Best Response vs Strict Dominance
Proposition 1
If si is strictly dominated for player i, then it is never best
response.
The opposite does not have to be true in pure strategies:
      X         Y
A     1, 1      1, 1
B     2, 1      0, 1
C     0, 1      2, 1
Here A is never best response but is strictly dominated neither
by B, nor by C.
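The counterexample can be verified by enumerating the row player's best responses to each pure belief. This is a sketch using an assumed dictionary representation of the game; the names are illustrative.

```python
GAME = {
    ("A", "X"): (1, 1), ("A", "Y"): (1, 1),
    ("B", "X"): (2, 1), ("B", "Y"): (0, 1),
    ("C", "X"): (0, 1), ("C", "Y"): (2, 1),
}
ROWS, COLS = ["A", "B", "C"], ["X", "Y"]

def best_responses_row(col):
    """Row strategies that are best responses to the belief `col`."""
    best = max(GAME[r, col][0] for r in ROWS)
    return {r for r in ROWS if GAME[r, col][0] == best}

# Strategies that are a best response to at least one pure belief.
ever_best = set().union(*(best_responses_row(c) for c in COLS))
never_best = [r for r in ROWS if r not in ever_best]

# Strategies strictly dominated by some other pure strategy.
strictly_dominated = [
    r for r in ROWS
    if any(all(GAME[t, c][0] > GAME[r, c][0] for c in COLS) for t in ROWS)
]
```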
Elimination of Stupid Strategies = Rationalizability
Using similar iterated reasoning as for IESDS, strategies that are
never best response can be iteratively eliminated.
Define a sequence $R_i^0, R_i^1, R_i^2, \ldots$ of strategy sets of player i.
(Denote by $G_{Rat}^k$ the game obtained from G by restricting to $R_i^k$, i ∈ N.)
1. Initialize k = 0 and $R_i^0 = S_i$ for each i ∈ N.
2. For all players i ∈ N: Let $R_i^{k+1}$ be the set of all strategies of $R_i^k$
that are best responses to some beliefs in $G_{Rat}^k$.
3. Let k := k + 1 and go to 2.
We say that $s_i \in S_i$ is rationalizable if $s_i \in R_i^k$ for all k = 0, 1, 2, . . .
Deﬁnition 14
A strategy proﬁle s = (s1, . . . , sn) ∈ S is a rationalizable equilibrium if
each si is rationalizable.
We say that a game is solvable by rationalizability if it has a unique
rationalizable equilibrium.
(Warning: For some reason, rationalizable strategies are almost always
defined using mixed strategies!)
Rationalizability Examples
In the Prisoner’s dilemma:
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
(C, C) is the only rationalizable equilibrium.
In the Battle of Sexes:
      O         F
O     2, 1      0, 0
F     0, 0      1, 2
all strategies are rationalizable.
Cournot Duopoly
G = (N, (Si)i∈N , (ui)i∈N)
N = {1, 2}
Si = [0, ∞)
$u_1(q_1, q_2) = q_1(\kappa - q_1 - q_2) - q_1 c_1 = (\kappa - c_1) q_1 - q_1^2 - q_1 q_2$
$u_2(q_1, q_2) = q_2(\kappa - q_2 - q_1) - q_2 c_2 = (\kappa - c_2) q_2 - q_2^2 - q_2 q_1$
Assume for simplicity that c1 = c2 = c and denote θ = κ − c.
What is a best response of player 1 to a given q2 ?
Solve $\partial u_1 / \partial q_1 = \theta - 2q_1 - q_2 = 0$, which gives that q1 = (θ − q2)/2 is
the only best response of player 1 to q2.
Similarly, q2 = (θ − q1)/2 is the only best response of player 2 to q1.
Since q2 ≥ 0, we obtain that q1 is never best response iff q1 > θ/2.
Similarly q2 is never best response iff q2 > θ/2.
Thus $R_1^1 = R_2^1 = [0, \theta/2]$.
Now, in $G_{Rat}^1$, we still have that q1 = (θ − q2)/2 is the best response to
q2, and q2 = (θ − q1)/2 the best resp. to q1.
Since $q_2 \in R_2^1 = [0, \theta/2]$, we obtain that q1 is never best response iff
q1 ∈ [0, θ/4).
Similarly q2 is never best response iff q2 ∈ [0, θ/4).
Thus $R_1^2 = R_2^2 = [\theta/4, \theta/2]$.
....
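In general, after 2k iterations we have $R_1^{2k} = R_2^{2k} = [\ell_k, r_k]$ where
$r_k = (\theta - \ell_{k-1})/2$ for k ≥ 1
$\ell_k = (\theta - r_k)/2$ for k ≥ 1 and $\ell_0 = 0$
Solving the recurrence we obtain
$\ell_k = \theta/3 - \left(\frac{1}{4}\right)^{k} \theta/3$
$r_k = \theta/3 + \left(\frac{1}{4}\right)^{k-1} \theta/6$
Hence, $\lim_{k \to \infty} \ell_k = \lim_{k \to \infty} r_k = \theta/3$ and thus (θ/3, θ/3) is the only
rationalizable equilibrium.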
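The closed forms can be compared against the recurrence with exact rational arithmetic. The sketch below is only a numerical sanity check, not a proof; the function name `bounds` and the sample value θ = 12 are arbitrary choices.

```python
from fractions import Fraction

def bounds(theta, k_max):
    """Iterate r_k = (theta - l_{k-1})/2 and l_k = (theta - r_k)/2 with l_0 = 0."""
    lower = [Fraction(0)]   # lower[k] is l_k
    upper = [None]          # upper[k] is r_k; r_0 is not defined
    for k in range(1, k_max + 1):
        r_k = (theta - lower[k - 1]) / 2
        l_k = (theta - r_k) / 2
        upper.append(r_k)
        lower.append(l_k)
    return lower, upper

theta = Fraction(12)
lower, upper = bounds(theta, 10)

# Closed forms obtained by solving the recurrence.
closed_lower = [theta / 3 - Fraction(1, 4) ** k * theta / 3 for k in range(11)]
closed_upper = [None] + [theta / 3 + Fraction(1, 4) ** (k - 1) * theta / 6
                         for k in range(1, 11)]
```

Both sequences approach θ/3 (here 4), the lower bounds from below and the upper bounds from above.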
Are qi = θ/3 Pareto optimal? NO!
$u_1(\theta/3, \theta/3) = u_2(\theta/3, \theta/3) = \theta^2/9$
but
$u_1(\theta/4, \theta/4) = u_2(\theta/4, \theta/4) = \theta^2/8$
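Both values follow from the payoff formula with c1 = c2 = c and θ = κ − c. As a short worked calculation (a sketch restricted to symmetric profiles (q, q)):

```latex
\begin{align*}
u_i(q, q) &= \theta q - q^2 - q \cdot q = \theta q - 2q^2, \\
u_i(\theta/3, \theta/3) &= \frac{\theta^2}{3} - \frac{2\theta^2}{9} = \frac{\theta^2}{9}, \\
u_i(\theta/4, \theta/4) &= \frac{\theta^2}{4} - \frac{2\theta^2}{16} = \frac{\theta^2}{8}.
\end{align*}
```

Among symmetric profiles, θq − 2q² is maximized where θ − 4q = 0, that is at q = θ/4, which is why this profile serves as the comparison point.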
IESDS vs Rationalizability in Pure Strategies
Theorem 15
Assume that S is finite. Then for all k we have that $R_i^k \subseteq D_i^k$. That is,
in particular, all rationalizable strategies survive IESDS.
The opposite inclusion does not have to be true in pure strategies:
      X         Y
A     1, 1      1, 1
B     2, 1      0, 1
C     0, 1      2, 1
Recall that A is never best response but is strictly dominated by
neither B, nor C. That is, A survives IESDS but is not rationalizable.
Proof of Theorem 15
By induction on k. For k = 0 we have that $R_i^0 = S_i = D_i^0$ by definition.
Assume that $R_i^k \subseteq D_i^k$ for some k ≥ 0 and prove that $R_i^{k+1} \subseteq D_i^{k+1}$.
Let $s_i \in R_i^{k+1}$. Then there must be $s_{-i} \in R_{-i}^k$ such that
$s_i$ is a best response to $s_{-i}$ in $G_{Rat}^k$
(This follows from the fact that $s_i$ has not been eliminated in $G_{Rat}^k$.)
But then $s_i$ is a best response to $s_{-i}$ in $G_{Rat}^{k-1}$ as well!
Indeed, let $s_i'$ be a best response to $s_{-i}$ in $G_{Rat}^{k-1}$. Then $s_i' \in R_i^k$ since $s_i'$ is not
eliminated in $G_{Rat}^{k-1}$. But then $u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$ since $s_i$ is a best response
to $s_{-i}$ in $G_{Rat}^k$. Thus $s_i$ is a best response to $s_{-i}$ in $G_{Rat}^{k-1}$.
By the same reason, $s_i$ is a best response to $s_{-i}$ in $G_{Rat}^{k-2}$.
By the same reason, $s_i$ is a best response to $s_{-i}$ in $G_{Rat}^{k-3}$.
· · ·
By the same reason, $s_i$ is a best response to $s_{-i}$ in $G_{Rat}^0 = G$.
However, then $s_i$ is a best response to $s_{-i}$ in $G_{DS}^k$.
(By the induction hypothesis, $s_i \in R_i^k \subseteq D_i^k$ and $s_{-i} \in R_{-i}^k \subseteq D_{-i}^k$, so both belong to $G_{DS}^k$. The claim then follows from the fact that the “best response” relationship of $s_i$ and $s_{-i}$ is
preserved by removing arbitrarily many other strategies.)
Thus $s_i$ is not strictly dominated in $G_{DS}^k$ and $s_i \in D_i^{k+1}$.
Pinning Down Beliefs – Nash Equilibria
Criticism of previous approaches:
Strictly dominant strategy equilibria often do not exist
IESDS and rationalizability may not remove any strategies
Typical example is Battle of Sexes:
      O         F
O     2, 1      0, 0
F     0, 0      1, 2
Here all strategies are equally reasonable according to the above
concepts.
But are all strategy proﬁles really equally reasonable?
Assume that each player has a belief about strategies of other
players.
By Claim 3, each player plays a best response to his beliefs.
Is (O, F) as reasonable as (O, O) in this respect?
Note that if player 1 believes that player 2 plays O, then playing O is
reasonable, and if player 2 believes that player 1 plays F, then playing
F is reasonable. But such beliefs cannot be correct together!
(O, O) can be obtained as a proﬁle where each player plays the best
response to his belief and the beliefs are correct.
Nash Equilibrium
Nash equilibrium can be deﬁned as a set of beliefs (one for each
player) and a strategy proﬁle in which every player plays a best
response to his belief and each strategy of each player is consistent
with beliefs of his opponents.
The usual definition is the following:
Definition 16
A pure-strategy profile $s^* = (s_1^*, \ldots, s_n^*) \in S$ is a (pure) Nash
equilibrium if $s_i^*$ is a best response to $s_{-i}^*$ for each i ∈ N, that is
$$u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*) \quad \text{for all } s_i \in S_i \text{ and all } i \in N$$
Note that this definition is equivalent to the previous one in the sense that $s_{-i}^*$
may be considered as the (consistent) belief of player i to which he plays a
best response $s_i^*$.
Nash Equilibria Examples
In the Prisoner’s dilemma:
      C         S
C     −5, −5    0, −20
S     −20, 0    −1, −1
(C, C) is the only Nash equilibrium.
In the Battle of Sexes:
      O         F
O     2, 1      0, 0
F     0, 0      1, 2
only (O, O) and (F, F) are Nash equilibria.
In Cournot Duopoly, (θ/3, θ/3) is the only Nash equilibrium.
(Best response relations: q1 = (θ − q2)/2 and q2 = (θ − q1)/2 are both
satisﬁed only by q1 = q2 = θ/3)
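For the two finite examples, the pure Nash equilibria can be found by checking Definition 16 for every strategy profile. The following is a brute-force sketch for two-player games stored as dictionaries from (row, column) profiles to payoff pairs; the function name `pure_nash` is illustrative.

```python
PD = {
    ("C", "C"): (-5, -5), ("C", "S"): (0, -20),
    ("S", "C"): (-20, 0), ("S", "S"): (-1, -1),
}
BOS = {
    ("O", "O"): (2, 1), ("O", "F"): (0, 0),
    ("F", "O"): (0, 0), ("F", "F"): (1, 2),
}

def pure_nash(game, rows, cols):
    """All profiles in which each player plays a best response to the other."""
    equilibria = []
    for r in rows:
        for c in cols:
            u1, u2 = game[r, c]
            row_is_best = all(game[r2, c][0] <= u1 for r2 in rows)
            col_is_best = all(game[r, c2][1] <= u2 for c2 in cols)
            if row_is_best and col_is_best:
                equilibria.append((r, c))
    return equilibria

pd_equilibria = pure_nash(PD, ["C", "S"], ["C", "S"])
bos_equilibria = pure_nash(BOS, ["O", "F"], ["O", "F"])
```

This enumeration takes time proportional to the number of profiles times the number of possible deviations, which is fine for small tables but does not apply to games with infinite strategy sets such as the Cournot duopoly.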
Example: Stag Hunt
Story:
Two (in some versions more than two) hunters, players 1 and 2,
can each choose to hunt
stag (S) = a large tasty meal
hare (H) = also tasty but small
Hunting stag is much more demanding and forces of both
players need to be joined (hare can be hunted individually)
Strategic-form game model: N = {1, 2}, S1 = S2 = {S, H}, the payoff:
      S         H
S     5, 5      0, 3
H     3, 0      3, 3
Two NE: (S, S), and (H, H), where the former Pareto dominates the
latter! Which one is more reasonable?
If each player believes that the other one will go for hare, then (H, H)
is a reasonable outcome ⇒ a society of individualists who do not
cooperate at all.
If each player believes that the other will cooperate, then this
anticipation is self-fulﬁlling and results in what can be called
a cooperative society.
This is supposed to explain that in real world there are societies that have
similar endowments, access to technology and physical environment but
have very different achievements, all because of self-fulﬁlling beliefs (or
norms of behavior).
Another point of view: (H, H) is less risky
Minimum secured by playing S is 0 as opposed to 3 by playing H
(We will get to this minimax principle later)
So it seems to be rational to expect (H, H) (?)
Nash Equilibria vs Previous Concepts
Theorem 17
1. If $s^*$ is a strictly dominant strategy equilibrium, then it is the
unique Nash equilibrium.
2. Each Nash equilibrium is rationalizable and survives IESDS.
3. If S is ﬁnite, neither rationalizability, nor IESDS creates new
Nash equilibria.
Proof: Homework!
Corollary 18
Assume that S is ﬁnite. If rationalizability or IESDS result in a unique
strategy proﬁle, then this proﬁle is a Nash equilibrium.
Interpretations of Nash Equilibria
Besides the two definitions, the usual interpretations are the following:
When the goal is to give advice to all of the players in a
game (i.e., to advise each player what strategy to choose),
any advice that was not an equilibrium would have the
unsettling property that there would always be some player
for whom the advice was bad, in the sense that, if all other
players followed the parts of the advice directed to them, it
would be better for some player to do differently than he
was advised. If the advice is an equilibrium, however, this
will not be the case, because the advice to each player is
the best response to the advice given to the other players.
When the goal is prediction rather than prescription, a
Nash equilibrium can also be interpreted as a potential
stable point of a dynamic adjustment process in which
individuals adjust their behavior to that of the other players
in the game, searching for strategy choices that will give
them better results.
Static Games of Complete Information
Mixed Strategies
Let’s Mix It
As pointed out before, none of the solution concepts has to exist in
pure strategies
Example: Rock-Paper-sCissors
      R         P         C
R     0, 0      −1, 1     1, −1
P     1, −1     0, 0      −1, 1
C     −1, 1     1, −1     0, 0
There are no strictly dominant pure strategies
No strategy is strictly dominated (IESDS removes nothing)
Each strategy is a best response to some strategy of the opponent
(rationalizability removes nothing)
No pure Nash equilibria: No pure strategy proﬁle allows each player
to play a best response to the strategy of the other player
How to solve this?
Let the players randomize their choice of pure strategies ....
Probability Distributions
Deﬁnition 19
Let A be a ﬁnite set. A probability distribution over A is a function
$\sigma : A \to [0, 1]$ such that $\sum_{a \in A} \sigma(a) = 1$.
We denote by ∆(A) the set of all probability distributions over A.
We denote by supp(σ) the support of σ, that is the set of all a ∈ A
satisfying σ(a) > 0.
Example 20
Consider $A = \{a, b, c\}$ and a function $\sigma : A \to [0, 1]$ such that
$\sigma(a) = \frac{1}{4}$, $\sigma(b) = \frac{3}{4}$, and $\sigma(c) = 0$. Then $\sigma \in \Delta(A)$ and
$\mathrm{supp}(\sigma) = \{a, b\}$.
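The definition and the example can be checked mechanically. The following is a minimal sketch, assuming a distribution is stored as a Python dict from outcomes to exact fractions; the helper names `is_distribution` and `support` are illustrative and not part of the lecture.

```python
from fractions import Fraction

def is_distribution(sigma):
    """True if sigma (dict: outcome -> probability) is a probability distribution."""
    return all(0 <= p <= 1 for p in sigma.values()) and sum(sigma.values()) == 1

def support(sigma):
    """The set of outcomes with strictly positive probability."""
    return {a for a, p in sigma.items() if p > 0}

# The distribution from Example 20.
sigma = {"a": Fraction(1, 4), "b": Fraction(3, 4), "c": Fraction(0)}
print(is_distribution(sigma))
print(sorted(support(sigma)))
```

Exact fractions are used so that the test "probabilities sum to one" is not affected by floating-point rounding.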
Mixed Strategies
Let us ﬁx a strategic-form game G = (N, (Si)i∈N , (ui)i∈N).
From now on, assume that all Si are ﬁnite!
Deﬁnition 21
A mixed strategy of player i is a probability distribution σ ∈ ∆(Si) over
Si. We denote by Σi = ∆(Si) the set of all mixed strategies of player i.
We deﬁne Σ := Σ1 × · · · × Σn, the set of all mixed strategy proﬁles.
Recall that by $\Sigma_{-i}$ we denote the set $\Sigma_1 \times \cdots \times \Sigma_{i-1} \times \Sigma_{i+1} \times \cdots \times \Sigma_n$.
Elements of Σ−i are denoted by σ−i = (σ1, . . . , σi−1, σi+1, . . . , σn).
We identify each si ∈ Si with a mixed strategy σ that assigns
probability one to si (and zero to other pure strategies).
For example, in rock-paper-scissors, the pure strategy R corresponds
to $\sigma_i$ which satisfies
\[ \sigma_i(X) = \begin{cases} 1 & X = R \\ 0 & \text{otherwise} \end{cases} \]
Mixed Strategies
Sometimes we assume $S_i = \{1, \ldots, m_i\}$, where $m_i \in \{1, 2, \ldots\}$, for
all $i \in N$.
Then every mixed strategy $\sigma_i$ is a vector
$\sigma_i = (\sigma_i(1), \ldots, \sigma_i(m_i)) \in [0, 1]^{m_i}$ such that
\[ \sigma_i(1) + \cdots + \sigma_i(m_i) = 1 \]
Mixed Strategy Proﬁles
Let σ = (σ1, . . . , σn) be a mixed strategy proﬁle.
Intuitively, we assume that each player i randomly chooses his pure
strategy according to σi and independently of his opponents.
Thus for $s = (s_1, \ldots, s_n) \in S = S_1 \times \cdots \times S_n$ we have that
\[ \sigma(s) := \prod_{i=1}^{n} \sigma_i(s_i) \]
is the probability that the players choose the pure strategy profile $s$
according to the mixed strategy profile $\sigma$, and
\[ \sigma_{-i}(s_{-i}) := \prod_{k \ne i} \sigma_k(s_k) \]
is the probability that the opponents of player $i$ choose $s_{-i} \in S_{-i}$ when
they play according to the mixed strategy profile $\sigma_{-i} \in \Sigma_{-i}$.
(We abuse notation a bit here: $\sigma$ denotes two things, a vector of mixed
strategies as well as a probability distribution on $S$; the same holds for $\sigma_{-i}$.)
Mixed Strategies – Example
R P C
R 0, 0 −1, 1 1, −1
P 1, −1 0, 0 −1, 1
C −1, 1 1, −1 0, 0
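An example of a mixed strategy $\sigma_1$: $\sigma_1(R) = \frac{1}{2}$, $\sigma_1(P) = \frac{1}{3}$, $\sigma_1(C) = \frac{1}{6}$.
Sometimes we write $\sigma_1$ as $(\frac{1}{2}(R), \frac{1}{3}(P), \frac{1}{6}(C))$, or only $(\frac{1}{2}, \frac{1}{3}, \frac{1}{6})$ if the
order of pure strategies is fixed.
Consider a mixed strategy profile $(\sigma_1, \sigma_2)$ where
$\sigma_1 = (\frac{1}{2}(R), \frac{1}{3}(P), \frac{1}{6}(C))$ and $\sigma_2 = (\frac{1}{3}(R), \frac{2}{3}(P), 0(C))$.
Then the probability $\sigma(R, P)$ that the pure strategy profile $(R, P)$ will
be chosen by players playing the mixed profile $(\sigma_1, \sigma_2)$ is
\[ \sigma_1(R) \cdot \sigma_2(P) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3} \]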
Expected Payoff
... but now what is the suitable notion of payoff?
Deﬁnition 22
The expected payoff of player i under a mixed strategy proﬁle σ ∈ Σ is
\[ u_i(\sigma) := \sum_{s \in S} \sigma(s)\, u_i(s) \quad \left( = \sum_{s \in S} \prod_{k=1}^{n} \sigma_k(s_k)\, u_i(s) \right) \]
I.e., it is the "weighted average" of what player i wins under each pure
strategy proﬁle s, weighted by the probability of that proﬁle.
Assumption: Every rational player strives to maximize his own
expected payoff.
(This assumption is not always completely convincing ...)
Expected Payoff – Example
Matching Pennies:
H T
H 1, −1 −1, 1
T −1, 1 1, −1
Each player secretly turns a penny to heads or tails, and then they reveal
their choices simultaneously. If the pennies match, player 1 (row) wins, if they
do not match, player 2 (column) wins.
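Consider $\sigma_1 = (\frac{1}{3}(H), \frac{2}{3}(T))$ and $\sigma_2 = (\frac{1}{4}(H), \frac{3}{4}(T))$.
\[ \begin{aligned}
u_1(\sigma_1, \sigma_2) &= \sum_{(X,Y) \in \{H,T\}^2} \sigma_1(X)\, \sigma_2(Y)\, u_1(X, Y) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{4} \cdot 1 + \tfrac{1}{3} \cdot \tfrac{3}{4} \cdot (-1) + \tfrac{2}{3} \cdot \tfrac{1}{4} \cdot (-1) + \tfrac{2}{3} \cdot \tfrac{3}{4} \cdot 1 = \tfrac{1}{6}
\end{aligned} \]
\[ \begin{aligned}
u_2(\sigma_1, \sigma_2) &= \sum_{(X,Y) \in \{H,T\}^2} \sigma_1(X)\, \sigma_2(Y)\, u_2(X, Y) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{4} \cdot (-1) + \tfrac{1}{3} \cdot \tfrac{3}{4} \cdot 1 + \tfrac{2}{3} \cdot \tfrac{1}{4} \cdot 1 + \tfrac{2}{3} \cdot \tfrac{3}{4} \cdot (-1) = -\tfrac{1}{6}
\end{aligned} \]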
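The computation above can be reproduced with a few lines of code. This is a minimal sketch, assuming payoffs are stored in a dict keyed by pure strategy profiles and mixed strategies are dicts of exact fractions; the function name `expected_payoff` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Matching Pennies: payoffs[(row, col)] = (u1, u2).
payoffs = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}

def expected_payoff(player, profile, payoffs):
    """Expected payoff of `player` (0-based) under a tuple of mixed strategies."""
    total = Fraction(0)
    for s in product(*(sigma.keys() for sigma in profile)):
        prob = Fraction(1)
        for sigma, action in zip(profile, s):
            prob *= sigma[action]  # independence: sigma(s) is a product
        total += prob * payoffs[s][player]
    return total

sigma1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
sigma2 = {"H": Fraction(1, 4), "T": Fraction(3, 4)}
u1 = expected_payoff(0, (sigma1, sigma2), payoffs)
u2 = expected_payoff(1, (sigma1, sigma2), payoffs)
print(u1, u2)  # 1/6 -1/6
```

The loop ranges over all pure profiles, so its running time is proportional to |S|, in line with the definition of expected payoff.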
"Decomposition" of Expected Payoff
Consider the matching pennies example from the previous slide:
H T
H 1, −1 −1, 1
T −1, 1 1, −1
together with some mixed
strategies σ1 and σ2.
We prove the following important property of the expected payoff:
\[ u_1(\sigma_1, \sigma_2) = \sum_{X \in \{H,T\}} \sigma_1(X)\, u_1(X, \sigma_2) \]
The intuition behind this equality is as follows:
$u_1(\sigma_1, \sigma_2)$ is the expected payoff of player 1 in the following experiment:
Both players simultaneously and independently choose their pure
strategies $X, Y$ according to $\sigma_1, \sigma_2$, respectively, and then player 1 collects his
payoff $u_1(X, Y)$.
$\sum_{X \in \{H,T\}} \sigma_1(X)\, u_1(X, \sigma_2)$ is the expected payoff of player 1 in the
following experiment: Player 1 chooses his pure strategy $X$ according to $\sigma_1$ and then uses it against
the mixed strategy $\sigma_2$ of player 2. Then player 2 chooses $Y$ according to
$\sigma_2$ independently of $X$, and player 1 collects the payoff $u_1(X, Y)$.
As $Y$ does not depend on $X$ in either experiment, we obtain the above
equality of expected payoffs.
"Decomposition" of Expected Payoff
A formal proof is straightforward:
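\[ \begin{aligned}
u_1(\sigma_1, \sigma_2) &= \sum_{(X,Y) \in \{H,T\}^2} \sigma_1(X)\, \sigma_2(Y)\, u_1(X, Y) \\
&= \sum_{X \in \{H,T\}} \sum_{Y \in \{H,T\}} \sigma_1(X)\, \sigma_2(Y)\, u_1(X, Y) \\
&= \sum_{X \in \{H,T\}} \sigma_1(X) \sum_{Y \in \{H,T\}} \sigma_2(Y)\, u_1(X, Y) \\
&= \sum_{X \in \{H,T\}} \sigma_1(X)\, u_1(X, \sigma_2)
\end{aligned} \]
(In the last equality we used the fact that $X$ is identified with a mixed strategy
assigning probability one to $X$.)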
"Decomposition" of Expected Payoff
Similarly,
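\[ \begin{aligned}
u_1(\sigma_1, \sigma_2) &= \sum_{(X,Y) \in \{H,T\}^2} \sigma_1(X)\, \sigma_2(Y)\, u_1(X, Y) \\
&= \sum_{X \in \{H,T\}} \sum_{Y \in \{H,T\}} \sigma_1(X)\, \sigma_2(Y)\, u_1(X, Y) \\
&= \sum_{Y \in \{H,T\}} \sum_{X \in \{H,T\}} \sigma_1(X)\, \sigma_2(Y)\, u_1(X, Y) \\
&= \sum_{Y \in \{H,T\}} \sigma_2(Y) \sum_{X \in \{H,T\}} \sigma_1(X)\, u_1(X, Y) \\
&= \sum_{Y \in \{H,T\}} \sigma_2(Y)\, u_1(\sigma_1, Y)
\end{aligned} \]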
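Both decompositions can be sanity-checked numerically. The sketch below assumes the concrete strategies $\sigma_1 = (\frac{1}{3}, \frac{2}{3})$ and $\sigma_2 = (\frac{1}{4}, \frac{3}{4})$ from the earlier expected payoff example; the helper names are illustrative.

```python
from fractions import Fraction

# Player 1's payoffs in Matching Pennies.
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
sigma1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
sigma2 = {"H": Fraction(1, 4), "T": Fraction(3, 4)}

def u1_pure_vs_mixed(x):
    """u1(X, sigma2): player 1 plays the pure strategy x."""
    return sum(sigma2[y] * u1[(x, y)] for y in sigma2)

def u1_mixed_vs_pure(y):
    """u1(sigma1, Y): player 2 plays the pure strategy y."""
    return sum(sigma1[x] * u1[(x, y)] for x in sigma1)

full = sum(sigma1[x] * sigma2[y] * u1[(x, y)] for x in sigma1 for y in sigma2)
by_rows = sum(sigma1[x] * u1_pure_vs_mixed(x) for x in sigma1)
by_cols = sum(sigma2[y] * u1_mixed_vs_pure(y) for y in sigma2)
print(full, by_rows, by_cols)  # 1/6 1/6 1/6
```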
Expected Payoff – "Decomposition" in General
Lemma 23
For every mixed strategy proﬁle σ ∈ Σ and every k ∈ N we have
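\[ u_i(\sigma) = \sum_{s_k \in S_k} \sigma_k(s_k) \cdot u_i(s_k, \sigma_{-k}) = \sum_{s_{-k} \in S_{-k}} \sigma_{-k}(s_{-k}) \cdot u_i(\sigma_k, s_{-k}) \]
Proof:
\[ \begin{aligned}
u_i(\sigma) &= \sum_{s \in S} \sigma(s)\, u_i(s) = \sum_{s \in S} \prod_{\ell=1}^{n} \sigma_\ell(s_\ell)\, u_i(s) \\
&= \sum_{s \in S} \sigma_k(s_k) \prod_{\ell \ne k} \sigma_\ell(s_\ell)\, u_i(s) \\
&= \sum_{s_k \in S_k} \sum_{s_{-k} \in S_{-k}} \sigma_k(s_k) \prod_{\ell \ne k} \sigma_\ell(s_\ell)\, u_i(s_k, s_{-k}) \\
&= \sum_{s_k \in S_k} \sum_{s_{-k} \in S_{-k}} \sigma_k(s_k)\, \sigma_{-k}(s_{-k})\, u_i(s_k, s_{-k})
\end{aligned} \]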
Proof of Lemma 23 (cont.)
The ﬁrst equality:
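\[ \begin{aligned}
u_i(\sigma) &= \sum_{s_k \in S_k} \sum_{s_{-k} \in S_{-k}} \sigma_k(s_k)\, \sigma_{-k}(s_{-k})\, u_i(s_k, s_{-k}) \\
&= \sum_{s_k \in S_k} \sigma_k(s_k) \sum_{s_{-k} \in S_{-k}} \sigma_{-k}(s_{-k})\, u_i(s_k, s_{-k}) \\
&= \sum_{s_k \in S_k} \sigma_k(s_k)\, u_i(s_k, \sigma_{-k})
\end{aligned} \]
The second equality:
\[ \begin{aligned}
u_i(\sigma) &= \sum_{s_k \in S_k} \sum_{s_{-k} \in S_{-k}} \sigma_k(s_k)\, \sigma_{-k}(s_{-k})\, u_i(s_k, s_{-k}) \\
&= \sum_{s_{-k} \in S_{-k}} \sum_{s_k \in S_k} \sigma_k(s_k)\, \sigma_{-k}(s_{-k})\, u_i(s_k, s_{-k}) \\
&= \sum_{s_{-k} \in S_{-k}} \sigma_{-k}(s_{-k}) \sum_{s_k \in S_k} \sigma_k(s_k)\, u_i(s_k, s_{-k}) \\
&= \sum_{s_{-k} \in S_{-k}} \sigma_{-k}(s_{-k})\, u_i(\sigma_k, s_{-k})
\end{aligned} \]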
Expected Payoff – Pure Strategy Bounds
Corollary 24
For all i, k ∈ N and σ ∈ Σ we have that
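\[ \min_{s_k \in S_k} u_i(s_k, \sigma_{-k}) \le u_i(\sigma) \le \max_{s_k \in S_k} u_i(s_k, \sigma_{-k}) \]
\[ \min_{s_{-k} \in S_{-k}} u_i(\sigma_k, s_{-k}) \le u_i(\sigma) \le \max_{s_{-k} \in S_{-k}} u_i(\sigma_k, s_{-k}) \]
Proof.
We prove $u_i(\sigma) \le \max_{s_k \in S_k} u_i(s_k, \sigma_{-k})$; the rest is similar. Define
$B := \max_{s_k \in S_k} u_i(s_k, \sigma_{-k})$. Then
\[ u_i(\sigma) = \sum_{s_k \in S_k} \sigma_k(s_k) \cdot u_i(s_k, \sigma_{-k}) \le \sum_{s_k \in S_k} \sigma_k(s_k) \cdot B = B \]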
Solution Concepts
We revisit the following solution concepts in mixed strategies:
strictly dominant strategy equilibrium
IESDS equilibrium
rationalizable equilibria
Nash equilibria
From now on, when I say a strategy I implicitly mean a
mixed strategy.
In order to deal with efficiency issues we assume that the size of the game $G$
is defined by $|G| := |N| + \sum_{i \in N} |S_i| + \sum_{i \in N} |u_i|$ where $|u_i| = \sum_{s \in S} |u_i(s)|$ and
$|u_i(s)|$ is the length of a binary encoding of $u_i(s)$ (we assume that rational
numbers are encoded as quotients of two binary integers).
Note that, in particular, $|G| > |S|$.
Strict Dominance in Mixed Strategies
Deﬁnition 25
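Let $\sigma_i, \sigma_i' \in \Sigma_i$ be (mixed) strategies of player $i$. Then $\sigma_i'$ is
strictly dominated by $\sigma_i$ (write $\sigma_i' \prec \sigma_i$) if
\[ u_i(\sigma_i, \sigma_{-i}) > u_i(\sigma_i', \sigma_{-i}) \quad \text{for all } \sigma_{-i} \in \Sigma_{-i} \]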
Example 26
X Y
A 3 0
B 0 3
C 1 1
Is there a strictly dominated strategy?
Question: Is there a game with at least one strictly dominated
strategy but without strictly dominated pure strategies?
Strictly Dominant Strategy Equilibrium
Deﬁnition 27
σi ∈ Σi is strictly dominant if every other mixed strategy of player i is
strictly dominated by σi.
Deﬁnition 28
A strategy proﬁle σ ∈ Σ is a strictly dominant strategy equilibrium if
σi ∈ Σi is strictly dominant for all i ∈ N.
Proposition 2
If the strictly dominant strategy equilibrium exists, it is unique, all its
strategies are pure, and rational players will play it.
Proof.
Let $\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) \in \Sigma$ be the strictly dominant strategy equilibrium.
By Corollary 24, for every $i \in N$ and $\sigma_{-i} \in \Sigma_{-i}$, there must exist $s_i \in S_i$
such that $u_i(\sigma_i^*, \sigma_{-i}) \le u_i(s_i, \sigma_{-i})$.
But then $\sigma_i^* = s_i$ since $\sigma_i^*$ is strictly dominant.
Computing Strictly Dominant Strategy Equilibrium
How to decide whether there is a strictly dominant strategy
equilibrium $s = (s_1, \ldots, s_n) \in S$?
I.e., whether for a given $s_i \in S_i$, all $\sigma_i \in \Sigma_i \setminus \{s_i\}$ and all $\sigma_{-i} \in \Sigma_{-i}$:
\[ u_i(s_i, \sigma_{-i}) > u_i(\sigma_i, \sigma_{-i}) \]
There are some serious issues here:
Obviously there are uncountably many possible $\sigma_i$ and $\sigma_{-i}$.
$u_i(\sigma_i, \sigma_{-i})$ is nonlinear, and for more than two players even $u_i(s_i, \sigma_{-i})$ is
nonlinear in the probabilities assigned to pure strategies.
Computing Strictly Dominant Strategy Equilibrium
First, we prove the following useful proposition using Lemma 23:
Lemma 29
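$\sigma_i'$ strictly dominates $\sigma_i$ iff for all pure strategy profiles $s_{-i} \in S_{-i}$:
\[ u_i(\sigma_i', s_{-i}) > u_i(\sigma_i, s_{-i}) \tag{1} \]
Proof.
The ‘⇒’ direction is trivial, let us prove ‘⇐’. Assume that (1) is true for all
pure strategy profiles $s_{-i} \in S_{-i}$. Then, by Lemma 23,
\[ u_i(\sigma_i, \sigma_{-i}) = \sum_{s_{-i} \in S_{-i}} \sigma_{-i}(s_{-i})\, u_i(\sigma_i, s_{-i}) < \sum_{s_{-i} \in S_{-i}} \sigma_{-i}(s_{-i})\, u_i(\sigma_i', s_{-i}) = u_i(\sigma_i', \sigma_{-i}) \]
holds for all mixed strategy profiles $\sigma_{-i} \in \Sigma_{-i}$ (the inequality is strict
because $\sigma_{-i}(s_{-i}) > 0$ for at least one $s_{-i}$).
In other words, it suffices to check the strict dominance only with
respect to all pure profiles of opponents.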
Computing Strictly Dominant Strategy Equilibrium
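How to decide whether for a given $s_i \in S_i$, all $\sigma_i \in \Sigma_i \setminus \{s_i\}$ and all
$s_{-i} \in S_{-i}$ we have $u_i(s_i, s_{-i}) > u_i(\sigma_i, s_{-i})$?
Lemma 30
$u_i(s_i, s_{-i}) > u_i(\sigma_i, s_{-i})$ for all $\sigma_i \in \Sigma_i \setminus \{s_i\}$ and all $s_{-i} \in S_{-i}$ iff
$u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})$ for all $s_i' \in S_i \setminus \{s_i\}$ and all $s_{-i} \in S_{-i}$.
Proof.
The ‘⇒’ direction is trivial, let us prove ‘⇐’. Assume $u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})$
for all $s_i' \in S_i \setminus \{s_i\}$ and all $s_{-i} \in S_{-i}$. Given $\sigma_i \in \Sigma_i \setminus \{s_i\}$, we have by
Lemma 23,
\[ u_i(\sigma_i, s_{-i}) = \sum_{s_i' \in S_i} \sigma_i(s_i')\, u_i(s_i', s_{-i}) < \sum_{s_i' \in S_i} \sigma_i(s_i')\, u_i(s_i, s_{-i}) = u_i(s_i, s_{-i}) \]
The inequality follows from our assumption and the fact that $\sigma_i(s_i') > 0$
for at least one $s_i' \ne s_i$ (due to $\sigma_i \in \Sigma_i \setminus \{s_i\}$).
Thus it suffices to check whether $u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})$ for all $s_i' \in S_i \setminus \{s_i\}$
and all $s_{-i} \in S_{-i}$. This can easily be done in time polynomial w.r.t. $|G|$.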
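The resulting pure-strategy check is a direct double loop. The following is a minimal sketch, assuming payoffs of player i are given as a dict keyed by pure strategy profiles; the function name `strictly_dominant_pure` is illustrative, and the Prisoner's Dilemma payoffs are those from the beginning of the course.

```python
from itertools import product

def strictly_dominant_pure(u_i, i, strategy_sets):
    """Return the pure strategy of player i that strictly beats every other pure
    strategy of i against every pure opponent profile, or None if there is none."""
    others = [S for j, S in enumerate(strategy_sets) if j != i]

    def profile(si, s_minus_i):
        s = list(s_minus_i)
        s.insert(i, si)  # put player i's strategy back at position i
        return tuple(s)

    for cand in strategy_sets[i]:
        if all(u_i[profile(cand, o)] > u_i[profile(alt, o)]
               for alt in strategy_sets[i] if alt != cand
               for o in product(*others)):
            return cand
    return None

# Prisoner's Dilemma, payoffs of the row player (player 0).
pd_sets = [["C", "S"], ["C", "S"]]
pd_u1 = {("C", "C"): -5, ("C", "S"): 0, ("S", "C"): -20, ("S", "S"): -1}
print(strictly_dominant_pure(pd_u1, 0, pd_sets))

# The 3x2 example (rows A, B, C): no strictly dominant strategy for the row player.
ex_sets = [["A", "B", "C"], ["X", "Y"]]
ex_u1 = {("A", "X"): 3, ("A", "Y"): 0, ("B", "X"): 0,
         ("B", "Y"): 3, ("C", "X"): 1, ("C", "Y"): 1}
print(strictly_dominant_pure(ex_u1, 0, ex_sets))
```

The number of payoff comparisons is at most $|S_i|^2 \cdot |S_{-i}|$, which is polynomial in $|G|$.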
IESDS in Mixed Strategies
Define a sequence $D_i^0, D_i^1, D_i^2, \ldots$ of strategy sets of player $i$.
(Denote by $G_{DS}^k$ the game obtained from $G$ by restricting the pure strategy
sets to $D_i^k$, $i \in N$.)
1. Initialize $k = 0$ and $D_i^0 = S_i$ for each $i \in N$.
2. For all players $i \in N$: Let $D_i^{k+1}$ be the set of all pure strategies of
$D_i^k$ that are not strictly dominated in $G_{DS}^k$ by mixed strategies.
3. Let $k := k + 1$ and go to 2.
We say that $s_i \in S_i$ survives IESDS if $s_i \in D_i^k$ for all $k = 0, 1, 2, \ldots$
Deﬁnition 31
A strategy proﬁle s = (s1, . . . , sn) ∈ S is an IESDS equilibrium if each
si survives IESDS.
IESDS – Algorithm
Note that in step 2 it is not sufﬁcient to consider pure strategies.
Consider the following zero sum game:
X Y
A 3 0
B 0 3
C 1 1
C is strictly dominated by $(\sigma_1(A), \sigma_1(B), \sigma_1(C)) = (\frac{1}{2}, \frac{1}{2}, 0)$, but no
strategy is strictly dominated by a pure strategy.
IESDS – Algorithm
However, there are uncountably many mixed strategies that may
dominate a given pure strategy ...
But ui(σ) = ui(σ1, . . . , σn) is linear in each σk (if σ−k is kept ﬁxed)!
Indeed, assuming w.l.o.g. that Sk = {1, . . . , mk },
\[ u_i(\sigma) = \sum_{s_k \in S_k} \sigma_k(s_k) \cdot u_i(s_k, \sigma_{-k}) = \sum_{\ell=1}^{m_k} \sigma_k(\ell) \cdot u_i(\ell, \sigma_{-k}) \]
is the scalar product of the vector $\sigma_k = (\sigma_k(1), \ldots, \sigma_k(m_k))$ with the vector
$(u_i(1, \sigma_{-k}), \ldots, u_i(m_k, \sigma_{-k}))$, which is linear in $\sigma_k$.
So to decide strict dominance, we use linear programming ...
Intermezzo: Linear Programming
Linear programming is a technique for optimization of a linear
objective function, subject to linear (non-strict) inequality constraints.
Formally, a linear program in so-called canonical form looks like this:
\[ \begin{array}{lll}
\text{maximize} & \sum_{j=1}^{m} c_j x_j & \text{(objective function)} \\
\text{subject to} & \sum_{j=1}^{m} a_{ij} x_j \le b_i, \quad 1 \le i \le n & \text{(constraints)} \\
& x_j \ge 0, \quad 1 \le j \le m &
\end{array} \]
Here $a_{ij}$, $b_i$ and $c_j$ are real numbers and the $x_j$'s are real variables.
A feasible solution is an assignment of real numbers to the variables
$x_j$, $1 \le j \le m$, so that the constraints are satisfied.
An optimal solution is a feasible solution which maximizes
the objective function $\sum_{j=1}^{m} c_j x_j$.
Intermezzo: Complexity of Linear Programming
We assume that the coefficients $a_{ij}$, $b_i$ and $c_j$ are encoded in binary
(more precisely, as fractions of two integers encoded in binary).
Theorem 32 (Khachiyan, Doklady Akademii Nauk SSSR, 1979)
There is an algorithm which for any linear program computes an
optimal solution in polynomial time.
The algorithm uses the so-called ellipsoid method.
In practice, Khachiyan's algorithm is not used. Usually the simplex algorithm
is used even though its worst-case complexity is exponential.
There is also a polynomial-time algorithm (by Karmarkar) which has
better complexity upper bounds than Khachiyan's and sometimes
works even better than simplex.
There exist several advanced linear programming solvers (usually
parts of larger optimization packages) implementing various
heuristics for solving large scale problems, sensitivity analysis, etc.
For more info see
http://en.wikipedia.org/wiki/Linear_programming#Solvers_and_scripting_.28programming.29_languages
IESDS Algorithm – Strict Dominance Step
So how do we use linear programming to decide strict dominance in
step 2 of IESDS procedure?
I.e. whether for a given si there exists σi such that for all σ−i we have
ui(σi, σ−i) > ui(si, σ−i)
Recall that by Lemma 29 we have that σi strictly dominates si iff for
all pure strategy proﬁles s−i ∈ S−i:
ui(σi, s−i) > ui(si, s−i)
In other words, it sufﬁces to check the strict dominance only with
respect to all pure proﬁles of opponents.
IESDS Algorithm – Strict Dominance Step
Recall that $u_i(\sigma_i, s_{-i}) = \sum_{s_i' \in S_i} \sigma_i(s_i')\, u_i(s_i', s_{-i})$.
So to decide whether $s_i \in S_i$ is strictly dominated by some mixed
strategy $\sigma_i$, it suffices to solve the following system:
\[ \begin{aligned}
\sum_{s_i' \in S_i} x_{s_i'} \cdot u_i(s_i', s_{-i}) &> u_i(s_i, s_{-i}) && s_{-i} \in S_{-i} \\
x_{s_i'} &\ge 0 && s_i' \in S_i \\
\sum_{s_i' \in S_i} x_{s_i'} &= 1
\end{aligned} \]
(Here each variable $x_{s_i'}$ corresponds to the probability $\sigma_i(s_i')$ assigned
by the dominating strategy $\sigma_i$ to $s_i'$.)
Unfortunately, this is a "strict linear program" ... How to deal with
the strict inequality?
IESDS Algorithm – Complexity
Introduce a new variable $y$ to be maximized under the following
constraints:
\[ \begin{aligned}
\sum_{s_i' \in S_i} x_{s_i'} \cdot u_i(s_i', s_{-i}) &\ge u_i(s_i, s_{-i}) + y && s_{-i} \in S_{-i} \\
x_{s_i'} &\ge 0 && s_i' \in S_i \\
\sum_{s_i' \in S_i} x_{s_i'} &= 1 \\
y &\ge 0
\end{aligned} \]
Now $s_i$ is strictly dominated iff a solution maximizing $y$ satisfies $y > 0$.
The size of the above program is polynomial in $|G|$.
So step 2 of IESDS can be executed in polynomial time.
As every iteration of IESDS that changes the strategy sets removes at least one
pure strategy, IESDS runs in time polynomial in $|G|$.
IESDS in Mixed Strategies – Example
X Y
A 3 0
B 0 3
C 1 1
Let us have a look at the ﬁrst iteration of IESDS.
Observe that A, B are not strictly dominated by any mixed strategy.
Let us construct the linear program for deciding whether C is strictly
dominated: The program maximizes y under the following constraints:
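\[ \begin{aligned}
3x_A + 0x_B + x_C &\ge 1 + y \\
0x_A + 3x_B + x_C &\ge 1 + y \\
x_A, x_B, x_C &\ge 0 \\
x_A + x_B + x_C &= 1 \\
y &\ge 0
\end{aligned} \]
The maximum $y = \frac{1}{2}$ is attained at $x_A = \frac{1}{2}$ and $x_B = \frac{1}{2}$.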
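The claimed optimum can be verified by hand. The short derivation below is a sketch added for illustration; it uses only the constraints of the program above.

```latex
% Add the first two constraints and substitute x_A + x_B = 1 - x_C:
\begin{aligned}
3x_A + 3x_B + 2x_C &\ge 2 + 2y \\
3(1 - x_C) + 2x_C = 3 - x_C &\ge 2 + 2y \\
y &\le \tfrac{1}{2}(1 - x_C) \le \tfrac{1}{2}
\end{aligned}
% Equality requires x_C = 0 and both constraints tight:
% 3x_A = 3x_B = 3/2, i.e. x_A = x_B = 1/2, which is feasible.
```

Since the optimal value $y = \frac{1}{2}$ is positive, C is strictly dominated and is removed in the first iteration.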
Best Response
Deﬁnition 33
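A strategy $\sigma_i \in \Sigma_i$ of player $i$ is a best response to a strategy profile
$\sigma_{-i} \in \Sigma_{-i}$ of his opponents if
\[ u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma_i', \sigma_{-i}) \quad \text{for all } \sigma_i' \in \Sigma_i \]
We denote by $BR_i(\sigma_{-i}) \subseteq \Sigma_i$ the set of all best responses of player $i$ to
the strategy profile of opponents $\sigma_{-i} \in \Sigma_{-i}$.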
Best Response – Example
Consider a game with the following payoffs of player 1:
X Y
A 2 0
B 0 2
C 1 1
Player 1 (row) plays σ1 = (a(A), b(B), c(C)).
Player 2 (column) plays (q(X), (1 − q)(Y)) (we write just q).
Compute BR1(q).
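As a starting point for this exercise, the pure best responses can be tabulated for sample values of q. This is a minimal sketch with an illustrative helper `pure_best_responses`; by Lemma 23, a mixed strategy is a best response exactly when it puts positive probability only on pure best responses, so the pure ones determine $BR_1(q)$.

```python
from fractions import Fraction

# Player 1's payoffs u1[row][col] from the table above.
u1 = {"A": {"X": 2, "Y": 0}, "B": {"X": 0, "Y": 2}, "C": {"X": 1, "Y": 1}}

def pure_best_responses(q):
    """Pure strategies of player 1 with maximal expected payoff against (q(X), (1-q)(Y))."""
    value = {r: q * u1[r]["X"] + (1 - q) * u1[r]["Y"] for r in u1}
    best = max(value.values())
    return {r for r, v in value.items() if v == best}

for q in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(q, sorted(pure_best_responses(q)))
```

For $q < \frac{1}{2}$ only B is a best response, for $q > \frac{1}{2}$ only A, and at $q = \frac{1}{2}$ all three pure strategies give payoff 1.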
Rationalizability in Mixed Strategies (Two Players)
For simplicity, we temporarily switch to two-player setting N = {1, 2}.
Deﬁnition 34
A (mixed) belief of player i ∈ {1, 2} is a mixed strategy σ−i of his
opponent.
(A general definition works with so-called correlated beliefs, which are arbitrary
distributions on S−i; the notion of expected payoff then needs to be adjusted.
We are not going in this direction.)
Assumption: Any rational player with a belief σ−i always plays a best
response to σ−i.
Deﬁnition 35
A strategy σi ∈ Σi of player i ∈ {1, 2} is never best response if it is not
a best response to any belief σ−i.
No rational player plays a strategy that is never best response.
Rationalizability in Mixed Strategies (Two Players)
Define a sequence $R_i^0, R_i^1, R_i^2, \ldots$ of strategy sets of player $i$.
(Denote by $G_{Rat}^k$ the game obtained from $G$ by restricting the pure strategy
sets to $R_i^k$, $i \in N$.)
1. Initialize $k = 0$ and $R_i^0 = S_i$ for each $i \in N$.
2. For all players $i \in N$: Let $R_i^{k+1}$ be the set of all strategies of $R_i^k$
that are best responses to some (mixed) beliefs in $G_{Rat}^k$.
3. Let $k := k + 1$ and go to 2.
We say that $s_i \in S_i$ is rationalizable if $s_i \in R_i^k$ for all $k = 0, 1, 2, \ldots$
Deﬁnition 36
A strategy proﬁle s = (s1, . . . , sn) ∈ S is a rationalizable equilibrium if
each si is rationalizable.
Rationalizability vs IESDS (Two Players)
X Y
A 3 0
B 0 3
C 1 1
Player 1 (row) plays
σ1 = (a(A), b(B), c(C))
player 2 (column) plays
(q(X), (1 − q)(Y)) (we write just q)
What strategies of player 1 are never best responses?
What strategies of player 1 are strictly dominated?
Observation: The set of strictly dominated strategies coincides with
the set of never best responses!
... and this holds in general for two player games:
Theorem 37
Assume N = {1, 2}. A pure strategy si is never best response to any
belief σ−i ∈ Σ−i iff si is strictly dominated by a strategy σi ∈ Σi.
It follows that a strategy of Si survives IESDS iff it is rationalizable.
(The theorem is true also for an arbitrary number of players but correlated
beliefs need to be used.)
Mixed Nash Equilibrium
Deﬁnition 38
A mixed-strategy profile $\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) \in \Sigma$ is a (mixed) Nash
equilibrium if $\sigma_i^*$ is a best response to $\sigma_{-i}^*$ for each $i \in N$, that is
\[ u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i, \sigma_{-i}^*) \quad \text{for all } \sigma_i \in \Sigma_i \text{ and all } i \in N \]
An interpretation: each $\sigma_{-i}^*$ can be seen as a belief of player $i$ against which
he plays a best response $\sigma_i^*$.
Given a mixed strategy profile of opponents $\sigma_{-i} \in \Sigma_{-i}$, we denote by
$BR_i(\sigma_{-i})$ the set of all $\sigma_i \in \Sigma_i$ that are best responses to $\sigma_{-i}$.
Then $\sigma^*$ is a Nash equilibrium iff $\sigma_i^* \in BR_i(\sigma_{-i}^*)$ for all $i \in N$.
Theorem 39 (Nash 1950)
Every ﬁnite game in strategic form has a Nash equilibrium.
This is THE fundamental theorem of game theory.
Example: Matching Pennies
H T
H 1, −1 −1, 1
T −1, 1 1, −1
Player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2
(column) plays (q(H), (1 − q)(T)) (we write q).
Compute all Nash equilibria.
What are the expected payoffs of playing pure strategies for player 1?
u1(H, q) = 2q − 1 and u1(T, q) = 1 − 2q
Then
u1(p, q) = pu1(H, q) + (1 − p)u1(T, q) = p(2q − 1) + (1 − p)(1 − 2q).
We obtain the best-response correspondence BR1:
\[ BR_1(q) = \begin{cases} p = 0 & \text{if } q < \frac{1}{2} \\ p \in [0, 1] & \text{if } q = \frac{1}{2} \\ p = 1 & \text{if } q > \frac{1}{2} \end{cases} \]
Similarly for player 2 :
u2(p, H) = 1 − 2p and u2(p, T) = 2p − 1
u2(p, q) = qu2(p, H) + (1 − q)u2(p, T) = q(1 − 2p) + (1 − q)(2p − 1)
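We obtain the best-response correspondence $BR_2$:
\[ BR_2(p) = \begin{cases} q = 1 & \text{if } p < \frac{1}{2} \\ q \in [0, 1] & \text{if } p = \frac{1}{2} \\ q = 0 & \text{if } p > \frac{1}{2} \end{cases} \]
The only "intersection" of $BR_1$ and $BR_2$ gives the only Nash equilibrium
$\sigma_1 = \sigma_2 = (\frac{1}{2}, \frac{1}{2})$.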
Static Games of Complete Information
Mixed Strategies
Computing Nash Equilibria – Support Enumeration
Computing Mixed Nash Equilibria
Lemma 40
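$\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) \in \Sigma$ is a Nash equilibrium iff there exist
$w_1, \ldots, w_n \in \mathbb{R}$ such that the following holds:
For all $i \in N$ and all $s_i \in \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) = w_i$.
For all $i \in N$ and all $s_i \notin \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) \le w_i$.
Here, the right-hand side implies $u_i(\sigma^*) = w_i$.
Proof.
The fact that the right-hand side implies $u_i(\sigma^*) = w_i$ follows
immediately from Lemma 23:
\[ \begin{aligned}
u_i(\sigma^*) &= \sum_{s_i \in S_i} \sigma_i^*(s_i)\, u_i(s_i, \sigma_{-i}^*) = \sum_{s_i \in \mathrm{supp}(\sigma_i^*)} \sigma_i^*(s_i)\, u_i(s_i, \sigma_{-i}^*) \\
&= \sum_{s_i \in \mathrm{supp}(\sigma_i^*)} \sigma_i^*(s_i)\, w_i = w_i \sum_{s_i \in \mathrm{supp}(\sigma_i^*)} \sigma_i^*(s_i) = w_i
\end{aligned} \]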
Computing Mixed Nash Equilibria
Lemma 41
$\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) \in \Sigma$ is a Nash equilibrium iff there exist
$w_1, \ldots, w_n \in \mathbb{R}$ such that the following holds:
For all $i \in N$ and all $s_i \in \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) = w_i$.
For all $i \in N$ and all $s_i \notin \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) \le w_i$.
Here, the right-hand side implies $u_i(\sigma^*) = w_i$.
Proof. (Cont.)
"⇐": Use the first equality of Lemma 23 to obtain for every $i \in N$ and
every $\sigma_i' \in \Sigma_i$
\[ u_i(\sigma_i', \sigma_{-i}^*) = \sum_{s_i \in S_i} \sigma_i'(s_i)\, u_i(s_i, \sigma_{-i}^*) \le \sum_{s_i \in S_i} \sigma_i'(s_i)\, w_i = \sum_{s_i \in S_i} \sigma_i'(s_i)\, u_i(\sigma^*) = u_i(\sigma^*) \]
Thus $\sigma^*$ is a Nash equilibrium.
Computing Mixed Nash Equilibria
Lemma 42
$\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) \in \Sigma$ is a Nash equilibrium iff there exist
$w_1, \ldots, w_n \in \mathbb{R}$ such that the following holds:
For all $i \in N$ and all $s_i \in \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) = w_i$.
For all $i \in N$ and all $s_i \notin \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) \le w_i$.
Here, the right-hand side implies $u_i(\sigma^*) = w_i$.
Proof (Cont.)
Idea for "⇒": Let $w_i := u_i(\sigma^*)$.
Clearly, every $i \in N$ and $s_i \in S_i$ satisfy $u_i(s_i, \sigma_{-i}^*) \le u_i(\sigma^*) = w_i$.
By Corollary 24, there is at least one $s_i \in \mathrm{supp}(\sigma_i^*)$ satisfying
$u_i(s_i, \sigma_{-i}^*) = u_i(\sigma^*) = w_i$.
Now if there is $s_i' \in \mathrm{supp}(\sigma_i^*)$ such that
\[ u_i(s_i', \sigma_{-i}^*) < u_i(\sigma^*) \quad (= u_i(s_i, \sigma_{-i}^*)) \]
then increasing the probability $\sigma_i^*(s_i)$ and decreasing (in proportion)
$\sigma_i^*(s_i')$ strictly increases $u_i(\sigma^*)$, a contradiction with $\sigma^*$ being a NE.
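The characterization gives a simple test of whether a given profile is a Nash equilibrium. The following is a minimal sketch for two players, assuming payoffs are stored as nested lists indexed by row and column and probabilities are exact fractions; `is_nash_2p` is an illustrative name. It takes $w_i$ to be the maximal pure-strategy payoff and checks that every strategy in the support attains it.

```python
from fractions import Fraction

def is_nash_2p(u1, u2, sigma1, sigma2):
    """Check the support characterization for a two-player game.
    u1[r][c], u2[r][c] are payoffs; sigma1, sigma2 are lists of probabilities."""
    rows, cols = range(len(sigma1)), range(len(sigma2))
    # Expected payoff of each pure strategy against the opponent's mixed strategy.
    pay1 = [sum(sigma2[c] * u1[r][c] for c in cols) for r in rows]
    pay2 = [sum(sigma1[r] * u2[r][c] for r in rows) for c in cols]

    def ok(pay, sigma):
        w = max(pay)  # candidate value w_i; strategies outside the support are <= w
        return all(pay[k] == w for k in range(len(sigma)) if sigma[k] > 0)

    return ok(pay1, sigma1) and ok(pay2, sigma2)

half = Fraction(1, 2)
mp_u1 = [[1, -1], [-1, 1]]
mp_u2 = [[-1, 1], [1, -1]]
print(is_nash_2p(mp_u1, mp_u2, [half, half], [half, half]))  # True
print(is_nash_2p(mp_u1, mp_u2, [1, 0], [1, 0]))              # False
```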
Example: Matching Pennies
H T
H 1, −1 −1, 1
T −1, 1 1, −1
Player 1 (row) plays (p(H), (1 − p)(T)) (we write just p) and player 2
(column) plays (q(H), (1 − q)(T)) (we write q).
Compute all Nash equilibria.
There are no pure strategy equilibria.
There are no equilibria where only player 1 randomizes:
Indeed, assume that (p, H) is such an equilibrium. Then by
Lemma 42,
1 = u1(H, H) = u1(T, H) = −1
a contradiction. Also, (p, T) cannot be an equilibrium.
Similarly, there is no NE where only player 2 randomizes.
Assume that both players randomize, i.e., p, q ∈ (0, 1).
The expected payoffs of playing pure strategies for player 1:
u1(H, q) = 2q − 1 and u1(T, q) = 1 − 2q
Similarly for player 2 :
u2(p, H) = 1 − 2p and u2(p, T) = 2p − 1
By Lemma 42, Nash equilibria must satisfy:
2q − 1 = 1 − 2q and 1 − 2p = 2p − 1
That is, $p = q = \frac{1}{2}$ is the only Nash equilibrium.
Example: Battle of Sexes
O F
O 2, 1 0, 0
F 0, 0 1, 2
Player 1 (row) plays (p(O), (1 − p)(F)) (we write just p) and player 2
(column) plays (q(O), (1 − q)(F)) (we write q).
Compute all Nash equilibria.
There are two pure strategy equilibria, (O, O) and (F, F), with payoffs (2, 1)
and (1, 2), respectively, and no Nash equilibrium where only one player randomizes.
Now assume that
player 1 (row) plays (p(O), (1 − p)(F)) (we write just p) and
player 2 (column) plays (q(O), (1 − q)(F)) (we write q)
where p, q ∈ (0, 1).
By Lemma 42, any Nash equilibrium must satisfy:
2q = 1 − q and p = 2(1 − p)
This holds only for $q = \frac{1}{3}$ and $p = \frac{2}{3}$.
An Algorithm?
What did we do in the previous examples?
We went through all support combinations for both players.
(pure, one player mixing, both mixing)
For each pair of supports we tried to ﬁnd equilibria in strategies
with these supports.
(in Battle of Sexes: two pure, no equilibrium with just one player
mixing, one equilibrium when both mixing)
Whenever one of the supports was non-singleton, we reduced
computation of Nash equilibria to linear equations.
Support Enumeration (Idea)
Recall Lemma 42: $\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) \in \Sigma$ is a Nash equilibrium iff there
exist $w_1, \ldots, w_n \in \mathbb{R}$ such that the following holds:
For all $i \in N$ and all $s_i \in \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) = w_i$.
For all $i \in N$ and all $s_i \notin \mathrm{supp}(\sigma_i^*)$ we have $u_i(s_i, \sigma_{-i}^*) \le w_i$.
Suppose that we somehow know the supports $\mathrm{supp}(\sigma_1^*), \ldots, \mathrm{supp}(\sigma_n^*)$
for some Nash equilibrium $\sigma_1^*, \ldots, \sigma_n^*$ (which itself is unknown to us).
Now we may consider all $\sigma_i^*(s_i)$'s and all $w_i$'s as variables and use the
above conditions to design a system of inequalities capturing Nash
equilibria with the given support sets $\mathrm{supp}(\sigma_1^*), \ldots, \mathrm{supp}(\sigma_n^*)$.
Support Enumeration
To simplify notation, assume that for every $i$ we have $S_i = \{1, \ldots, m_i\}$.
Then $\sigma_i(j)$ is the probability of the pure strategy $j$ in the mixed strategy $\sigma_i$.
Fix supports $supp_i \subseteq S_i$ for every $i \in N$ and consider the following
system of constraints with variables
$\sigma_1(1), \ldots, \sigma_1(m_1), \ldots, \sigma_n(1), \ldots, \sigma_n(m_n), w_1, \ldots, w_n$:
1. For all $i \in N$ and all $k \in supp_i$ we have
\[ \big( u_i(k, \sigma_{-i}) = \big) \sum_{s \in S \,\wedge\, s_i = k} \Big( \prod_{j \ne i} \sigma_j(s_j) \Big) u_i(s) = w_i \]
2. For all $i \in N$ and all $k \notin supp_i$ we have
\[ \big( u_i(k, \sigma_{-i}) = \big) \sum_{s \in S \,\wedge\, s_i = k} \Big( \prod_{j \ne i} \sigma_j(s_j) \Big) u_i(s) \le w_i \]
3. For all $i \in N$: $\sigma_i(1) + \cdots + \sigma_i(m_i) = 1$.
4. For all $i \in N$ and all $k \in supp_i$: $\sigma_i(k) \ge 0$.
5. For all $i \in N$ and all $k \notin supp_i$: $\sigma_i(k) = 0$.
Support Enumeration
Consider the system of constraints from the previous slide.
The following lemma follows immediately from Lemma 42.
Lemma 43
Let $\sigma^* \in \Sigma$ be a strategy profile.
If $\sigma^*$ is a Nash equilibrium and $\mathrm{supp}(\sigma_i^*) = supp_i$ for all $i \in N$,
then assigning $\sigma_i(k) := \sigma_i^*(k)$ and $w_i := u_i(\sigma^*)$ solves the system.
If $\sigma_i(k) := \sigma_i^*(k)$ and $w_i := u_i(\sigma^*)$ solves the system, then $\sigma^*$ is
a Nash equilibrium with $\mathrm{supp}(\sigma_i^*) \subseteq supp_i$ for all $i \in N$.
Support Enumeration (Two Players)
The constraints are non-linear in general, but linear for two player
games! Let us stick to two players.
How to ﬁnd supp1 and supp2? ... Just guess!
Input: A two-player strategic-form game G with strategy sets
S1 = {1, . . . , m1} and S2 = {1, . . . , m2} and rational payoffs u1, u2.
Output: A Nash equilibrium $\sigma^*$.
Algorithm: For all possible $supp_1 \subseteq S_1$ and $supp_2 \subseteq S_2$:
Check if the corresponding system of linear constraints (from
the previous slide) has a feasible solution $\sigma^*, w_1^*, \ldots, w_n^*$.
If so, STOP: the feasible solution $\sigma^*$ is a Nash equilibrium
satisfying $u_i(\sigma^*) = w_i^*$.
Question: How many possible subsets $supp_1$, $supp_2$ are there to try?
Answer: $2^{m_1 + m_2}$
So, unfortunately, the algorithm requires worst-case exponential time.
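For 2x2 games the enumeration can be written out explicitly. The sketch below is an illustration under a stated assumption: the game is nondegenerate, so (as in the Matching Pennies and Battle of Sexes examples) only singleton supports and full supports need to be tried, and the full-support case reduces to two indifference equations. The function name `support_enumeration_2x2` is hypothetical and this is not the general algorithm, which requires a linear programming solver.

```python
from fractions import Fraction
from itertools import product

def support_enumeration_2x2(u1, u2):
    """Nash equilibria (p, q) of a nondegenerate 2x2 game, where p (resp. q) is the
    probability that player 1 (resp. 2) plays strategy 0. u1[r][c], u2[r][c] are payoffs."""
    equilibria = []
    # Singleton supports: pure profiles without a profitable pure deviation.
    for r, c in product((0, 1), repeat=2):
        if u1[r][c] >= u1[1 - r][c] and u2[r][c] >= u2[r][1 - c]:
            equilibria.append((Fraction(1 - r), Fraction(1 - c)))
    # Full supports: each player is made indifferent by the opponent's mix.
    d1 = u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1]
    d2 = u2[0][0] - u2[0][1] - u2[1][0] + u2[1][1]
    if d1 != 0 and d2 != 0:
        q = Fraction(u1[1][1] - u1[0][1], d1)  # solves u1(0, q) = u1(1, q)
        p = Fraction(u2[1][1] - u2[1][0], d2)  # solves u2(p, 0) = u2(p, 1)
        if 0 < p < 1 and 0 < q < 1:
            equilibria.append((p, q))
    return equilibria

# Battle of Sexes (strategy 0 = O, strategy 1 = F).
bos = support_enumeration_2x2([[2, 0], [0, 1]], [[1, 0], [0, 2]])
# Matching Pennies (strategy 0 = H, strategy 1 = T).
mp = support_enumeration_2x2([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
print(bos)
print(mp)
```

On Battle of Sexes this returns the two pure equilibria and the mixed one with $p = \frac{2}{3}$, $q = \frac{1}{3}$; on Matching Pennies it returns only $p = q = \frac{1}{2}$.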
Remarks on Support Enumeration
The algorithm combined with Theorem 39 and properties of
linear programming implies that every finite two-player game has
a rational Nash equilibrium (furthermore, the rational numbers
have polynomial representation in binary).
The algorithm can be used to compute all Nash equilibria.
(There are algorithms for computing (a ﬁnite representation of) a set of
all feasible solutions of a given linear constraint system.)
The algorithm can be used to compute "good" equilibria.
For example, to ﬁnd a Nash equilibrium maximizing the sum of
all expected payoffs (the "social welfare") it sufﬁces to solve the
system of constraints while maximizing w1 + · · · + wn. More
precisely, the algorithm can be modiﬁed as follows:
Initialize W := −∞ (W stores the current maximum welfare)
For all possible supp1 ⊆ S1 and supp2 ⊆ S2:
Find the maximum value $\max(\sum_i w_i)$ of $w_1 + \cdots + w_n$ so that
the constraints are satisfiable (using linear programming).
Put $W := \max\{W, \max(\sum_i w_i)\}$.
Return W.
Remarks on Support Enumeration (Cont.)
Similar trick works for any notion of "good" NE that can be expressed
using a linear objective function and (additional) linear constraints in
variables σi(j) and wi.
(e.g., maximize payoff of player 1, minimize payoff of player 2 and keep
probability of playing the strategy 1 below 1/2, etc.)
Complexity Results – (Two Players)
Theorem 44
All the following problems are NP-complete: Given a two-player game
in strategic form, does it have
1. a NE in which player 1 has utility at least a given amount v ?
2. a NE in which the sum of expected payoffs of the two players is
at least a given amount v ?
3. a NE with a support of size greater than a given number?
4. a NE whose support contains a given strategy s ?
5. a NE whose support does not contain a given strategy s ?
6. ....
Membership in NP follows from the support enumeration:
For example, for 1., it suffices to guess supports $supp_1$, $supp_2$ and
add $w_1 \ge v$ to the constraints; the resulting NE $\sigma^*$ satisfies $u_1(\sigma^*) \ge v$.
NP-hardness can be proved using a reduction from SAT.
(The reduction is not difficult but we are not going into it.
It is presented in "New Complexity Results about Nash Equilibria" by
V. Conitzer and T. Sandholm (pages 6–8).)
The Reduction (It’s Short and Sweet)
... But What is The Exact Complexity of Computing
Nash Equilibria in Two Player Games?
Let us concentrate on the problem of computing one Nash equilibrium
(sometimes called the sample equilibrium problem).
As the class NP consists of decision problems, it cannot be directly
used to characterize complexity of the sample equilibrium problem.
We use complexity classes of function problems such as FP, FNP, etc.
The support enumeration gives a deterministic algorithm which runs
in exponential time. Can we do better?
In what follows we show that
the sample equilibrium problem can be solved in polynomial time
for zero-sum two-player games,
(Using a beautiful characterization of all Nash equilibria)
the sample equilibrium problem belongs to the complexity class
PPAD (which is a subclass of FNP) for two-player games.
(... to be deﬁned later)
MaxMin
Is there a better characterization of Nash equilibria than Lemma 42 ?
Deﬁnition 46
$\sigma_i^* \in \Sigma_i$ is a maxmin strategy of player $i$ if
\[ \sigma_i^* \in \operatorname*{argmax}_{\sigma_i \in \Sigma_i} \min_{\sigma_{-i} \in \Sigma_{-i}} u_i(\sigma_i, \sigma_{-i}) \]
(Intuitively, a maxmin strategy $\sigma_1^*$ maximizes player 1's worst-case payoff in
the situation where player 2 strives to cause the greatest harm to player 1.)
(Since $u_i$ is continuous and $\Sigma_{-i}$ compact, $\min_{\sigma_{-i} \in \Sigma_{-i}} u_i(\sigma_i, \sigma_{-i})$ is well
defined and continuous on $\Sigma_i$, which implies that there is at least one
maxmin strategy.)
MaxMin
Lemma 47
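$\sigma_i^*$ is maxmin iff
\[ \sigma_i^* \in \operatorname*{argmax}_{\sigma_i \in \Sigma_i} \min_{s_{-i} \in S_{-i}} u_i(\sigma_i, s_{-i}) \]
Proof.
By Corollary 24, for every $\sigma \in \Sigma$ we have $u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma_i, s_{-i})$ for
some $s_{-i} \in S_{-i}$.
Thus $\min_{\sigma_{-i} \in \Sigma_{-i}} u_i(\sigma_i, \sigma_{-i}) = \min_{s_{-i} \in S_{-i}} u_i(\sigma_i, s_{-i})$. Hence,
\[ \operatorname*{argmax}_{\sigma_i \in \Sigma_i} \min_{\sigma_{-i} \in \Sigma_{-i}} u_i(\sigma_i, \sigma_{-i}) = \operatorname*{argmax}_{\sigma_i \in \Sigma_i} \min_{s_{-i} \in S_{-i}} u_i(\sigma_i, s_{-i}) \]
Question: Consider a strategy profile where both players play their
maxmin strategies. Does it have to be a Nash equilibrium?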
Zero-Sum Games: von Neumann’s Theorem
Assume that G is zero sum, i.e., u1 = −u2.
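Then $\sigma_2^* \in \Sigma_2$ is maxmin of player 2 iff
\[ \sigma_2^* \in \operatorname*{argmin}_{\sigma_2 \in \Sigma_2} \max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2) \quad \Big( = \operatorname*{argmin}_{\sigma_2 \in \Sigma_2} \max_{s_1 \in S_1} u_1(s_1, \sigma_2) \Big) \]
(Intuitively, maxmin of player 2 minimizes the payoff of player 1 when player 1
plays his best responses. Such a strategy of player 2 is often called minmax.)
Theorem 48 (von Neumann)
Assume a two-player zero-sum game. Then
\[ \max_{\sigma_1 \in \Sigma_1} \min_{\sigma_2 \in \Sigma_2} u_1(\sigma_1, \sigma_2) = \min_{\sigma_2 \in \Sigma_2} \max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2) \]
Moreover, $\sigma^* = (\sigma_1^*, \sigma_2^*) \in \Sigma$ is a Nash equilibrium iff both $\sigma_1^*$ and $\sigma_2^*$ are
maxmin.
So to compute a Nash equilibrium it suffices to compute (arbitrary)
maxmin strategies for both players.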
Proof of Theorem 48 (Homework)
Homework: Prove von Neumann’s Theorem in 4 easy steps:
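1. Prove this inequality:
\[ \max_{\sigma_1 \in \Sigma_1} \min_{\sigma_2 \in \Sigma_2} u_1(\sigma_1, \sigma_2) \le \min_{\sigma_2 \in \Sigma_2} \max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2) \]
2. Prove that $(\sigma_1^*, \sigma_2^*)$ is a Nash equilibrium iff
\[ \min_{\sigma_2 \in \Sigma_2} u_1(\sigma_1^*, \sigma_2) \ge u_1(\sigma_1^*, \sigma_2^*) \ge \max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2^*) \]
Hint: One of the inequalities is trivial and the other one almost.
3. Use 1. and 2. together with Theorem 39 to prove
\[ \max_{\sigma_1 \in \Sigma_1} \min_{\sigma_2 \in \Sigma_2} u_1(\sigma_1, \sigma_2) \ge \min_{\sigma_2 \in \Sigma_2} \max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2) \]
4. Use the above to prove the rest of the theorem.
Hint: Use the characterization of NE from 2., do not forget that you
already have $\max_{\sigma_1 \in \Sigma_1} \min_{\sigma_2 \in \Sigma_2} u_1(\sigma_1, \sigma_2) = \min_{\sigma_2 \in \Sigma_2} \max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2)$.
You may already have proved one of the implications when proving 3.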
Zero-Sum Two-Player Games – Computing NE
Assume S1 = {1, . . . , m1} and S2 = {1, . . . , m2}.
We want to compute
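$$\sigma_1^* \in \operatorname*{argmax}_{\sigma_1 \in \Sigma_1} \min_{\ell \in S_2} u_1(\sigma_1, \ell)$$
Consider a linear program with variables $\sigma_1(1), \ldots, \sigma_1(m_1), v$:
maximize: $v$
subject to:
$\sum_{k=1}^{m_1} \sigma_1(k) \cdot u_1(k, \ell) \ge v$ for $\ell = 1, \ldots, m_2$
$\sum_{k=1}^{m_1} \sigma_1(k) = 1$
$\sigma_1(k) \ge 0$ for $k = 1, \ldots, m_1$
Lemma 49
$\sigma_1^* \in \operatorname*{argmax}_{\sigma_1 \in \Sigma_1} \min_{\ell \in S_2} u_1(\sigma_1, \ell)$ iff assigning $\sigma_1(k) := \sigma_1^*(k)$ and
$v := \min_{\ell \in S_2} u_1(\sigma_1^*, \ell)$ gives an optimal solution.
120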
Zero-Sum Two-Player Games – Computing NE
Summary:
We have reduced computation of NE to computation of
maxmin strategies for both players.
Maxmin strategies can be computed using linear
programming in polynomial time.
That is, Nash equilibria in zero-sum two-player games can
be computed in polynomial time.
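To make the linear program concrete, here is a minimal sketch for the special case m1 = 2, where the program has a single free probability p = σ1(1) and its optimum lies at p = 0, p = 1, or a point where two column constraints cross. It enumerates these candidate vertices with exact arithmetic instead of calling an LP solver; the function name `maxmin_two_rows` and the matrix layout are assumptions made for this illustration only. For general m1 one would hand the same constraints to an LP solver.

```python
from fractions import Fraction
from itertools import combinations


def maxmin_two_rows(u1):
    """Maxmin strategy of player 1 when player 1 has exactly two pure strategies.

    u1[k][l] is player 1's payoff for row k and column l (hypothetical layout).
    Returns (p, v): p is the probability of row 0 and v the guaranteed payoff.
    """
    top = [Fraction(x) for x in u1[0]]
    bottom = [Fraction(x) for x in u1[1]]
    m2 = len(top)

    def worst_case(p):
        # By Lemma 47 it suffices to minimize over pure strategies of player 2.
        return min(p * top[l] + (1 - p) * bottom[l] for l in range(m2))

    # Candidate vertices of the LP: the endpoints p = 0 and p = 1, and every
    # point where two column constraints give the same payoff.
    candidates = {Fraction(0), Fraction(1)}
    for l, k in combinations(range(m2), 2):
        denom = (top[l] - bottom[l]) - (top[k] - bottom[k])
        if denom != 0:
            p = (bottom[k] - bottom[l]) / denom
            if 0 <= p <= 1:
                candidates.add(p)

    best = max(candidates, key=worst_case)
    return best, worst_case(best)


# Matching pennies: the maxmin strategy mixes uniformly and guarantees 0.
p, v = maxmin_two_rows([[1, -1], [-1, 1]])
print(p, v)
```

Applying the same function to the negated, transposed matrix gives a maxmin strategy of player 2, and by von Neumann's theorem the two strategies together form a Nash equilibrium.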
121
Strategic-Form Games – Conclusion
We have considered static games of complete information, i.e.,
"one-shot" games where the players know exactly what game they
are playing.
We modeled such games using strategic-form games.
We have considered both pure strategy setting and mixed strategy
setting.
In both cases, we considered four solution concepts:
Strictly dominant strategies
Iterative elimination of strictly dominated strategies
Rationalizability (i.e., iterative elimination of strategies that are
never best responses)
Nash equilibria
122
Strategic-Form Games – Conclusion
In pure strategy setting:
1. Strictly dominant strategy equilibrium survives IESDS,
rationalizability and is the unique Nash equilibrium (if it exists)
2. In ﬁnite games, rationalizable equilibria survive IESDS, IESDS
preserves the set of Nash equilibria
3. In ﬁnite games, rationalizability preserves Nash equilibria
In mixed setting:
1. In ﬁnite two player games, IESDS and rationalizability coincide.
2. Strictly dominant strategy equilibrium survives IESDS
(rationalizability) and is the unique Nash equilibrium (if it exists)
3. In ﬁnite games, IESDS (rationalizability) preserves Nash
equilibria
The proofs for 2. and 3. in the mixed setting are similar to corresponding
proofs in the pure setting.
123
Algorithms
Strictly dominant strategy equilibria coincide in pure and mixed
settings, and can be computed in polynomial time.
IESDS and rationalizability can be implemented in polynomial
time in the pure setting as well as in the mixed setting
In the mixed setting, linear programming is needed to implement one
step of IESDS (rationalizability).
Nash equilibria can be computed for two-player games
in polynomial time for zero-sum games
(using von Neumann’s theorem and linear programming)
in exponential time using support enumeration
in PPAD using Lemke-Howson
124
Complexity of Nash Eq. – FNP (Roughly)
Let R be a binary relation on words (over some alphabet) that is
polynomial-time computable and polynomially balanced.
I.e., membership in R is decidable in polynomial time, and (x, y) ∈ R implies
$|y| \le |x|^k$ where k is independent of x, y.
A search problem associated with R is this: Given an input x, return
a y such that (x, y) ∈ R if such y exists, and return "NO" otherwise.
Note that the problem of computing NE can be seen as a search problem R
where (x, y) ∈ R means that x is a strategic-form game and y is a Nash
equilibrium of polynomial size. (We already know from support enumeration
that there is a NE of polynomial size.)
The class of all search problems is called FNP. A class FP ⊆ FNP
contains all search problems that can be solved in polynomial time.
A search problem determined by $R$ is polynomially reducible to
a search problem determined by $R'$ iff there exist polynomially computable functions
f, g such that
if $(x, y) \in R$ for some y, then $(f(x), y') \in R'$ for some $y'$
if $(f(x), y) \in R'$, then $(x, g(y)) \in R$
if $(f(x), y) \notin R'$ for all y, then $(x, y) \notin R$ for all y
125
Complexity of Nash Eq. – PPAD (Roughly)
The class PPAD is deﬁned by specifying one of its complete problems
(w.r.t. the polynomial time reduction) known as End-Of-The-Line:
Input: Two Boolean circuits (with basis ∧, ∨, ¬) S and P, each
with m input bits and m output bits, such that
$P(0^m) = 0^m \ne S(0^m)$.
Problem: Find an input $x \in \{0, 1\}^m$ such that $P(S(x)) \ne x$ or
$S(P(x)) \ne x \ne 0^m$.
Intuition: End-Of-The-Line creates a directed graph $H_{S,P}$ with vertex set
$\{0, 1\}^m$ and an edge from x to y whenever both y = S(x) ("successor") and
x = P(y) ("predecessor").
All vertices of $H_{S,P}$ have indegree and outdegree at most one. There is at
least one source (i.e., x satisfying P(x) = x, namely $0^m$), so there is at least
one sink (i.e., x satisfying S(x) = x).
The goal is to find either a source or a sink different from $0^m$.
Theorem 50
The problem of computing Nash equilibria is complete for PPAD.
That is, Nash belongs to PPAD and End-Of-The-Line is polynomially
reducible to Nash.
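The End-Of-The-Line graph can be explored by simply following successor edges from the source. The sketch below does this on a toy instance; it uses plain Python functions on integers in place of Boolean circuits, and the names `follow_line`, `S`, and `P` as Python callables are assumptions made for this illustration. The walk always ends in a sink, but it may take up to $2^m - 1$ steps, which is exponential in the size of the circuits, so this is not an efficient algorithm.

```python
def follow_line(S, P):
    """Follow successor edges from vertex 0 (playing the role of 0^m).

    S and P are plain Python functions on integers standing in for the two
    circuits. Returns (sink, number_of_edges_followed).
    """
    x, steps = 0, 0
    while True:
        y = S(x)
        # There is an edge x -> y only if y = S(x), x = P(y), and y != x.
        if y == x or P(y) != x:
            return x, steps
        x, steps = y, steps + 1


# Toy instance on vertices 0..7 (m = 3): a single line 0 -> 1 -> ... -> 5,
# while vertices 6 and 7 are isolated.
def S(x):
    return x + 1 if x < 5 else x


def P(x):
    return x - 1 if 1 <= x <= 5 else x


sink, steps = follow_line(S, P)
print(sink, steps)
```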
126
Loose Ends – Modes of Dominance
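Let $\sigma_i, \sigma_i' \in \Sigma_i$. Then $\sigma_i'$ is strictly dominated by $\sigma_i$ if
$u_i(\sigma_i, \sigma_{-i}) > u_i(\sigma_i', \sigma_{-i})$ for all $\sigma_{-i} \in \Sigma_{-i}$.
Let $\sigma_i, \sigma_i' \in \Sigma_i$. Then $\sigma_i'$ is weakly dominated by $\sigma_i$ if
$u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma_i', \sigma_{-i})$ for all $\sigma_{-i} \in \Sigma_{-i}$ and there is $\sigma_{-i}' \in \Sigma_{-i}$
such that $u_i(\sigma_i, \sigma_{-i}') > u_i(\sigma_i', \sigma_{-i}')$.
Let $\sigma_i, \sigma_i' \in \Sigma_i$. Then $\sigma_i'$ is very weakly dominated by $\sigma_i$ if
$u_i(\sigma_i, \sigma_{-i}) \ge u_i(\sigma_i', \sigma_{-i})$ for all $\sigma_{-i} \in \Sigma_{-i}$.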
A strategy is (strictly, weakly, very weakly) dominant in mixed
strategies if it (strictly, weakly, very weakly) dominates any other
mixed strategy.
Claim 4
Any mixed strategy proﬁle σ ∈ Σ such that each σi is very weakly
dominant in mixed strategies is a mixed Nash equilibrium.
The same claim can be proved in pure strategy setting.
127
Dynamic Games of Complete Information
Extensive-Form Games
Deﬁnition
Sub-Game Perfect Equilibria
128
Dynamic Games of Perfect Information
(Motivation)
Static games (modeled using strategic-form games) cannot capture
games that unfold over time.
In particular, as all players move simultaneously, there is no way
to model situations in which the order of moves is important.
Imagine e.g. chess, where players take turns and in every round a player
knows all moves of the opponent before making his own move.
There are many examples of dynamic games: markets that change
over time, political negotiations, models of computer systems, etc.
We model dynamic games using extensive-form games, a tree-like
model that can express the sequential nature of games.
We start with perfect information games, where each player always
knows results of all previous moves.
Then we generalize to imperfect information, where players may have
only partial knowledge of these results (e.g. most card games).
129
Perfect-Info. Extensive-Form Games (Example)
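[Game tree: player 1 moves at h0. L leads to h1, where player 2 chooses
K (payoffs (3, 1)) or U (payoffs (1, 3)). R leads to h2, where player 2
chooses K (payoffs (2, 1)) or U (payoffs (0, 0)).]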
Here h0, h1, h2 are non-terminal nodes, leaves are terminal nodes.
Each non-terminal node is owned by a player who chooses an action.
E.g. h1 is owned by player 2 who chooses either K or U
Every action results in a transition to a new node.
Choosing L in h0 results in a move to h1
When a play reaches a terminal node, players collect payoffs.
E.g. the left most terminal node gives 3 to player 1 and 1 to player 2.
130
Perfect-Information Extensive-Form Games
A perfect-information extensive-form game is a tuple
$G = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$ where
$N = \{1, \ldots, n\}$ is a set of n players, A is a (single) set of actions,
H is a set of non-terminal (choice) nodes, Z is a set of terminal
nodes (assume $Z \cap H = \emptyset$), denote $\mathcal{H} = H \cup Z$,
$\chi : H \to 2^A \setminus \{\emptyset\}$ is the action function, which assigns to each
choice node a non-empty set of enabled actions,
$\rho : H \to N$ is the player function, which assigns to each
non-terminal node a player i ∈ N who chooses an action there,
we define $H_i := \{h \in H \mid \rho(h) = i\}$,
$\pi : H \times A \to \mathcal{H}$ is the successor function, which maps
a non-terminal node and an action to a new node, such that
$h_0$ is the only node that is not in the image of π (the root)
for all $h_1, h_2 \in H$ and for all $a_1 \in \chi(h_1)$ and all $a_2 \in \chi(h_2)$,
if $\pi(h_1, a_1) = \pi(h_2, a_2)$, then $h_1 = h_2$ and $a_1 = a_2$,
$u = (u_1, \ldots, u_n)$, where each $u_i : Z \to \mathbb{R}$ is a payoff function for
player i in the terminal nodes of Z.
131
Some Notation
A path from $h \in H \cup Z$ to $h' \in H \cup Z$ is a sequence $h_1 a_2 h_2 a_3 h_3 \cdots h_{k-1} a_k h_k$
where $h_1 = h$, $h_k = h'$ and $\pi(h_{j-1}, a_j) = h_j$ for every $1 < j \le k$.
Note that, in particular, h is a path from h to h.
Assumption: For every $h \in H \cup Z$ there is a unique path from $h_0$ to h
and there is no infinite path (i.e., a sequence $h_1 a_2 h_2 a_3 h_3 \cdots$ such that
$\pi(h_{j-1}, a_j) = h_j$ for every j > 1).
Note that the assumption is satisfied when $H \cup Z$ is finite.
Indeed, uniqueness follows immediately from the definition of π. Now let X
be the set of all $h'$ from which there is a path to h. If $h_0 \in X$ we are done.
Otherwise, let $h'$ be a node of X with the longest path to h. As $h' \ne h_0$, there
is $h''$ and $a \in \chi(h'')$ such that $h' = \pi(h'', a)$. But then there is a path from $h''$
to h that is longer than the path from $h'$, a contradiction.
The above claim implies that every perfect-information extensive-form
game can be seen as a game on a rooted tree $(H \cup Z, E, h_0)$ where
$H \cup Z$ is a set of nodes,
$E \subseteq (H \cup Z) \times (H \cup Z)$ is a set of edges defined by $(h, h') \in E$ iff $h \in H$ and
there is $a \in \chi(h)$ such that $\pi(h, a) = h'$,
$h_0$ is the root.
132
Some More Notation
$h'$ is a child of h, and h is a parent of $h'$, if there is $a \in \chi(h)$ such
that $h' = \pi(h, a)$.
$h' \in H \cup Z$ is reachable from $h \in H \cup Z$ if there is a path from h to $h'$.
If $h'$ is reachable from h we say that $h'$ is a descendant of h and h is
an ancestor of $h'$ (note that, by definition, h is both a descendant and
an ancestor of itself).
133
Example: Trust Game
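[Game tree: player 1 moves at h0. D leads to the terminal node z1 with
payoffs (5, 5). T leads to h1, where player 2 chooses K (terminal node z2,
payoffs (0, 20)) or S (terminal node z3, payoffs (7.5, 12.5)).]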
Two players, both start with 5$
Player 1 either distrusts (D) player 2 and keeps the money
(payoffs (5, 5)), or trusts (T) player 2 and passes 5$ to player 2
If player 1 chooses to trust player 2, the money is tripled by the
experimenter and sent to player 2.
Player 2 may either keep (K) the additional 15$ (resulting in
(0, 20)), or share (S) it with player 1 (resulting in (7.5, 12.5))
134
Example: Trust Game (Cont.)
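[Game tree: player 1 moves at h0. D leads to the terminal node z1 with
payoffs (5, 5). T leads to h1, where player 2 chooses K (terminal node z2,
payoffs (0, 20)) or S (terminal node z3, payoffs (7.5, 12.5)).]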
N = {1, 2}, A = {D, T, K, S}
H = {h0, h1}, Z = {z1, z2, z3}
χ(h0) = {D, T}, χ(h1) = {K, S}
ρ(h0) = 1, ρ(h1) = 2
π(h0, D) = z1, π(h0, T) = h1, π(h1, K) = z2, π(h1, S) = z3
u1(z1) = 5, u1(z2) = 0, u1(z3) = 7.5, u2(z1) = 5, u2(z2) = 20,
u2(z3) = 12.5
135
Stackelberg Competition
Very similar to Cournot duopoly ...
Two identical ﬁrms, players 1 and 2, produce some good.
Denote by q1 and q2 quantities produced by ﬁrms 1 and 2, resp.
The total quantity of products in the market is q1 + q2.
The price of each item is κ − q1 − q2 where κ > 0 is ﬁxed.
Firms have a common per item production cost c.
Except that ...
As opposed to Cournot duopoly, the ﬁrm 1 moves ﬁrst, and
chooses the quantity q1 ∈ [0, ∞).
Afterwards, the ﬁrm 2 chooses q2 ∈ [0, ∞) (knowing q1) and then
the ﬁrms get their payoffs.
136
Stackelberg Competition – Extensive-Form Model
An extensive-form game model:
N = {1, 2}
A = [0, ∞)
$H = \{h_0, h_1^{q_1} \mid q_1 \in [0, \infty)\}$
$Z = \{z^{q_1,q_2} \mid q_1, q_2 \in [0, \infty)\}$
$\chi(h_0) = [0, \infty)$, $\chi(h_1^{q_1}) = [0, \infty)$
$\rho(h_0) = 1$, $\rho(h_1^{q_1}) = 2$
$\pi(h_0, q_1) = h_1^{q_1}$, $\pi(h_1^{q_1}, q_2) = z^{q_1,q_2}$
The payoffs are
$u_1(z^{q_1,q_2}) = q_1(\kappa - q_1 - q_2) - q_1 c$
$u_2(z^{q_1,q_2}) = q_2(\kappa - q_1 - q_2) - q_2 c$
137
Example: Chess (a bit simpliﬁed)
There are infinitely many representations of chess; this one is different from
the one presented at the lecture.
N = {1, 2}
Denoting Boards the set of all (appropriately encoded) board
positions, we define $H \cup Z = B \times \{1, 2\}$ where
$B = \{w \in Boards^+ \mid \text{no board repeats} \ge 3 \text{ times in } w\}$
(Here $Boards^+$ is the set of all non-empty sequences of boards)
Z consists of all nodes (wb, i) (here b ∈ Boards) where either b
is checkmate for player i, or i does not have a move in b, or
every move of i in b leads to a board with two occurrences in w
χ(wb, i) is the set of all legal moves of player i in b
ρ(wb, i) = i
π is defined by $\pi((wb, i), a) = (wbb', 2 - i + 1)$ where $b'$ is
obtained from b according to the move a
$h_0 = (b_0, 1)$ where $b_0$ is the initial board
$u_j(wb, i) \in \{1, 0, -1\}$, here 1 means "win", 0 means "draw", and
−1 means "loss" for player j
138
Pure Strategies
Let G = (N, A, H, Z, χ, ρ, π, h0, u) be a perfect-information
extensive-form game.
Deﬁnition 51
A pure strategy of player i in G is a function si : Hi → A such
that for every h ∈ Hi we have that si(h) ∈ χ(h).
We denote by Si the set of all pure strategies of player i in G.
Denote by S = S1 × · · · × Sn the set of all pure strategy proﬁles.
Note that each pure strategy profile s ∈ S determines a unique
path $w_s = h_0 a_1 h_1 \cdots h_{k-1} a_k h_k$ from $h_0$ to a terminal node $h_k$ by
$$a_j = s_{\rho(h_{j-1})}(h_{j-1}) \quad \text{for all } 0 < j \le k$$
Denote by O(s) the terminal node reached by ws.
Abusing notation a bit, we denote by ui(s) the value ui(O(s)) of
the payoff for player i when the terminal node O(s) is reached
using strategies of s.
139
Example: Trust Game
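[Game tree: player 1 moves at h0. D leads to the terminal node z1 with
payoffs (5, 5). T leads to h1, where player 2 chooses K (terminal node z2,
payoffs (0, 20)) or S (terminal node z3, payoffs (7.5, 12.5)).]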
A pure strategy profile $(s_1, s_2)$ where
$s_1(h_0) = T$ and $s_2(h_1) = K$,
usually written as TK (BFS and left-to-right traversal), determines the
path $h_0\, T\, h_1\, K\, z_2$
The resulting payoffs: u1(s1, s2) = 0 and u2(s1, s2) = 20.
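The way a pure strategy profile determines a path and payoffs can be traced mechanically. The following is a minimal sketch that encodes the Trust Game with dictionaries mirroring χ, ρ, π, and u; the dictionary encoding and the function name `play` are assumptions made for this illustration.

```python
# Trust Game encoded with dictionaries that mirror chi, rho, pi and u.
chi = {"h0": ["D", "T"], "h1": ["K", "S"]}
rho = {"h0": 1, "h1": 2}
pi = {("h0", "D"): "z1", ("h0", "T"): "h1",
      ("h1", "K"): "z2", ("h1", "S"): "z3"}
u = {"z1": (5, 5), "z2": (0, 20), "z3": (7.5, 12.5)}


def play(profile, root="h0"):
    """Return the path w_s and the payoff vector for a pure strategy profile.

    profile maps each player to a dict from that player's nodes to actions.
    """
    h = root
    path = [h]
    while h in rho:                  # h is a non-terminal node
        a = profile[rho[h]][h]       # the owner of h picks the action
        if a not in chi[h]:
            raise ValueError(f"action {a} is not enabled in {h}")
        h = pi[(h, a)]
        path += [a, h]
    return path, u[h]


TK = {1: {"h0": "T"}, 2: {"h1": "K"}}
path, payoffs = play(TK)
print(path, payoffs)
```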
140
Extensive-Form vs Strategic-Form
The extensive-form game G determines the corresponding
strategic-form game $\bar{G} = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$
Here note that the set of players N and the sets of pure strategies $S_i$ are the
same in G and in the corresponding game.
The payoff functions $u_i$ in $\bar{G}$ are understood as functions on the pure strategy
profiles of $S = S_1 \times \cdots \times S_n$.
With this definition, we may apply all solution concepts and algorithms
developed for strategic-form games to the extensive form games.
We often consider the extensive-form to be only a different way of
representing the corresponding strategic-form game and do not distinguish
between them.
There are some issues, namely whether all notions from
strategic-form area make sense in the extensive-form. Also, naive
application of algorithms may result in unnecessarily high complexity.
For now, let us consider pure strategies only!
141
Example: Trust Game
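[Game tree: player 1 moves at h0. D leads to the terminal node z1 with
payoffs (5, 5). T leads to h1, where player 2 chooses K (terminal node z2,
payoffs (0, 20)) or S (terminal node z3, payoffs (7.5, 12.5)).]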
Is any strategy strictly (weakly, very weakly) dominant?
Is any strategy never best response?
Is there a Nash equilibrium in pure strategies ?
142
Example
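[Game tree: player 1 moves at h0. L leads to h1, where player 2 chooses
K (payoffs (3, 1)) or U (payoffs (1, 3)). R leads to h2, where player 2
chooses K (payoffs (2, 1)) or U (payoffs (0, 0)).]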
Find all pure strategies of both players.
Is any strategy (strictly, weakly, very weakly) dominant?
Is any strategy (strictly, weakly, very weakly) dominated?
Is any strategy never best response?
Are there Nash equilibria in pure strategies ?
143
Example
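[Game tree: player 1 moves at h0. L leads to h1, where player 2 chooses
K (payoffs (3, 1)) or U (payoffs (1, 3)). R leads to h2, where player 2
chooses K (payoffs (2, 1)) or U (payoffs (0, 0)).]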
KK KU UK UU
L 3, 1 3, 1 1, 3 1, 3
R 2, 1 0, 0 2, 1 0, 0
Find all pure strategies of both players.
Is any strategy (strictly, weakly, very weakly) dominant?
Is any strategy (strictly, weakly, very weakly) dominated?
Is any strategy never best response?
Are there Nash equilibria in pure strategies ?
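The table above can be generated directly from the tree, which also allows a brute-force check of pure Nash equilibria. The sketch below enumerates all pure strategies, builds the corresponding strategic-form table, and tests every cell. The terminal node names `z1` to `z4` and all function names are hypothetical; the slides leave the terminal nodes of this example unnamed.

```python
from itertools import product

chi = {"h0": ["L", "R"], "h1": ["K", "U"], "h2": ["K", "U"]}
rho = {"h0": 1, "h1": 2, "h2": 2}
pi = {("h0", "L"): "h1", ("h0", "R"): "h2",
      ("h1", "K"): "z1", ("h1", "U"): "z2",
      ("h2", "K"): "z3", ("h2", "U"): "z4"}   # z1..z4 are hypothetical names
u = {"z1": (3, 1), "z2": (1, 3), "z3": (2, 1), "z4": (0, 0)}


def pure_strategies(player):
    """All functions from the player's nodes to enabled actions."""
    nodes = sorted(h for h in rho if rho[h] == player)
    return [dict(zip(nodes, choice))
            for choice in product(*(chi[h] for h in nodes))]


def outcome(s1, s2):
    """Payoff vector reached when players 1 and 2 follow s1 and s2."""
    profile = {1: s1, 2: s2}
    h = "h0"
    while h in rho:
        h = pi[(h, profile[rho[h]][h])]
    return u[h]


def name(s):
    """Write a strategy as the string of its actions, e.g. 'UK'."""
    return "".join(s[h] for h in sorted(s))


S1, S2 = pure_strategies(1), pure_strategies(2)
table = {(name(a), name(b)): outcome(a, b) for a in S1 for b in S2}
rows = sorted({r for r, _ in table})
cols = sorted({c for _, c in table})


def is_nash(r, c):
    """No player can gain by a unilateral deviation from (r, c)."""
    return (all(table[(r, c)][0] >= table[(x, c)][0] for x in rows)
            and all(table[(r, c)][1] >= table[(r, y)][1] for y in cols))


nash = sorted(cell for cell in table if is_nash(*cell))
print(nash)
```

Note that the table has |S1| · |S2| cells, and the number of pure strategies grows exponentially with the number of nodes a player owns, which is one reason why a naive translation to strategic form can be costly.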
144
Criticism of Nash Equilibria
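[Game tree: player 1 moves at h0. L leads to h1, where player 2 chooses
K (payoffs (3, 1)) or U (payoffs (1, 3)). R leads to h2, where player 2
chooses K (payoffs (2, 1)) or U (payoffs (0, 0)).]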
KK KU UK UU
L 3, 1 3, 1 1, 3 1, 3
R 2, 1 0, 0 2, 1 0, 0
Two Nash equilibria in pure strategies: (L, UU ) and (R, UK )
Examine (L, UU ):
Player 2 threatens to play U in h2,
as a result, player 1 plays L,
player 2 reacts to L by playing the best response, i.e., U.
However, the threat is not credible: once a play reaches h2, a rational
player 2 chooses K.
145
Criticism of Nash Equilibria
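[Game tree: player 1 moves at h0. L leads to h1, where player 2 chooses
K (payoffs (3, 1)) or U (payoffs (1, 3)). R leads to h2, where player 2
chooses K (payoffs (2, 1)) or U (payoffs (0, 0)).]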
KK KU UK UU
L 3, 1 3, 1 1, 3 1, 3
R 2, 1 0, 0 2, 1 0, 0
Two Nash equilibria in pure strategies: (L, UU ) and (R, UK )
Examine (R, UK ): This equilibrium is sensible in the following sense:
Player 2 plays the best response in both h1 and h2
Player 1 plays the "best response" in h0 assuming that player 2
will play his best responses in the future.
This equilibrium is called subgame perfect.
146
Subgame Perfect Equilibria
Given $h \in H \cup Z$, we denote by $\mathcal{H}^h$ the set of all nodes reachable from h.
Definition 52 (Subgame)
A subgame $G^h$ of G rooted in $h \in H \cup Z$ is the restriction of G to nodes
reachable from h in the game tree. More precisely,
$G^h = (N, A, H^h, Z^h, \chi^h, \rho^h, \pi^h, h, u^h)$ where
$H^h = H \cap \mathcal{H}^h$, $Z^h = Z \cap \mathcal{H}^h$,
$\chi^h$ and $\rho^h$ are restrictions of χ and ρ to $H^h$, resp.,
(Given a function f : A → B and C ⊆ A, a restriction of f to C is a function
g : C → B such that g(x) = f(x) for all x ∈ C.)
$\pi^h$ is defined for $h' \in H^h$ and $a \in \chi^h(h')$ by $\pi^h(h', a) = \pi(h', a)$
each $u_i^h$ is a restriction of $u_i$ to $Z^h$
Definition 53
A subgame perfect equilibrium (SPE) in pure strategies is a pure
strategy profile s ∈ S such that for any subgame $G^h$ of G,
the restriction of s to $H^h$ is a Nash equilibrium in pure strategies in $G^h$.
A restriction of $s = (s_1, \ldots, s_n) \in S$ to $H^h$ is a strategy profile $s^h = (s_1^h, \ldots, s_n^h)$
where $s_i^h(h') = s_i(h')$ for all i ∈ N and all $h' \in H_i \cap H^h$.
147
Stackelberg Competition – SPE
N = {1, 2}, A = [0, ∞)
$H = \{h_0, h_1^{q_1} \mid q_1 \in [0, \infty)\}$, $Z = \{z^{q_1,q_2} \mid q_1, q_2 \in [0, \infty)\}$
$\chi(h_0) = [0, \infty)$, $\chi(h_1^{q_1}) = [0, \infty)$, $\rho(h_0) = 1$, $\rho(h_1^{q_1}) = 2$
$\pi(h_0, q_1) = h_1^{q_1}$, $\pi(h_1^{q_1}, q_2) = z^{q_1,q_2}$
The payoffs are $u_1(z^{q_1,q_2}) = q_1(\kappa - c - q_1 - q_2)$,
$u_2(z^{q_1,q_2}) = q_2(\kappa - c - q_1 - q_2)$
Denote θ = κ − c
Player 1 chooses $q_1$; we know that the best response of player 2 is
$q_2 = (\theta - q_1)/2$.
Then $u_1(z^{q_1,q_2}) = q_1(\theta - q_1 - \theta/2 + q_1/2) = (\theta/2) q_1 - q_1^2/2$, which is
maximized by $q_1 = \theta/2$, giving $q_2 = \theta/4$.
Then $u_1(z^{q_1,q_2}) = \theta^2/8$ and $u_2(z^{q_1,q_2}) = \theta^2/16$.
Note that ﬁrm 1 has an advantage as a leader.
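The computation on this slide can be spelled out step by step. The following is a sketch under the assumption $q_1 \le \theta$, so that the follower's best response is non-negative; the comparison value $\theta^2/9$ is the standard payoff of each firm in the simultaneous-move (Cournot) version of this model and is quoted here without derivation.

```latex
\begin{align*}
\frac{\partial}{\partial q_2}\, q_2(\theta - q_1 - q_2) &= \theta - q_1 - 2 q_2 = 0
  &&\Longrightarrow\quad q_2 = \frac{\theta - q_1}{2},\\
u_1 &= q_1\Bigl(\theta - q_1 - \frac{\theta - q_1}{2}\Bigr)
     = \frac{\theta}{2}\, q_1 - \frac{q_1^2}{2},\\
\frac{d u_1}{d q_1} &= \frac{\theta}{2} - q_1 = 0
  &&\Longrightarrow\quad q_1 = \frac{\theta}{2},\quad q_2 = \frac{\theta}{4},\\
u_1 &= \frac{\theta}{2}\cdot\frac{\theta}{4} = \frac{\theta^2}{8},
  \qquad u_2 = \frac{\theta}{4}\cdot\frac{\theta}{4} = \frac{\theta^2}{16}.
\end{align*}
```

Since $\theta^2/8 > \theta^2/9 > \theta^2/16$, the leader earns more and the follower earns less than in the simultaneous-move case.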
148
Existence of SPE
From this moment on we consider only ﬁnite games!
Theorem 54
Every ﬁnite perfect-information extensive-form game has a SPE in
pure strategies.
Proof: By induction on the number of nodes.
Base case: If $|H \cup Z| = 1$, the only node is terminal, and the trivial pure
strategy profile is SPE.
Induction step: Consider a game with more than one node. Let
$K = \{h_1, \ldots, h_k\}$ be the set of all children of the root $h_0$.
By induction, for every $h' \in K$ there is a SPE $s^{h'}$ in $G^{h'}$.
For every i ∈ N, define a strategy $s_i$ of player i in G as follows:
for $i = \rho(h_0)$ we have $s_i(h_0) \in \operatorname*{argmax}_{h' \in K} u_i^{h'}(s^{h'})$
for all i ∈ N and h ∈ H we have $s_i(h) = s_i^{h'}(h)$ where $h \in H^{h'} \cap H_i$
We claim that $s = (s_1, \ldots, s_n)$ is a SPE in pure strategies.
By definition, s is NE in all subgames except (possibly) the G itself.
149
Existence of SPE (Cont.)
Let $h' = s_{\rho(h_0)}(h_0)$.
Consider a possible deviation of player i.
Let $\bar{s}$ be another pure strategy profile in G obtained from
$s = (s_1, \ldots, s_n)$ by changing $s_i$.
First, assume that $i \ne \rho(h_0)$. Then
$$u_i(s) = u_i^{h'}(s^{h'}) \ge u_i^{h'}(\bar{s}^{h'}) = u_i(\bar{s})$$
Here the first equality follows from $h' = s_{\rho(h_0)}(h_0)$ and that s behaves
as $s^{h'}$ in $G^{h'}$, the inequality follows from the fact that $s^{h'}$ is a NE in $G^{h'}$, and
the second equality follows from $h' = s_{\rho(h_0)}(h_0) = \bar{s}_{\rho(h_0)}(h_0)$.
Second, assume that $i = \rho(h_0)$.
Let $h_r = \bar{s}_i(h_0) = \bar{s}_{\rho(h_0)}(h_0)$.
Then $u_i^{h'}(s^{h'}) \ge u_i^{h_r}(s^{h_r})$ because $h'$ maximizes the payoff of
player $i = \rho(h_0)$ in the children of $h_0$.
But then
$$u_i(s) = u_i^{h'}(s^{h'}) \ge u_i^{h_r}(s^{h_r}) \ge u_i^{h_r}(\bar{s}^{h_r}) = u_i(\bar{s})$$
150
Chess
Recall that in the model of chess, the payoffs were from
{1, 0, −1} and u1 = −u2 (i.e. it is zero-sum).
By Theorem 54, there is a SPE in pure strategies $(s_1^*, s_2^*)$.
However, then one of the following holds:
1. White has a winning strategy
If $u_1(s_1^*, s_2^*) = 1$ and thus $u_2(s_1^*, s_2^*) = -1$
2. Black has a winning strategy
If $u_1(s_1^*, s_2^*) = -1$ and thus $u_2(s_1^*, s_2^*) = 1$
3. Both players have strategies to force a draw
If $u_1(s_1^*, s_2^*) = 0$ and thus $u_2(s_1^*, s_2^*) = 0$
Question: Which one is the right answer?
Answer: Nobody knows yet ... the tree is too big!
Even with ∼ 200 depth & ∼ 5 moves per node: $5^{200}$ nodes!
151
Backward Induction
The proof of Theorem 54 gives an efﬁcient procedure for computing
SPE for ﬁnite perfect-information extensive-form games.
Backward Induction: We inductively "attach" to every node h a SPE
$s^h$ in $G^h$, together with a vector of expected payoffs
$u(h) = (u_1(h), \ldots, u_n(h))$.
Initially: Attach to each terminal node z ∈ Z the empty profile
$s^z = (\emptyset, \ldots, \emptyset)$ and the payoff vector $u(z) = (u_1(z), \ldots, u_n(z))$.
While (there is an unattached node h with all children attached):
1. Let K be the set of all children of h
2. Let
$$h_{max} \in \operatorname*{argmax}_{h' \in K} u_{\rho(h)}(h')$$
3. Attach to h a SPE $s^h$ where
$s^h_{\rho(h)}(h) = h_{max}$
for all i ∈ N and all $h' \in H_i$ define $s^h_i(h') = s^{\bar{h}}_i(h')$ where
$h' \in H^{\bar{h}} \cap H_i$ for a child $\bar{h} \in K$ (in $G^{\bar{h}}$, each $s^h_i$ behaves as $s^{\bar{h}}_i$,
i.e. the restriction of $s^h$ to $H^{\bar{h}}$ equals $s^{\bar{h}}$)
4. Attach to h the expected payoffs $u_i(h) = u_i(h_{max})$ for i ∈ N.
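The procedure above can be sketched in a few lines. The version below is recursive rather than a while loop, which processes children before their parent and therefore attaches values in a valid order; ties are broken in favour of the first enabled action. The dictionary encoding, the terminal node names `z1` to `z4`, and the function name `backward_induction` are assumptions made for this illustration.

```python
def backward_induction(chi, rho, pi, u, root):
    """Compute one SPE in pure strategies of a finite perfect-information game.

    Returns (strategy, payoffs): strategy maps every non-terminal node to the
    action chosen there, payoffs is the payoff vector attached to the root.
    """
    strategy = {}

    def attach(h):
        if h not in rho:              # terminal node: payoffs are given
            return u[h]
        owner = rho[h] - 1            # players are numbered from 1
        best_action, best_payoffs = None, None
        for a in chi[h]:
            payoffs = attach(pi[(h, a)])
            if best_payoffs is None or payoffs[owner] > best_payoffs[owner]:
                best_action, best_payoffs = a, payoffs
        strategy[h] = best_action     # the owner picks a maximizing child
        return best_payoffs

    payoffs = attach(root)
    return strategy, payoffs


# The two-level example with actions L, R for player 1 and K, U for player 2.
chi = {"h0": ["L", "R"], "h1": ["K", "U"], "h2": ["K", "U"]}
rho = {"h0": 1, "h1": 2, "h2": 2}
pi = {("h0", "L"): "h1", ("h0", "R"): "h2",
      ("h1", "K"): "z1", ("h1", "U"): "z2",
      ("h2", "K"): "z3", ("h2", "U"): "z4"}   # z1..z4 are hypothetical names
u = {"z1": (3, 1), "z2": (1, 3), "z3": (2, 1), "z4": (0, 0)}

strategy, payoffs = backward_induction(chi, rho, pi, u, "h0")
print(strategy, payoffs)
```

On this example the result is the profile written (R, UK) in the slides, with payoffs (2, 1). Each node is visited once, so the running time is linear in the size of the tree.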
152
Efﬁcient Algorithms for Pure Nash Equilibria
In the step 2. of the backward induction, the algorithm may choose
an arbitrary hmax ∈ argmaxh ∈K uρ(h)(h ) and always obtain a SPE.
In order to compute all SPE, the algorithm may systematically search
through all possible choices of hmax throughout the induction.
Backward induction can be inefficient in practice, since it always searches
through the whole tree.
There are better algorithms, such as α-β pruning.
For details, extensions etc. see e.g.
PB016 Artificial Intelligence I
Multi-player alpha-beta pruning, R. Korf, Artificial Intelligence
48, pages 99-111, 1991
Artiﬁcial Intelligence: A Modern Approach (3rd edition),
S. Russell and P. Norvig, Prentice Hall, 2009
153
Example
Centipede game:
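[Game tree: five decision nodes in a row, owned by players 1, 2, 1, 2, 1 in
this order. At each node the owner plays A (continue to the next node) or
D (end the game). Playing D at the first to the fifth node gives payoffs
(1, 0), (0, 2), (3, 1), (2, 4), (4, 3), respectively; playing A at the last
node gives (3, 5).]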
SPE in pure strategies: (DDD, DD) ... Isn’t it weird?
There are serious issues here ...
In laboratory setting, people usually play A for several steps.
There is a theoretical problem: Imagine that you are player 2.
What would you do when player 1 chooses A in the ﬁrst step?
The SPE analysis says that you should go down, but the same
analysis also says that the situation you are in cannot appear :-)
154
Dynamic Games of Complete Information
Extensive-Form Games
Imperfect-Information Games
155
Extensive-form of Matching Pennies
Is it possible to model Matching pennies using extensive-form
games?
H T
H 1, −1 −1, 1
T −1, 1 1, −1
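[Game tree: player 1 moves at h0. H leads to h1, where player 2 chooses
H (payoffs (1, −1)) or T (payoffs (−1, 1)). T leads to h2, where player 2
chooses H (payoffs (−1, 1)) or T (payoffs (1, −1)).]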
The problem is that player 2 is "perfectly" informed about the choice
of player 1. In particular, there are pure Nash equilibria (H, TH) and
(T, TH) in the extensive-form game as opposed to the strategic-form.
Reversing the order of players does not help.
We need to extend the formalism to be able to hide some information
about previous moves.
156
Extensive-form of Matching Pennies
Matching pennies can be modeled using
an imperfect-information extensive-form game:
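[Game tree: player 1 moves at h0. H leads to h1, where player 2 chooses
H (payoffs (1, −1)) or T (payoffs (−1, 1)). T leads to h2, where player 2
chooses H (payoffs (−1, 1)) or T (payoffs (1, −1)).]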
Here h1 and h2 belong to the same information set of player 2.
As a result, player 2 is not able to distinguish between h1 and h2.
So even though players do not move simultaneously, the information
player 2 has about the current situation is the same as in
the simultaneous case.
157
Imperfect Information Games
An imperfect-information extensive-form game is a tuple
$G_{imp} = (G_{perf}, I)$ where
$G_{perf} = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$ is a perfect-information
extensive-form game (called the underlying game),
$I = (I_1, \ldots, I_n)$ where for each $i \in N = \{1, \ldots, n\}$
$$I_i = \{I_{i,1}, \ldots, I_{i,k_i}\}$$
is a collection of information sets for player i that satisfies
$\bigcup_{j=1}^{k_i} I_{i,j} = H_i$ and $I_{i,j} \cap I_{i,k} = \emptyset$ for $j \ne k$
(i.e., $I_i$ is a partition of $H_i$)
for all $h, h' \in I_{i,j}$, we have $\rho(h) = \rho(h')$ and $\chi(h) = \chi(h')$
(i.e., nodes from the same information set are owned by the same
player and have the same sets of enabled actions)
Given h ∈ H, we denote by I(h) the information set $I_{i,j}$ containing h.
Given an information set $I_{i,j}$, we denote by $\chi(I_{i,j})$ the set of all actions
enabled in some (and hence all) nodes of $I_{i,j}$.
158
Imperfect Information Games – Strategies
Now we define the set of pure, mixed, and behavioral strategies in $G_{imp}$ as
subsets of pure, mixed, and behavioral strategies, resp., in $G_{perf}$ that respect
the information sets.
Let $G_{imp} = (G_{perf}, I)$ be an imperfect-information extensive-form game
where $G_{perf} = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$.
Definition 55
A pure strategy of player i in $G_{imp}$ is a pure strategy $s_i$ in $G_{perf}$ such
that for all $j = 1, \ldots, k_i$ and all $h, h' \in I_{i,j}$ we have $s_i(h) = s_i(h')$.
Note that each $s_i$ can also be seen as a function $s_i : I_i \to A$ such that for
every $I_{i,j} \in I_i$ we have that $s_i(I_{i,j}) \in \chi(I_{i,j})$.
As before, we denote by $S_i$ the set of all pure strategies of player i in
$G_{imp}$, and by $S = S_1 \times \cdots \times S_n$ the set of all pure strategy profiles.
As in the perfect-information case we have a corresponding
strategic-form game $\bar{G}_{imp} = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$.
159
Matching Pennies
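[Game tree: player 1 moves at h0. H leads to h1, where player 2 chooses
H (payoffs (1, −1)) or T (payoffs (−1, 1)). T leads to h2, where player 2
chooses H (payoffs (−1, 1)) or T (payoffs (1, −1)).]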
$I_1 = \{I_{1,1}\}$ where $I_{1,1} = \{h_0\}$
$I_2 = \{I_{2,1}\}$ where $I_{2,1} = \{h_1, h_2\}$
Example of pure strategies:
s1(I1,1) = H which describes the strategy s1(h0) = H
s2(I2,1) = T which describes the strategy s2(h1) = s2(h2) = T
(it is also sufﬁcient to specify s2(h1) = T since then s2(h2) = T)
So we really have strategies H, T for player 1 and H, T for player 2.
160
Weird Example
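[Game tree: player 1 moves at h0. A leads to h1, where player 2 chooses
K (payoffs (1, 2)) or L (payoffs (2, 1)). B leads to h2, where player 2
chooses K (payoffs (3, 5)) or L (payoffs (7, 1)). C leads to h3, where
player 1 chooses A (payoffs (2, 5)), B (payoffs (11, 0)), or
C (payoffs (−4, 10)).]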
Note that I1 = {I1,1} where I1,1 = {h0, h3}
and that I2 = {I2,1} where I2,1 = {h1, h2}
What pure strategies are in this example?
161
SPE with Imperfect Information
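[Game tree: player 1 moves at h0. A leads to h1, where player 2 chooses
B (leading to h3) or B̄ (leading to h4). At h3 player 1 chooses C (terminal
node z1, payoffs (4, 1)) or C̄ (z2, payoffs (1, 4)). At h4 player 1 chooses
C (z3, payoffs (1, 4)) or C̄ (z4, payoffs (4, 1)). Ā leads from h0 to h2,
where player 2 chooses D (z5, payoffs (1, 1)) or D̄ (z6, payoffs (4, 5)).
The nodes h3 and h4 form one information set of player 1.]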
What do we designate as subgames to allow backward induction?
Only subtrees rooted in h1, h2, and h0 (together with all subtrees
rooted in terminal nodes)
Note that subtrees rooted in h3 and h4 cannot be considered as
"independent" subgames because their individual solutions cannot be
combined to a single best response in the information set {h3, h4}.
162
SPE with Imperfect Information
Let $G_{imp} = (G_{perf}, I)$ be an imperfect-information extensive-form game
where $G_{perf} = (N, A, H, Z, \chi, \rho, \pi, h_0, u)$ is the underlying
perfect-information extensive-form game.
Let us denote by $H_{proper}$ the set of all h ∈ H that satisfy the following:
For every $h'$ reachable from h, we have that either all nodes of $I(h')$
are reachable from h, or no node of $I(h')$ is reachable from h.
Intuitively, $h \in H_{proper}$ iff every information set $I_{i,j}$ is either completely contained
in the subtree rooted in h, or no node of $I_{i,j}$ is contained in the subtree.
Definition 56
For every $h \in H_{proper}$ we define a subgame $G^h_{imp}$ to be the imperfect
information game $(G^h_{perf}, I^h)$ where $I^h$ is the restriction of I to $H^h$.
Note that as subgames of $G_{imp}$ we consider only subgames of $G_{perf}$ that
respect the information sets, i.e., are rooted in nodes of $H_{proper}$.
Definition 57
A strategy profile s ∈ S is a subgame perfect equilibrium (SPE) if $s^h$ is
a Nash equilibrium in every subgame $G^h_{imp}$ of $G_{imp}$ (here $h \in H_{proper}$).
163
Backward Induction with Imperfect Info
The backward induction generalizes to imperfect-information
extensive-form games along the following lines:
1. As in the perfect-information case, the goal is to label each node
$h \in H_{proper} \cup Z$ with a SPE $s^h$ and a vector of payoffs
$u(h) = (u_1(h), \ldots, u_n(h))$ for individual players according to $s^h$.
2. Starting with terminal nodes, the labeling proceeds bottom up.
Terminal nodes are labeled similarly as in the perfect-inf. case.
3. Consider $h \in H_{proper}$, let K be the set of all $h' \in (H_{proper} \cup Z) \setminus \{h\}$
that are h's closest descendants out of $H_{proper} \cup Z$.
I.e., $h' \in K$ iff $h' \ne h$ is reachable from h and the unique path from h to
$h'$ visits only nodes of $H \setminus H_{proper}$ (except the first and the last node).
For every $h' \in K$ we have already computed a SPE $s^{h'}$ in $G^{h'}_{imp}$
and the vector of corresponding payoffs $u(h')$.
4. Now consider all nodes of K as terminal nodes where each
$h' \in K$ has payoffs $u(h')$. This gives a new game in which we
compute an equilibrium $\bar{s}^h$ together with the vector u(h).
The equilibrium $s^h$ is then obtained by "concatenating" $\bar{s}^h$ with
all $s^{h'}$, here $h' \in K$, in the subgames $G^{h'}_{imp}$ of $G^h_{imp}$.
164
Mutually Assured Destruction
Analysis of Cuban missile crisis of 1962
(as described in Games for Business and
Economics by R. Gardner)
The crisis started with United States’ discovery of Soviet nuclear
missiles in Cuba.
The USSR then backed down, agreeing to remove the missiles
from Cuba, which suggests that US had a credible threat "if you
don’t back off we both pay dearly".
Question: Could this indeed be a credible threat?
165
Mutually Assured Destruction (Cont.)
Model as an extensive-form game:
First, player 1 (US) chooses to either ignore the incident (I),
resulting in maintenance of status quo (payoffs (0, 0)), or
escalate the situation (E).
Following escalation by player 1, player 2 can back down (B),
causing it to lose face (payoffs (10, −10)), or it can choose to
proceed to a nuclear confrontation (N).
Upon this choice, the players play a simultaneous-move game in
which they can either retreat (R), or choose doomsday (D).
If both retreat, the payoffs are (−5, −5), a small loss due to
a mobilization process.
If either of them chooses doomsday, then the world
is destroyed and payoffs are (−100, −100).
Find SPE in pure strategies.
166
Mutually Assured Destruction (Cont.)
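[Game tree: player 1 moves at h0. I leads to the terminal node z6 with
payoffs (0, 0). E leads to h1, where player 2 chooses B (terminal node z5,
payoffs (10, −10)) or N (leading to h2). At h2 player 1 chooses R (leading
to h3) or D (leading to h4). At h3 player 2 chooses R (z1, payoffs (−5, −5))
or D (z2, payoffs (−100, −100)). At h4 player 2 chooses R (z3, payoffs
(−100, −100)) or D (z4, payoffs (−100, −100)). The nodes h3 and h4 form one
information set of player 2.]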
Solve $G^{h_2}_{imp}$ (a strategic-form game). Then solve $G^{h_1}_{imp}$ by solving a game rooted in h1
with terminal nodes h2, z5 (payoffs in h2 correspond to an equilibrium in $G^{h_2}_{imp}$).
Finally solve $G_{imp}$ by solving a game rooted in h0 with terminal nodes h1, z6
(payoffs in h1 have been computed in the previous step).
167
Dynamic Games of Complete Information
Repeated Games
Finitely Repeated Games
168
Example
C S
C −5, −5 0, −20
S −20, 0 −1, −1
Imagine that the criminals are being arrested repeatedly.
Can they somehow reflect upon their experience in order to play
"better"?
In what follows we consider strategic-form games played repeatedly
for ﬁnitely many rounds, the ﬁnal payoff of each player will be
the average of payoffs from all rounds
inﬁnitely many rounds, here we consider a discounted sum of
payoffs and the long-run average payoff
We analyze Nash equilibria and sub-game perfect equilibria.
We stick to pure strategies only!
169
Finitely Repeated Games
Let $G = (\{1, 2\}, (S_1, S_2), (u_1, u_2))$ be a finite strategic-form game of
two players.
A T-stage game $G_{T\text{-rep}}$ based on G proceeds in T stages so that in
a stage t ≥ 1, players choose a strategy profile $s^t = (s_1^t, s_2^t)$.
After T stages, both players collect the average payoff $\sum_{t=1}^{T} u_i(s^t) / T$.
A history of length 0 ≤ t ≤ T is a sequence $h = s^1 \cdots s^t \in S^t$ of t
strategy profiles. Denote by H(t) the set of all histories of length t.
A pure strategy for player i in a T-stage game $G_{T\text{-rep}}$ is a function
$$\tau_i : \bigcup_{t=0}^{T-1} H(t) \to S_i$$
which for every possible history chooses a next step for player i.
Every strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{T\text{-rep}}$ induces a sequence of
pure strategy profiles $w_\tau = s^1 \cdots s^T$ in G so that $s_i^t = \tau_i(s^1 \cdots s^{t-1})$.
Given a pure strategy profile τ in $G_{T\text{-rep}}$ such that $w_\tau = s^1 \cdots s^T$,
define the payoffs $u_i(\tau) = \sum_{t=1}^{T} u_i(s^t) / T$.
170
Example
C S
C −5, −5 0, −20
S −20, 0 −1, −1
Consider a 3-stage game.
Examples of histories: ε (the empty history), (C, S), (C, S)(S, S), (C, S)(S, S)(C, C)
Here the last one is terminal, obtained using $\tau_1, \tau_2$ s.t.:
$\tau_1(\varepsilon) = C$, $\tau_1((C, S)) = S$, $\tau_1((C, S)(S, S)) = C$
$\tau_2(\varepsilon) = S$, $\tau_2((C, S)) = S$, $\tau_2((C, S)(S, S)) = C$
Thus $w_{(\tau_1,\tau_2)} = (C, S)(S, S)(C, C)$
$u_1(\tau_1, \tau_2) = (0 + (-1) + (-5))/3 = -2$
$u_2(\tau_1, \tau_2) = (-20 + (-1) + (-5))/3 = -26/3$
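The example can be replayed mechanically: strategies are functions of the history so far, and the induced sequence is built stage by stage. This is a minimal sketch; the function name `run` and the lookup tables are assumptions made for this illustration, and the tables define the strategies only on the three histories listed above (a full strategy would assign an action to every history of length less than T).

```python
from fractions import Fraction

# Stage-game payoffs of the Prisoner's Dilemma (row player first).
u = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}


def run(tau1, tau2, T):
    """Return the induced sequence of profiles and the average payoffs."""
    history = ()
    for _ in range(T):
        # Both players choose simultaneously, knowing only the past stages.
        history += ((tau1(history), tau2(history)),)
    payoffs = tuple(Fraction(sum(u[s][i] for s in history), T)
                    for i in range(2))
    return history, payoffs


# Partial strategies: only the histories reached in the example are listed.
table1 = {(): "C", (("C", "S"),): "S", (("C", "S"), ("S", "S")): "C"}
table2 = {(): "S", (("C", "S"),): "S", (("C", "S"), ("S", "S")): "C"}

w, payoffs = run(lambda h: table1[h], lambda h: table2[h], 3)
print(w, payoffs)
```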
171
Finitely Repeated Games in Extensive-Form
Every T-stage game $G_{T\text{-rep}}$ can be defined as an imperfect
information extensive-form game.
Define an imperfect-information extensive-form game $G^{rep}_{imp} = (G^{rep}_{perf}, I)$
such that $G^{rep}_{perf} = (\{1, 2\}, A, H, Z, \chi, \rho, \pi, h_0, u)$ where
$A = S_1 \cup S_2$
$H \cup Z = (S_1 \times S_2)^{\le T} \cup (S_1 \times S_2)^{<T} \cdot S_1$
Intuitively, elements of $(S_1 \times S_2)^{\le T}$ are possible histories;
$(S_1 \times S_2)^{<T} \cdot S_1$ is used to simulate a simultaneous play of G by letting
player 1 choose first and player 2 second.
$Z = (S_1 \times S_2)^T$
$\chi(\varepsilon) = S_1$ and $\chi(h \cdot s_1) = S_2$ for $s_1 \in S_1$, and $\chi(h \cdot (s_1, s_2)) = S_1$
for $(s_1, s_2) \in S$
$\rho(\varepsilon) = 1$ and $\rho(h \cdot s_1) = 2$ and $\rho(h \cdot (s_1, s_2)) = 1$
$\pi(\varepsilon, s_1) = s_1$ and $\pi(h \cdot s_1, s_2) = h \cdot (s_1, s_2)$ and
$\pi(h \cdot (s_1, s_2), s_1') = h \cdot (s_1, s_2) \cdot s_1'$
$h_0 = \varepsilon$ (the empty sequence) and $u_i((s_1^1, s_2^1)(s_1^2, s_2^2) \cdots (s_1^T, s_2^T)) = \sum_{t=1}^{T} u_i(s_1^t, s_2^t) / T$
172
Finitely Repeated Games in Extensive-Form
The set of information sets is defined as follows: Let $h \in H_1$ be a node of player 1, then
there is exactly one information set of player 1 containing h as the only element,
there is exactly one information set of player 2 containing all nodes of the form $h \cdot s_1$ where $s_1 \in S_1$.
Intuitively, in every round,
player 1 has complete information about the results of past plays,
player 1 chooses a pure strategy $s_1 \in S_1$,
player 2 is not informed about $s_1$ but still has complete information about the results of all previous rounds,
player 2 chooses a pure strategy $s_2 \in S_2$ and both players are informed about the result.
Finitely Repeated Games – Equilibria
Deﬁnition 58
A strategy profile $\tau = (\tau_1, \tau_2)$ in a T-stage game $G^{T\text{-rep}}$ is a Nash equilibrium if for every $i \in \{1, 2\}$ and every $\tau_i'$ we have
$u_i(\tau_i, \tau_{-i}) \geq u_i(\tau_i', \tau_{-i})$
To define SPE we use the following notation. Given a history $h = s^1 \cdots s^t$ and a strategy $\tau_i$ of player i, we define a strategy $\tau^h_i$ in the (T − t)-stage game based on G by
$\tau^h_i(\bar{s}^1 \cdots \bar{s}^{\bar{t}}) = \tau_i(s^1 \cdots s^t \bar{s}^1 \cdots \bar{s}^{\bar{t}})$ for every sequence $\bar{s}^1 \cdots \bar{s}^{\bar{t}}$
(i.e. $\tau^h_i$ behaves as $\tau_i$ after h)
Definition 59
A strategy profile $\tau = (\tau_1, \tau_2)$ in a T-stage game $G^{T\text{-rep}}$ is a subgame-perfect Nash equilibrium (SPE) if for every history h the profile $(\tau^h_1, \tau^h_2)$ is a Nash equilibrium in the (T − |h|)-stage game based on G.
SPE with Single NE in G
C S
C −5, −5 0, −20
S −20, 0 −1, −1
Consider a T-stage game based on Prisoner’s dilemma.
For every T, find a SPE.
There is one: play (C, C) all the time. Is it the only one?
Theorem 60
Let G be an arbitrary ﬁnite strategic-form game. If G has a unique
Nash equilibrium, then playing this equilibrium all the time is
the unique SPE in the T-stage game based on G.
Proof.
By backward induction, players have to play the NE in the last stage.
As the behavior in the last stage does not depend on the behavior in
the (T − 1)-th stage, they have to play the NE also in the (T − 1)-th
stage. Then the same holds in the (T − 2)-th stage, etc.
Further Discussion of Prisoner’s Dilemma
C S
C −5, −5 0, −20
S −20, 0 −1, −1
Are there other NE (that are not SPE) in the repeated Prisoner’s
dilemma?
To simplify our discussion, we consider the 2-stage game and use the following notation: X−YZ, where X, Y, Z ∈ {C, S}, denotes the following strategy:
In the first stage, play X.
In the second stage, play Y if the opponent played C in the first stage, otherwise play Z.
There are 4 NE: the four profiles that lead to (C, C)(C, C), i.e., each player plays either C−CC or C−CS.
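The count of Nash equilibria can be checked by brute force over the 8 × 8 pure strategy profiles of the 2-stage game. This is a sketch under the X−YZ encoding above; the helper names (`total_payoffs`, `is_nash`, `name`) are illustrative assumptions. Sums of stage payoffs are compared instead of averages, which does not change the equilibria.

```python
from itertools import product

# Stage-game payoffs of the Prisoner's dilemma: (row, column) -> (u1, u2).
U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}

# A strategy X-YZ is encoded as the tuple (X, Y, Z).
STRATEGIES = list(product("CS", repeat=3))

def total_payoffs(a, b):
    """Sum of both stage payoffs when player 1 uses a and player 2 uses b."""
    first = (a[0], b[0])
    second = (a[1] if b[0] == "C" else a[2],
              b[1] if a[0] == "C" else b[2])
    return tuple(U[first][i] + U[second][i] for i in range(2))

def is_nash(a, b):
    """No unilateral deviation to another X-YZ strategy is profitable."""
    u1, u2 = total_payoffs(a, b)
    return (all(total_payoffs(d, b)[0] <= u1 for d in STRATEGIES)
            and all(total_payoffs(a, d)[1] <= u2 for d in STRATEGIES))

def name(s):
    return f"{s[0]}-{s[1]}{s[2]}"

equilibria = [(name(a), name(b))
              for a in STRATEGIES for b in STRATEGIES if is_nash(a, b)]
print(equilibria)
```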
Further Discussion of Prisoner’s Dilemma
C S
C −5, −5 0, −20
S −20, 0 −1, −1
The strategy C strictly dominates S in the Prisoner’s dilemma.
Is there a strictly dominant strategy in the 2-stage game based on
the Prisoner’s dilemma?
If player 2 plays S−CS, then the best responses of player 1 are
S−CC and S−SC.
(The strategy S−CS is usually called "tit-for-tat".)
If player 2 plays S−SC, then the best responses are C−SC and
C−CC.
So there is no strictly dominant strategy for player 1.
(A strictly dominant strategy would have to be a best response to every strategy of player 2, but the two sets of best responses above are disjoint.)
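The two best-response sets can be recomputed by enumeration. The sketch below assumes the X−YZ encoding of strategies as tuples (X, Y, Z); the function names are hypothetical and only the payoffs of player 1 are needed.

```python
from itertools import product

# Payoffs of player 1 in the Prisoner's dilemma: (row, column) -> u1.
U1 = {("C", "C"): -5, ("C", "S"): 0, ("S", "C"): -20, ("S", "S"): -1}
STRATEGIES = list(product("CS", repeat=3))  # (X, Y, Z) stands for X-YZ

def payoff1(a, b):
    """Total payoff of player 1 when she plays a and player 2 plays b."""
    first = U1[(a[0], b[0])]
    second = U1[(a[1] if b[0] == "C" else a[2],
                 b[1] if a[0] == "C" else b[2])]
    return first + second

def name(s):
    return f"{s[0]}-{s[1]}{s[2]}"

def best_responses(b):
    """Names of all strategies of player 1 maximizing her payoff against b."""
    best = max(payoff1(a, b) for a in STRATEGIES)
    return sorted(name(a) for a in STRATEGIES if payoff1(a, b) == best)

print(best_responses(("S", "C", "S")))  # opponent plays S-CS
print(best_responses(("S", "S", "C")))  # opponent plays S-SC

# Strategies that are a best response to every strategy of player 2.
always_best = [name(a) for a in STRATEGIES
               if all(name(a) in best_responses(b) for b in STRATEGIES)]
print(always_best)
```

An empty `always_best` list confirms that no strategy is a best response against all opponent strategies, so in particular none is strictly dominant.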
SPE with Multiple NE in G
Let $s = (s_1, s_2)$ be a Nash equilibrium in G.
Define a strategy profile $\tau = (\tau_1, \tau_2)$ in $G^{T\text{-rep}}$ where
$\tau_1$ chooses $s_1$ in every stage
$\tau_2$ chooses $s_2$ in every stage
Proposition 3
$\tau$ is a SPE in $G^{T\text{-rep}}$ for every T ≥ 1.
Proof.
Changing $\tau_i$ in some stage(s) may only result in the same or a worse payoff for player i, since the other player always plays $s_{-i}$ independently of the choices of player i, and $s_i$ is a best response to $s_{-i}$ in G.
The proposition may be generalized by allowing players to play different equilibria in particular stages.
I.e., consider a sequence of NE $s^1, s^2, \ldots, s^T$ in G and assume that in stage $\ell$ player i plays $s^\ell_i$.
Does this cover all possible SPE in finitely repeated games?
SPE with Multiple NE in G
        m         f        r
M    4, 4     −1, 5    0, 0
F    5, −1     1, 1    0, 0
R    0, 0      0, 0    3, 3
NE in the above game G: (F, f) and (R, r)
Consider the 2-stage game $G^{2\text{-rep}}$ and strategies $\tau_1, \tau_2$ where
$\tau_1$: Chooses M in stage 1. In stage 2 plays R if (M, m) was played in the first stage, and plays F otherwise.
$\tau_2$: Chooses m in stage 1. In stage 2 plays r if (M, m) was played in the first stage, and plays f otherwise.
Is this SPE?
Note that here the players do not play a NE in the first stage.
The idea is that both players agree to play a Pareto optimal proﬁle. If
both comply, then a favorable NE is played in the second stage. If one
of them betrays then a "punishing" NE is played.
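The question can be explored numerically. The sketch below is an illustration rather than a proof: it lists the pure NE of G, then compares the total payoff of following $(\tau_1, \tau_2)$ with the best total payoff a player can get by deviating in the first stage (after which the punishing NE (F, f) is played). All function names are assumptions introduced here.

```python
# Stage-game payoffs: (row, column) -> (u1, u2).
U = {
    ("M", "m"): (4, 4),  ("M", "f"): (-1, 5), ("M", "r"): (0, 0),
    ("F", "m"): (5, -1), ("F", "f"): (1, 1),  ("F", "r"): (0, 0),
    ("R", "m"): (0, 0),  ("R", "f"): (0, 0),  ("R", "r"): (3, 3),
}
ROWS, COLS = "MFR", "mfr"

def pure_nash():
    """All pure Nash equilibria of the stage game."""
    return [(a, b) for a in ROWS for b in COLS
            if all(U[(d, b)][0] <= U[(a, b)][0] for d in ROWS)
            and all(U[(a, d)][1] <= U[(a, b)][1] for d in COLS)]

def second_stage(first):
    """Profile prescribed in stage 2 after `first` was played in stage 1."""
    return ("R", "r") if first == ("M", "m") else ("F", "f")

def total(first, i):
    """Total payoff of player i (0 or 1) if `first` is played in stage 1
    and both players follow the prescription in stage 2."""
    return U[first][i] + U[second_stage(first)][i]

on_path = [total(("M", "m"), i) for i in range(2)]
best_dev_1 = max(total((d, "m"), 0) for d in ROWS if d != "M")
best_dev_2 = max(total(("M", d), 1) for d in COLS if d != "m")
print(pure_nash(), on_path, best_dev_1, best_dev_2)
```

Since every second-stage prescription is a NE of G and no first-stage deviation beats the on-path total, the profile passes the checks one would expect of a SPE.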
Dynamic Games of Complete Information
Repeated Games
Inﬁnitely Repeated Games
Inﬁnitely Repeated Games
Let $G = (\{1, 2\}, (S_1, S_2), (u_1, u_2))$ be a strategic-form game of two players.
An infinitely repeated game $G_{irep}$ based on G proceeds in stages so that in each stage, say t, players choose a strategy profile $s^t = (s^t_1, s^t_2)$.
Recall that a history of length $t \geq 0$ is a sequence $h = s^1 \cdots s^t \in S^t$ of t strategy profiles. Denote by H(t) the set of all histories of length t.
A pure strategy for player i in the infinitely repeated game $G_{irep}$ is a function
$\tau_i : \bigcup_{t=0}^{\infty} H(t) \to S_i$
which for every possible history chooses a next step for player i.
Every pure strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ induces a sequence of pure strategy profiles $w_\tau = s^1 s^2 \cdots$ in G so that $s^t_i = \tau_i(s^1 \cdots s^{t-1})$.
(Here for t = 1 we have that $s^1 \cdots s^{t-1} = \varepsilon$.)
Inﬁnitely Repeated Games & Discounted Payoff
Let $\tau = (\tau_1, \tau_2)$ be a pure strategy profile in $G_{irep}$ such that $w_\tau = s^1 s^2 \cdots$
Given $0 < \delta < 1$, we define a δ-discounted payoff by
$u^\delta_i(\tau) = (1 - \delta) \sum_{t=0}^{\infty} \delta^t \cdot u_i(s^{t+1})$
Given a strategic-form game G and $0 < \delta < 1$, we denote by $G^\delta_{irep}$ the infinitely repeated game based on G together with the δ-discounted payoffs.
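The normalizing factor (1 − δ) makes a constant stream of stage payoffs u worth exactly u. A minimal numerical sketch, assuming the infinite sum is approximated by truncation at a large horizon (the function name and the `horizon` parameter are illustrative, not from the slides):

```python
def discounted_payoff(stage_payoff, delta, horizon=10_000):
    """Approximate (1 - delta) * sum_{t >= 0} delta^t * u(s^{t+1}).

    `stage_payoff(k)` returns the payoff collected in stage k = 1, 2, ...
    The tail beyond `horizon` is dropped, which is harmless for bounded
    payoffs because delta^horizon is negligible.
    """
    total, weight = 0.0, 1.0
    for t in range(horizon):
        total += weight * stage_payoff(t + 1)
        weight *= delta
    return (1 - delta) * total

# Constant payoff -1 in every stage.
constant = discounted_payoff(lambda k: -1, delta=0.9)

# Payoff 0 in the first stage and -5 in all later stages.
one_good_stage = discounted_payoff(lambda k: 0 if k == 1 else -5, delta=0.9)
print(constant, one_good_stage)
```

For the second stream the closed form is (1 − δ) · 0 + δ · (−5), i.e. −4.5 for δ = 0.9.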
Inﬁnitely Repeated Games & Discounted Payoff
Deﬁnition 61
A strategy profile $\tau = (\tau_1, \tau_2)$ is a Nash equilibrium in $G^\delta_{irep}$ if for both $i \in \{1, 2\}$ and for every $\tau_i'$ we have that
$u^\delta_i(\tau_i, \tau_{-i}) \geq u^\delta_i(\tau_i', \tau_{-i})$
Given a history $h = s^1 \cdots s^t$ and a strategy $\tau_i$ of player i, we define a strategy $\tau^h_i$ in the infinitely repeated game $G_{irep}$ by
$\tau^h_i(\bar{s}^1 \cdots \bar{s}^{\bar{t}}) = \tau_i(s^1 \cdots s^t \bar{s}^1 \cdots \bar{s}^{\bar{t}})$ for every sequence $\bar{s}^1 \cdots \bar{s}^{\bar{t}}$
(i.e. $\tau^h_i$ behaves as $\tau_i$ after h)
Now $\tau = (\tau_1, \tau_2)$ is a SPE in $G^\delta_{irep}$ if for every history h we have that $(\tau^h_1, \tau^h_2)$ is a Nash equilibrium.
Note that $(\tau^h_1, \tau^h_2)$ must be a NE also for all histories h that are not visited when the profile $(\tau_1, \tau_2)$ is used.
Example
Consider the infinitely repeated game $G_{irep}$ based on Prisoner's dilemma:
        C          S
C    −5, −5     0, −20
S    −20, 0     −1, −1
What are the Nash equilibria and SPE in $G^\delta_{irep}$ for a given δ?
Consider a pure strategy profile $(\tau_1, \tau_2)$ where $\tau_i(s^1 \cdots s^T) = C$ for all $T \geq 0$ and $i \in \{1, 2\}$. Is it a NE? A SPE?
Consider a "grim trigger" profile $(\tau_1, \tau_2)$ where
$\tau_i(s^1 \cdots s^T) = \begin{cases} S & T = 0 \\ S & s^\ell = (S, S) \text{ for all } 1 \leq \ell \leq T \\ C & \text{otherwise} \end{cases}$
Is it a NE? Is it a SPE?
One-Shot Deviation Principle
A pure strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ satisfies the one-shot deviation property in $G^\delta_{irep}$ if for every $i \in \{1, 2\}$ and every $\bar\tau_i$ differing from $\tau_i$ just on a single history h, we have $u^\delta_i(\bar\tau^h_i, \tau^h_{-i}) \leq u^\delta_i(\tau^h_i, \tau^h_{-i})$.
Theorem 62
Let $G = (\{1, 2\}, (S_1, S_2), (u_1, u_2))$ be a two-player strategic-form game such that both $u_1$ and $u_2$ are bounded on $S = S_1 \times S_2$. Let $0 < \delta < 1$.
A pure strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ is a SPE in $G^\delta_{irep}$ iff it satisfies the one-shot deviation property in $G^\delta_{irep}$.
Before proving Theorem 62, let us note the following:
The one-shot deviation property is concerned with all strategies $\bar\tau_i$ that differ from $\tau_i$ on a single history. This means that we have to consider all histories h, even those that cannot be visited using $\tau_i$ with any opponent.
The one-shot deviation property immediately implies the following: If $\bar\tau_i$ does not differ from $\tau_i$ on any history of the form $h' = h h''$ where $h'' \neq \varepsilon$ (i.e., on any history obtained by prolonging h), then $u^\delta_i(\bar\tau^h_i, \tau^h_{-i}) \leq u^\delta_i(\tau^h_i, \tau^h_{-i})$.
Indeed, note that then $\tau^h_i$ differs from $\bar\tau^h_i$ at most in the first step (i.e., on h).
One-Shot Deviation Principle
Proof. ⇒: Trivial.
⇐: Assume that $\tau$ satisfies the one-shot deviation property but is not a SPE. That is, a deviation may increase the payoff of one of the players in a subgame. Assume, w.l.o.g., that player 1 gains by deviating to a strategy $\bar\tau_1$ in the subgame starting after a history h, i.e.,
$u^\delta_1(\bar\tau^h_1, \tau^h_2) > u^\delta_1(\tau^h_1, \tau^h_2)$    (2)
Since $\delta < 1$ and $u_i$ are bounded on S, we may safely choose $\bar\tau_1$ so that $\bar\tau_1(h') = \tau_1(h')$ for all sufficiently long histories $h'$.
Indeed, since $u_i$ is bounded on pure strategy profiles of G, the sum $\sum_{t=\ell}^{\infty} \delta^t \cdot u_i(s^{t+1})$ goes to 0 as $\ell$ goes to ∞; hence the strict inequality (2) remains valid even if $\bar\tau_1$ is arbitrarily modified in a very distant future.
One-Shot Deviation Principle
Let $h'$ be a history of maximum length such that h is a prefix of $h'$ and $\bar\tau_1(h') \neq \tau_1(h')$. (Note that then $\bar\tau_1(h' h'') = \tau_1(h' h'')$ for all $h'' \neq \varepsilon$.)
Let $\bar\tau_{11}$ be a strategy of player 1 obtained from $\bar\tau_1$ by changing $\bar\tau_1(h')$ to $\tau_1(h')$. Now note that the one-shot deviation property implies that
$u^\delta_1(\bar\tau^{h'}_{11}, \tau^{h'}_2) = u^\delta_1(\tau^{h'}_1, \tau^{h'}_2) \geq u^\delta_1(\bar\tau^{h'}_1, \tau^{h'}_2)$
and thus $u^\delta_1(\bar\tau^h_{11}, \tau^h_2) \geq u^\delta_1(\bar\tau^h_1, \tau^h_2) > u^\delta_1(\tau^h_1, \tau^h_2)$. Note that $\bar\tau^h_{11}$ has a strictly smaller number of deviations from $\tau^h_1$ than $\bar\tau^h_1$.
Repeating the same argument with $\bar\tau_{11}$ in place of $\bar\tau_1$ we obtain $\bar\tau_{12}$ such that $u^\delta_1(\bar\tau^h_{12}, \tau^h_2) \geq u^\delta_1(\bar\tau^h_{11}, \tau^h_2) > u^\delta_1(\tau^h_1, \tau^h_2)$. Here $\bar\tau^h_{12}$ has even fewer deviations from $\tau^h_1$ than $\bar\tau^h_{11}$.
Then repeating with $\bar\tau_{12}$ in place of $\bar\tau_1$ we obtain $\bar\tau_{13}$ such that $u^\delta_1(\bar\tau^h_{13}, \tau^h_2) \geq u^\delta_1(\bar\tau^h_{12}, \tau^h_2) > u^\delta_1(\tau^h_1, \tau^h_2)$, etc., still decreasing the number of deviations from $\tau^h_1$.
Eventually, as $\bar\tau^h_1$ has only finitely many deviations from $\tau^h_1$, we get $\bar\tau^h_{1k} = \tau^h_1$ for some k and thus $u^\delta_1(\tau^h_1, \tau^h_2) = u^\delta_1(\bar\tau^h_{1k}, \tau^h_2) > u^\delta_1(\tau^h_1, \tau^h_2)$, a contradiction.
Example
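Consider the infinitely repeated game based on Prisoner's dilemma:
        C          S
C    −5, −5     0, −20
S    −20, 0     −1, −1
The grim trigger profile $(\tau_1, \tau_2)$ where
$\tau_i(s^1 \cdots s^T) = \begin{cases} S & T = 0 \\ S & s^\ell = (S, S) \text{ for all } 1 \leq \ell \leq T \\ C & \text{otherwise} \end{cases}$
is a SPE in $G^\delta_{irep}$ whenever δ is large enough; the one-shot deviation property holds here exactly for $\delta \geq 1/5$.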
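By the one-shot deviation principle it suffices to compare, for each kind of history, the payoff of following the grim trigger with the payoff of deviating once. The sketch below does this for player 1 (player 2 is symmetric) using closed-form discounted payoffs; the function name `one_shot_ok` is an assumption of this sketch.

```python
from fractions import Fraction

# Payoffs of player 1 in the Prisoner's dilemma: (row, column) -> u1.
U1 = {("C", "C"): -5, ("C", "S"): 0, ("S", "C"): -20, ("S", "S"): -1}

def one_shot_ok(delta):
    """Check the one-shot deviation property of grim trigger for player 1.

    On histories consisting of (S, S) only, following gives -1 forever;
    deviating to C gives u1(C, S) once and then (C, C) forever.
    On histories containing a deviation, following gives (C, C) forever;
    deviating to S gives u1(S, C) once and then (C, C) forever.
    """
    follow_on_path = U1[("S", "S")]
    deviate_on_path = (1 - delta) * U1[("C", "S")] + delta * U1[("C", "C")]
    follow_off_path = U1[("C", "C")]
    deviate_off_path = (1 - delta) * U1[("S", "C")] + delta * U1[("C", "C")]
    return (deviate_on_path <= follow_on_path
            and deviate_off_path <= follow_off_path)

for delta in (Fraction(19, 100), Fraction(1, 5), Fraction(9, 10)):
    print(delta, one_shot_ok(delta))
```

The on-path comparison reduces to −5δ ≤ −1, which is where the threshold 1/5 comes from; the off-path comparison holds for every δ.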
A Simple Version of Folk Theorem
Let $G = (\{1, 2\}, (S_1, S_2), (u_1, u_2))$ be a two-player strategic-form game where $u_1, u_2$ are bounded on $S = S_1 \times S_2$ (but S may be infinite) and let $s^*$ be a Nash equilibrium in G.
Let s be a strategy profile in G satisfying $u_i(s) > u_i(s^*)$ for all $i \in \{1, 2\}$.
Consider the following strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$, called the grim trigger for s using $s^*$:
$\tau_i(s^1 \cdots s^T) = \begin{cases} s_i & T = 0 \\ s_i & s^\ell = s \text{ for all } 1 \leq \ell \leq T \\ s^*_i & \text{otherwise} \end{cases}$
Then for
$\delta \geq \max_{i \in \{1,2\}} \frac{\max_{s_i' \in S_i} u_i(s_i', s_{-i}) - u_i(s)}{\max_{s_i' \in S_i} u_i(s_i', s_{-i}) - u_i(s^*)}$
we have that $(\tau_1, \tau_2)$ is a SPE in $G^\delta_{irep}$ and $u^\delta_i(\tau) = u_i(s)$.
Proof: Consider a possible one-shot deviation $\bar\tau_1$ of player 1, i.e., there is exactly one h such that $\bar\tau_1(h) \neq \tau_1(h)$. We distinguish two cases depending on h.
Proof of Simple Folk Theorem (Cont.)
Case 1: $h \neq s \cdots s$. Then there is a deviation from s in h and thus according to $(\tau^h_1, \tau^h_2)$ both players play $s^*$ forever:
$u^\delta_1(\tau^h_1, \tau^h_2) = (1 - \delta) \sum_{k=0}^{\infty} \delta^k u_1(s^*) = u_1(s^*)(1 - \delta) \sum_{k=0}^{\infty} \delta^k = u_1(s^*)$
Now $(\bar\tau^h_1, \tau^h_2)$ gives a sequence $w_{(\bar\tau^h_1, \tau^h_2)} = (s_1', s^*_2) s^* s^* \cdots$ where $s_1'$ is a strategy of player 1 to which he deviates after h.
Here player 2 plays $s^*_2$ all the time after h because one of the players has already deviated in h.
We obtain
$u^\delta_1(\bar\tau^h_1, \tau^h_2) = (1 - \delta) \left[ u_1(s_1', s^*_2) + \sum_{k=1}^{\infty} \delta^k u_1(s^*) \right] \leq (1 - \delta) \left[ u_1(s^*_1, s^*_2) + \sum_{k=1}^{\infty} \delta^k u_1(s^*) \right] = u_1(s^*)$
So this deviation cannot be beneficial no matter what δ is.
Proof of Simple Folk Theorem (Cont.)
Case 2: $h = s \cdots s$. Clearly, $u^\delta_1(\tau^h_1, \tau^h_2) = u_1(s)$.
Now $(\bar\tau^h_1, \tau^h_2)$ gives a sequence $w_{(\bar\tau^h_1, \tau^h_2)} = (s_1', s_2) s^* s^* \cdots$ where $s_1'$ is a strategy of player 1 to which he deviates after h.
As opposed to the previous case, here player 2 first plays $s_2$ (since the deviation of player 1 to $s_1'$ is the first deviation in the history) and then both players react by playing $s^*$ forever.
If $u_1(s_1', s_2) < u_1(s)$, then
$u^\delta_1(\bar\tau^h_1, \tau^h_2) = (1 - \delta) \left[ u_1(s_1', s_2) + \sum_{k=1}^{\infty} \delta^k u_1(s^*) \right] < (1 - \delta) \left[ u_1(s_1, s_2) + \sum_{k=1}^{\infty} \delta^k u_1(s^*) \right] < (1 - \delta) \left[ u_1(s) + \sum_{k=1}^{\infty} \delta^k u_1(s) \right] = u_1(s) = u^\delta_1(\tau^h_1, \tau^h_2)$
and thus this deviation is also not beneficial no matter what δ is.
Proof of Simple Folk Theorem (Cont.)
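Finally, if $u_1(s_1', s_2) \geq u_1(s)$, then
$u^\delta_1(\bar\tau^h_1, \tau^h_2) = (1 - \delta) \left[ u_1(s_1', s_2) + \sum_{k=1}^{\infty} \delta^k u_1(s^*) \right] = (1 - \delta) u_1(s_1', s_2) + (1 - \delta) u_1(s^*) \cdot \delta \sum_{k=0}^{\infty} \delta^k = u_1(s_1', s_2) - \delta \cdot u_1(s_1', s_2) + \delta \cdot u_1(s^*)$
Thus
$u^\delta_1(\bar\tau^h_1, \tau^h_2) \leq u^\delta_1(\tau^h_1, \tau^h_2) = u_1(s)$ iff
$u_1(s_1', s_2) - \delta \cdot u_1(s_1', s_2) + \delta \cdot u_1(s^*) \leq u_1(s)$ iff
$u_1(s_1', s_2) - u_1(s) \leq \delta \cdot (u_1(s_1', s_2) - u_1(s^*))$ iff
$\delta \geq \frac{u_1(s_1', s_2) - u_1(s)}{u_1(s_1', s_2) - u_1(s^*)}$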
Proof of Simple Folk Theorem (Cont.)
Thus $(\tau_1, \tau_2)$ satisfies the one-shot deviation property in $G^\delta_{irep}$ w.r.t. player 1 if
$\delta \geq \frac{u_1(s_1', s_2) - u_1(s)}{u_1(s_1', s_2) - u_1(s^*)}$ for all $s_1' \in S_1$ satisfying $u_1(s_1', s_2) \geq u_1(s)$
Note that the right-hand-side expression is maximized when $u_1(s_1', s_2)$ is maximized and thus we get
$\delta \geq \frac{\max_{s_1' \in S_1} u_1(s_1', s_2) - u_1(s)}{\max_{s_1' \in S_1} u_1(s_1', s_2) - u_1(s^*)}$
Proving the same for player 2 and putting the results together, we obtain that $(\tau_1, \tau_2)$ satisfies the one-shot deviation property in $G^\delta_{irep}$ if
$\delta \geq \max_{i \in \{1,2\}} \frac{\max_{s_i' \in S_i} u_i(s_i', s_{-i}) - u_i(s)}{\max_{s_i' \in S_i} u_i(s_i', s_{-i}) - u_i(s^*)}$    (3)
Thus by Theorem 62, $(\tau_1, \tau_2)$ is a SPE in $G^\delta_{irep}$ if δ satisfies ineq. (3).
Simple Folk Theorem – Example
Consider the infinitely repeated game $G_{irep}$ based on the following game G:
        m         f        r
M    4, 4     −1, 5    3, 0
F    5, −1     1, 1    0, 0
R    0, 3      0, 0    2, 2
NE in G: (F, f)
Consider the grim trigger for (M, m) using (F, f), i.e., the profile $(\tau_1, \tau_2)$ in $G_{irep}$ where
$\tau_1$: Plays M in a given stage if (M, m) was played in all previous stages, and plays F otherwise.
$\tau_2$: Plays m in a given stage if (M, m) was played in all previous stages, and plays f otherwise.
This is a SPE in $G^\delta_{irep}$ for all $\delta \geq 1/4$. Also, $u^\delta_i(\tau_1, \tau_2) = 4$ for $i \in \{1, 2\}$.
Are there other SPE? Yes, a grim trigger for (R, r) using (F, f). This is a SPE in $G^\delta_{irep}$ for $\delta \geq 1/2$.
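The two thresholds follow directly from the bound of the simple folk theorem (inequality (3)). A small sketch for finite games given as payoff dictionaries; the function name `threshold` and the table encoding are assumptions of this sketch.

```python
from fractions import Fraction

# Stage-game payoffs: (row, column) -> (u1, u2).
U = {
    ("M", "m"): (4, 4),  ("M", "f"): (-1, 5), ("M", "r"): (3, 0),
    ("F", "m"): (5, -1), ("F", "f"): (1, 1),  ("F", "r"): (0, 0),
    ("R", "m"): (0, 3),  ("R", "f"): (0, 0),  ("R", "r"): (2, 2),
}
ROWS, COLS = "MFR", "mfr"

def threshold(s, s_star):
    """Right-hand side of inequality (3) for the grim trigger for s using s_star."""
    best1 = max(U[(d, s[1])][0] for d in ROWS)  # best deviation payoff of player 1
    best2 = max(U[(s[0], d)][1] for d in COLS)  # best deviation payoff of player 2
    bound1 = Fraction(best1 - U[s][0], best1 - U[s_star][0])
    bound2 = Fraction(best2 - U[s][1], best2 - U[s_star][1])
    return max(bound1, bound2)

print(threshold(("M", "m"), ("F", "f")))
print(threshold(("R", "r"), ("F", "f")))
```

The function assumes that s strictly dominates s_star payoff-wise for both players, as the theorem requires; otherwise a denominator may be zero.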
Tacit Collusion
Consider the Cournot duopoly game model $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$
$N = \{1, 2\}$
$S_i = [0, \kappa]$
$u_1(q_1, q_2) = q_1(\kappa - q_1 - q_2) - q_1 c_1 = (\kappa - c_1) q_1 - q_1^2 - q_1 q_2$
$u_2(q_1, q_2) = q_2(\kappa - q_2 - q_1) - q_2 c_2 = (\kappa - c_2) q_2 - q_2^2 - q_2 q_1$
Assume for simplicity that $c_1 = c_2 = c$ and denote $\theta = \kappa - c$.
If the firms sign a binding contract to produce only $\theta/4$ each, the profit of each firm would be $\theta^2/8$, which is higher than the profit $\theta^2/9$ for playing the NE $(\theta/3, \theta/3)$.
However, such contracts are forbidden in many countries (including the US).
Is it still possible that the firms will behave selfishly (i.e. only maximizing their profits) and still obtain such payoffs?
In other words, is there a SPE in the infinitely repeated game based on G (with a discount factor δ) which gives the payoffs $\theta^2/8$?
Tacit Collusion
Consider the Cournot duopoly game model $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$
$N = \{1, 2\}$
$S_i = [0, \infty)$
$u_1(q_1, q_2) = q_1(\kappa - q_1 - q_2) - q_1 c_1 = (\kappa - c_1) q_1 - q_1^2 - q_1 q_2$
$u_2(q_1, q_2) = q_2(\kappa - q_2 - q_1) - q_2 c_2 = (\kappa - c_2) q_2 - q_2^2 - q_2 q_1$
Assume for simplicity that $c_1 = c_2 = c$ and denote $\theta = \kappa - c$.
Consider the grim trigger profile for $(\theta/4, \theta/4)$ using $(\theta/3, \theta/3)$:
Player i will
produce $q_i = \theta/4$ whenever all profiles in the history are $(\theta/4, \theta/4)$,
whenever one of the players deviates, produce $\theta/3$ from that moment on.
Assuming that $\kappa = 100$ and $c = 10$ (which gives $\theta = 90$), this is a SPE in $G^\delta_{irep}$ for $\delta \geq 9/17 = 0.5294\cdots$. It results in $(\theta/4, \theta/4)(\theta/4, \theta/4) \cdots$ with the discounted payoffs $\theta^2/8$.
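The bound 0.5294… can be rederived from the simple folk theorem. The sketch below assumes the symmetric case $c_1 = c_2$, so that the profit of a firm is q(θ − q − q'), and uses the first-order condition q = (θ − q')/2 for the best one-stage deviation; the function names are illustrative.

```python
from fractions import Fraction

def profit(q_own, q_other, theta):
    """Profit of a firm producing q_own against q_other (with theta = kappa - c)."""
    return q_own * (theta - q_own - q_other)

def collusion_threshold(theta):
    """Smallest delta for which the grim trigger for (theta/4, theta/4)
    using (theta/3, theta/3) satisfies the bound of the simple folk theorem."""
    collude = Fraction(theta, 4)
    nash = Fraction(theta, 3)
    u_collude = profit(collude, collude, theta)    # theta^2 / 8
    u_nash = profit(nash, nash, theta)             # theta^2 / 9
    deviation = (theta - collude) / 2              # best response to theta/4
    u_deviate = profit(deviation, collude, theta)  # 9 * theta^2 / 64
    return (u_deviate - u_collude) / (u_deviate - u_nash)

delta_min = collusion_threshold(90)
print(delta_min, float(delta_min))
```

The result does not depend on θ, since all three profits scale with θ².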
Dynamic Games of Complete Information
Repeated Games
Inﬁnitely Repeated Games
Long-Run Average Payoff and Folk Theorems
Inﬁnitely Repeated Games & Average Payoff
In what follows we assume that all payoffs in the game G are positive and that S is finite!
Let $\tau = (\tau_1, \tau_2)$ be a strategy profile in the infinitely repeated game $G_{irep}$ such that $w_\tau = s^1 s^2 \cdots$.
Definition 63
We define a long-run average payoff for player i by
$u^{avg}_i(\tau) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(s^t)$
(Here lim sup is necessary because $\tau_i$ may cause non-existence of the limit.)
The long-run average payoff $u^{avg}_i(\tau)$ is well-defined if the limit $u^{avg}_i(\tau) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(s^t)$ exists.
Given a strategic-form game G, we denote by $G^{avg}_{irep}$ the infinitely repeated game based on G together with the long-run average payoff.
Inﬁnitely Repeated Games & Average Payoff
Deﬁnition 64
A strategy profile $\tau$ is a Nash equilibrium if $u^{avg}_i(\tau)$ is well-defined for all $i \in N$, and for every i and every $\tau_i'$ we have that
$u^{avg}_i(\tau_i, \tau_{-i}) \geq u^{avg}_i(\tau_i', \tau_{-i})$
(Note that we demand existence of the defining limit of $u^{avg}_i(\tau_i, \tau_{-i})$ but the limit does not have to exist for $u^{avg}_i(\tau_i', \tau_{-i})$.)
Moreover, $\tau = (\tau_1, \tau_2)$ is a SPE in $G^{avg}_{irep}$ if for every history h we have that $(\tau^h_1, \tau^h_2)$ is a Nash equilibrium.
Example
Consider the infinitely repeated game based on Prisoner's dilemma:
        C          S
C    −5, −5     0, −20
S    −20, 0     −1, −1
The grim trigger profile $(\tau_1, \tau_2)$ where
$\tau_i(s^1 \cdots s^T) = \begin{cases} S & T = 0 \\ S & s^\ell = (S, S) \text{ for all } 1 \leq \ell \leq T \\ C & \text{otherwise} \end{cases}$
is a SPE which gives the long-run average payoff −1 to each player.
The intuition behind the grim trigger works as for the discounted payoff: Whenever a player i deviates, the player −i starts playing C, for which the best response of player i is also C. So we obtain (S, S) · · · (S, S)(X, Y)(C, C)(C, C) · · · (here (X, Y) is either (C, S) or (S, C) depending on who deviates). Clearly, the long-run average payoff is then −5 for both players, which is worse than −1.
Example
Consider the inﬁnitely repeated game based on Prisoner’s dilemma:
C S
C −5, −5 0, −20
S −20, 0 −1, −1
However, other payoffs can be supported by NE. Consider e.g. a strategy profile $(\tau_1, \tau_2)$ such that
Both players cyclically play as follows:
9 times (S, S)
once (S, C)
If one of the players deviates, then, from that moment on, both play (C, C) forever.
Then $(\tau_1, \tau_2)$ is also a SPE.
Here $u^{avg}_1(\tau_1, \tau_2) = \frac{9}{10} \cdot (−1) + (−20)/10 = −29/10$ and $u^{avg}_2(\tau_1, \tau_2) = \frac{9}{10} \cdot (−1) + 0/10 = −9/10$.
Player 2 gets a better payoff than from the Pareto optimal profile (S, S)!
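For a play that repeats a finite cycle forever, the long-run average payoff equals the average over one cycle. The sketch below evaluates the cycle of this example with exact fractions; `cycle_average` is a hypothetical helper name.

```python
from fractions import Fraction

# Stage-game payoffs of the Prisoner's dilemma: (row, column) -> (u1, u2).
U = {("C", "C"): (-5, -5), ("C", "S"): (0, -20),
     ("S", "C"): (-20, 0), ("S", "S"): (-1, -1)}

def cycle_average(cycle, i):
    """Long-run average payoff of player i (0 or 1) when `cycle` repeats forever."""
    return Fraction(sum(U[s][i] for s in cycle), len(cycle))

cycle = [("S", "S")] * 9 + [("S", "C")]
u1_avg = cycle_average(cycle, 0)
u2_avg = cycle_average(cycle, 1)
print(u1_avg, u2_avg)
```

Both averages exceed −5, the payoff of the punishing equilibrium (C, C), which is what makes the threat of punishment effective.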
Outline of the Folk Theorems
The previous examples suggest that other (possibly all?) convex combinations of payoffs may be obtained by means of Nash equilibria.
This observation forms a basis for a family of theorems, collectively called Folk Theorems.
No author is listed since these theorems had been known in the game theory community long before they were formalized.
In what follows we prove several versions of Folk Theorem
concerning achievable payoffs for repeated games.
Ordered by increasing technical and conceptual difﬁculty, we consider
the following variants:
Long-run average payoffs & SPE
Discounted payoffs & SPE
Long-run average payoffs & Nash equilibria
Folk Theorems – Feasible Payoffs
Deﬁnition 65
We say that a vector of payoffs $v = (v_1, v_2) \in \mathbb{R}^2$ is feasible if it is a convex combination of payoffs for pure strategy profiles in G with rational coefficients, i.e., if there are rational numbers $\beta_s$, here $s \in S$, satisfying $\beta_s \geq 0$ and $\sum_{s \in S} \beta_s = 1$ such that for both $i \in \{1, 2\}$ holds
$v_i = \sum_{s \in S} \beta_s \cdot u_i(s)$
We assume that there is $m \in \mathbb{N}$ such that each $\beta_s$ can be written in the form $\beta_s = \gamma_s / m$.
The following theorems can be extended to a notion of feasible payoffs using arbitrary, possibly irrational, coefficients $\beta_s$ in the convex combination. Roughly speaking, this follows from the fact that each real number can be approximated with rational numbers up to an arbitrary error. However, the proofs are technically more involved.
Folk Theorems – Long-Run Average & SPE
Theorem 66
Let $s^*$ be a pure strategy Nash equilibrium in G and let $v = (v_1, v_2)$ be a feasible vector of payoffs satisfying $v_i \geq u_i(s^*)$ for both $i \in \{1, 2\}$. Then there is a strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ such that
$\tau$ is a SPE in $G^{avg}_{irep}$
$u^{avg}_i(\tau) = v_i$ for $i \in \{1, 2\}$
Proof: Consider a strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ which gives the following behavior:
1. Unless one of the players deviates, the players play cyclically all profiles $s \in S$ so that each s is always played for $\gamma_s$ rounds.
2. Whenever one of the players deviates, then, from that moment on, each player i plays $s^*_i$.
It is easy to see that $u^{avg}_i(\tau) = v_i$.
We verify that $\tau$ is a SPE.
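The construction in the proof can be made concrete. The sketch below builds the cycle from the numerators $\gamma_s$ and implements the prescribed behavior (follow the cycle, punish with $s^*$ after any deviation). The game, the choice $\gamma_{(M,m)} = \gamma_{(R,r)} = 1$ (so m = 2) and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction

# Stage-game payoffs: (row, column) -> (u1, u2); (F, f) is its pure NE.
U = {
    ("M", "m"): (4, 4),  ("M", "f"): (-1, 5), ("M", "r"): (3, 0),
    ("F", "m"): (5, -1), ("F", "f"): (1, 1),  ("F", "r"): (0, 0),
    ("R", "m"): (0, 3),  ("R", "f"): (0, 0),  ("R", "r"): (2, 2),
}
S_STAR = ("F", "f")
GAMMA = {("M", "m"): 1, ("R", "r"): 1}  # beta_s = gamma_s / m with m = 2

def build_cycle(gamma):
    """One period of the cyclic play: each profile s repeated gamma_s times."""
    cycle = []
    for s, g in gamma.items():
        cycle.extend([s] * g)
    return cycle

CYCLE = build_cycle(GAMMA)

def prescribed(history):
    """Profile prescribed after `history`: follow the cycle unless the
    history already contains a deviation, in which case play s*."""
    for t, s in enumerate(history):
        if s != CYCLE[t % len(CYCLE)]:
            return S_STAR
    return CYCLE[len(history) % len(CYCLE)]

def feasible_payoff(gamma):
    """The vector v = sum_s (gamma_s / m) * u(s)."""
    m = sum(gamma.values())
    return tuple(Fraction(sum(g * U[s][i] for s, g in gamma.items()), m)
                 for i in range(2))

print(CYCLE, feasible_payoff(GAMMA))
print(prescribed([("M", "m")]), prescribed([("F", "m")]))
```

Here v = (3, 3), which dominates u(F, f) = (1, 1) as required by the theorem.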
Folk Theorems – Long-Run Average & SPE
Fix a history h; we show that $\tau^h = (\tau^h_1, \tau^h_2)$ is a NE in $G^{avg}_{irep}$.
If h does not contain any deviation from the cyclic behavior 1., then $\tau^h$ continues according to 1., thus $u^{avg}_i(\tau^h) = v_i$.
If h contains a deviation from 1., then $w_{\tau^h} = s^* s^* \cdots$ and thus $u^{avg}_i(\tau^h) = u_i(s^*)$.
Now if a player i deviates to $\bar\tau^h_i$ from $\tau^h_i$ in $G^{avg}_{irep}$, then
$w_{(\bar\tau^h_i, \tau^h_{-i})} = (s^1_i, s'_{-i})(s^2_i, s^*_{-i})(s^3_i, s^*_{-i}) \cdots$
where $s^1_i, s^2_i, \ldots$ are strategies of $S_i$ and $s'_{-i}$ is a strategy of $S_{-i}$.
However, then $u^{avg}_i(\bar\tau^h_i, \tau^h_{-i}) \leq u_i(s^*) \leq v_i$ since $s^*$ is a Nash equilibrium and thus $u_i(s^k_i, s^*_{-i}) \leq u_i(s^*)$ for all $k \geq 1$.
Intuitively, player −i punishes player i by playing $s^*_{-i}$.
Folk Theorems – Discounted Payoffs & SPE
Theorem 67
Let $s^*$ be a pure strategy Nash equilibrium in G and let $v = (v_1, v_2)$ be a feasible payoff satisfying $v_i > u_i(s^*)$ for both $i \in \{1, 2\}$. Then there is a strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ and $\delta < 1$ such that
$\tau$ is a SPE in $G^{\delta'}_{irep}$ for every $\delta' \in [\delta, 1)$ and
$\lim_{\delta' \to 1} u^{\delta'}_i(\tau) = v_i$.
Proof: The following claim allows us to reduce the discounted payoff to the long-run average.
Claim 5
Let $\tau$ be a strategy profile such that $u^{avg}_i(\tau)$ is well-defined. Then
$\lim_{\delta \to 1^-} u^\delta_i(\tau) = u^{avg}_i(\tau)$
Now to prove Theorem 67, consider the strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ from the proof of Theorem 66.
We check the one-shot deviation property in $G^\delta_{irep}$ for δ close to 1.
Folk Theorems – Discounted Payoffs & SPE
Fix a history h and consider $\tau^h = (\tau^h_1, \tau^h_2)$.
If h does not contain any deviation from 1., then both players follow 1., and $u^\delta_i(\tau^h)$ is close to $u^{avg}_i(\tau^h) = v_i$ for δ close to 1.
If h contains any deviation from 1., then $w_{\tau^h} = s^* s^* \cdots$ and $u^\delta_i(\tau^h) = u_i(s^*)$.
Now assume, w.l.o.g., that player 1 deviates exactly after h, which gives a strategy $\bar\tau^h_1$ differing from $\tau^h_1$ only on h. Thus $w_{(\bar\tau^h_1, \tau^h_2)} = (s_1', s_2') s^* s^* \cdots$ where $s_1'$ is a strategy of $S_1$ and $s_2'$ is either the next step in the cyclic behavior described by 1. (if h follows 1.), or equal to $s^*_2$ (if h does not follow 1.).
Note that for δ close to 1, we have that $u^\delta_i(\bar\tau^h_i, \tau^h_{-i})$ is close to $u^{avg}_i(\bar\tau^h_i, \tau^h_{-i}) = u_i(s^*)$.
If h follows 1., then $u^\delta_1(\tau^h)$ is close to $v_1$ which is greater than $u_1(s^*)$ to which $u^\delta_1(\bar\tau^h_1, \tau^h_2)$ is close.
If h does not follow 1., then $s_2' = s^*_2$ (players punish due to a deviation in h), and thus $u^\delta_1(\bar\tau^h_1, \tau^h_2) \leq u_1(s^*) = u^\delta_1(\tau^h)$.
Folk Theorems – Individually Rational Payoffs
Deﬁnition 68
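$v = (v_1, v_2) \in \mathbb{R}^2$ is individually rational if for both $i \in \{1, 2\}$ holds
$v_i \geq \min_{s_{-i} \in S_{-i}} \max_{s_i \in S_i} u_i(s_i, s_{-i})$
That is, $v_i$ is at least as large as the value that player i may secure by playing best responses to the most hostile behavior of player −i.
Example:
        m         f        r
M    4, 4     −1, 5    3, 0
F    5, −1     1, 1    0, 0
R    0, 3      0, 0    2, 2
Here the min-max value of each player is 1 (the most hostile behavior is f against player 1 and F against player 2), so $(v_1, v_2)$ is individually rational iff $v_1 \geq 1$ and $v_2 \geq 1$. In particular, any $(v_1, v_2)$ such that $v_1 \geq 2$ and $v_2 \geq 1$ is individually rational.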
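The min-max values from the definition can be computed by direct enumeration over pure strategies. A minimal sketch for the 3 × 3 example; `minmax_values` is a hypothetical helper name.

```python
# Stage-game payoffs: (row, column) -> (u1, u2).
U = {
    ("M", "m"): (4, 4),  ("M", "f"): (-1, 5), ("M", "r"): (3, 0),
    ("F", "m"): (5, -1), ("F", "f"): (1, 1),  ("F", "r"): (0, 0),
    ("R", "m"): (0, 3),  ("R", "f"): (0, 0),  ("R", "r"): (2, 2),
}
ROWS, COLS = "MFR", "mfr"

def minmax_values():
    """Pure min-max values: the opponent minimizes the best-response payoff."""
    v1 = min(max(U[(a, b)][0] for a in ROWS) for b in COLS)
    v2 = min(max(U[(a, b)][1] for b in COLS) for a in ROWS)
    return v1, v2

def individually_rational(v):
    v1, v2 = minmax_values()
    return v[0] >= v1 and v[1] >= v2

print(minmax_values(), individually_rational((2, 1)))
```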
Folk Theorems – Long-Run Average & NE
Theorem 69
Let $v = (v_1, v_2)$ be a feasible and individually rational vector of payoffs. Then there is a strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ such that
$\tau$ is a Nash equilibrium in $G^{avg}_{irep}$
$u^{avg}_i(\tau) = v_i$ for $i \in \{1, 2\}$
Proof: It suffices to use a slightly modified strategy profile $\tau = (\tau_1, \tau_2)$ in $G_{irep}$ from Theorem 66:
Unless one of the players deviates, the players play cyclically all profiles $s \in S$ so that each s is always played for $\gamma_s$ rounds.
Whenever a player i deviates, the opponent −i plays a strategy $s^{min}_{-i} \in \operatorname{argmin}_{s_{-i} \in S_{-i}} \max_{s_i \in S_i} u_i(s_i, s_{-i})$.
It is easy to see that $u^{avg}_i(\tau) = v_i$.
If a player i deviates, then his long-run average payoff cannot be higher than $\min_{s_{-i} \in S_{-i}} \max_{s_i \in S_i} u_i(s_i, s_{-i}) \leq v_i$, so $\tau$ is a NE.
Folk Theorems – Long-Run Average & NE
Theorem 70
If a strategy profile $\tau = (\tau_1, \tau_2)$ is a NE in $G^{avg}_{irep}$, then $(u^{avg}_1(\tau), u^{avg}_2(\tau))$ is individually rational.
Proof: Suppose that $(u^{avg}_1(\tau), u^{avg}_2(\tau))$ is not individually rational. W.l.o.g. assume that $u^{avg}_1(\tau) < \min_{s_2 \in S_2} \max_{s_1 \in S_1} u_1(s_1, s_2)$.
Now let us consider a new strategy $\bar\tau_1$ such that for an arbitrary history h the pure strategy $\bar\tau_1(h)$ is a best response to $\tau_2(h)$.
But then, for every history h, we have
$u_1(\bar\tau_1(h), \tau_2(h)) \geq \min_{s_2 \in S_2} \max_{s_1 \in S_1} u_1(s_1, s_2) > u^{avg}_1(\tau)$
So clearly $u^{avg}_1(\bar\tau_1, \tau_2) > u^{avg}_1(\tau)$ which contradicts the fact that $(\tau_1, \tau_2)$ is a NE.
Note that if irrational convex combinations are allowed in the definition of feasibility, then vectors of payoffs for Nash equilibria in $G^{avg}_{irep}$ are exactly feasible and individually rational vectors of payoffs. Indeed, the coefficients $\beta_s$ in the definition of feasibility are exactly frequencies with which the individual profiles of S are played in the NE.
Folk Theorems – Summary
We have proved that "any reasonable" (i.e. feasible and individually rational) vector of payoffs can be justified as payoffs for a Nash equilibrium in $G^{avg}_{irep}$ (where the future has "an infinite weight").
Concerning SPE, we have proved that any feasible vector of payoffs dominating a Nash equilibrium in G can be justified as payoffs for SPE in $G^{avg}_{irep}$.
This result can be generalized to arbitrary feasible and strictly individually rational payoffs by means of a more demanding construction.
For discounted payoffs, we have proved that an arbitrary feasible vector of payoffs strictly dominating a Nash equilibrium in G can be approximated using payoffs for SPE in $G^\delta_{irep}$ as δ goes to 1.
Even this result can be extended to feasible and strictly individually rational payoffs.
For a very detailed discussion of Folk Theorems see "A Course in Game Theory" by M. J. Osborne and A. Rubinstein.
Summary of Extensive-Form Games
We have considered extensive-form games (i.e., games on trees)
with perfect information
with imperfect information
with chance nodes (and both perfect and imperfect information)
We have considered pure, mixed and behavioral strategies.
We have considered Nash equilibria (NE) and subgame perfect
equilibria (SPE) in pure and behavioral strategies.
Summary of Extensive-Form Games (Cont.)
For perfect information we have shown that
mixed and behavioral strategies are equivalent
there always is a SPE in pure strategies, which is a SPE among both pure and behavioral strategies
SPE can be computed using backward induction in polynomial time
For imperfect information we have shown that
mixed and behavioral strategies are not equivalent in general (but they are equivalent for games with perfect recall)
backward induction can be used to propagate values through "perfect information nodes", but "imperfect information parts" have to be solved by different means
solving imperfect information games is at least as hard as solving games in strategic form; however, even in the zero-sum case, most decision problems are NP-hard (for details see the lecture).
Chance nodes do not interfere with any of the above results.
Summary of Extensive-Form Games (Cont.)
Finally, we discussed repeated games. We considered both finitely and infinitely repeated games.
For finitely repeated games we considered the average payoff and discussed existence of pure strategy NE and SPE with respect to existence of NE in the original strategic-form game.
For infinitely repeated games we considered both
discounted payoff: We have proved that
one-shot deviation property is equivalent to SPE
"grim trigger" strategy profiles can be used to implement any vector of payoffs strictly dominating payoffs for a Nash equilibrium in the original strategic-form game (Simple Folk Theorem)
long-run average payoff: We have proved that all feasible and individually rational vectors of payoffs can be achieved by Nash equilibria (a variant of grim trigger)
Games of INcomplete Information
Bayesian Games
Auctions
Auctions
The (general) problem: How to allocate (discrete) resources among selfish agents in a multi-agent system?
Auctions provide a general solution to this problem.
As such, auctions have been heavily used in real life, in consumer, corporate, as well as government settings:
eBay, art auctions, wine auctions, etc.
advertising (Google AdWords)
governments selling public resources: electromagnetic spectrum, oil leases, etc.
· · ·
Auctions also provide a theoretical framework for understanding resource allocation problems among self-interested agents: Formally, an auction is any protocol that allows agents to indicate their interest in one or more resources and that uses these indications to determine both the resource allocation and payments of the agents.
Auctions: Taxonomy
Auctions may be used in various settings depending on the complexity of the resource allocation problem:
Single-item auctions: Here n bidders (players) compete for a single indivisible item that can be allocated to just one of them. Each bidder has his own private value of the item in case he wins (gets zero if he loses). Typically (but not always) the highest bid wins. How much should he pay?
Multiunit auctions: Here a fixed number of identical units of a homogeneous commodity are sold. Each bidder submits both a number of units he demands and a unit price he is willing to pay. Here also the highest bidders typically win, but it is unclear how much they should pay (pay-as-bid vs uniform pricing).
Combinatorial auctions: Here bidders compete for a set of distinct goods. Each player has a valuation function which assigns values to subsets of the set (some goods are useful only in groups etc.). Who wins and what does he pay?
(We mostly concentrate on the single-item auctions.)
Single Unit Auctions
There are many single-item auctions; we consider the following well-known versions:
open auctions:
The English Auction: Often occurs in movies; bidders are sitting in a room (or connected by computer or phone) and the price of the item goes up as long as someone is willing to bid it higher. Once the last increase is no longer challenged, the last bidder to increase the price wins the auction and pays the price for the item.
The Dutch Auction: Opposite of the English auction; the price starts at a prohibitively high value and the auctioneer gradually drops the price. Once a bidder shouts "buy", the auction ends and the bidder gets the item at the current price.
sealed-bid auctions:
k-th price Sealed-Bid Auction: Each bidder writes down his bid and places it in an envelope; the envelopes are opened simultaneously. The highest bidder wins and then pays the k-th maximum bid. (In a reverse auction it is the k-th minimum.) The most prominent special cases are The First-Price Auction and The Second-Price Auction.
Single Unit Auctions (Cont.)
Observe that
the English auction is essentially equivalent to the second-price auction if the increments in every round are very small.
There exists a "continuous" version, called the Japanese auction, where the price continuously increases. Each bidder may drop out at any time. The last one who stays gets the item for the current price (which is the dropping price of the "second highest bid").
similarly, the Dutch auction is equivalent to the first-price auction.
Note that the bidder with the highest bid stops the decrement of the price and buys at the current price, which corresponds to his bid.
Now the question is, which type of auction is better?
Objectives
The goal of the bidders is clear: To get the item at as low price
as possible (i.e., they maximize the difference between their
private value and the price they pay)
We consider self-interested non-communicating bidders that
are rational and intelligent.
There are at least two goals that may be pursued by
the auctioneer (in various settings):
Revenue maximization
This may lead to auctions that do not always sell the item to the highest
bidder
Incentive compatibility: We want the bidders to
spontaneously bid their true value of the item
This means that such an auction cannot be strategically manipulated
by lying.
Auctions vs Games
Consider single-item sealed-bid auctions as strategic form games:
G = (N, (Bi)i∈N , (ui)i∈N) where
The set of players N is the set of bidders
Bi = [0, ∞) where each bi ∈ Bi corresponds to the bid bi
(We follow the standard notation and use bi to denote pure strategies
(bids))
To deﬁne ui, we assume that each bidder has his own private
value vi of the item, then given bids b = (b1, . . . , bn) :
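First Price:
\[ u_i(b) = \begin{cases} v_i - b_i & \text{if } b_i > \max_{j \neq i} b_j \\ 0 & \text{otherwise} \end{cases} \]
Second Price:
\[ u_i(b) = \begin{cases} v_i - \max_{j \neq i} b_j & \text{if } b_i > \max_{j \neq i} b_j \\ 0 & \text{otherwise} \end{cases} \]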
Is this model realistic? Not really, usually, the bidders are not perfectly
informed about the private values of the other bidders.
Can we use (possibly imperfect information) extensive-form games?
Incomplete Information Games
A (strict) incomplete information game is a tuple
G = (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N) where
N = {1, . . . , n} is a set of players,
Each Ai is a set of actions available to player i,
We denote by $A = \prod_{i=1}^{n} A_i$ the set of all action profiles
$a = (a_1, \ldots, a_n)$.
Each Ti is a set of possible types of player i,
Denote by $T = \prod_{i=1}^{n} T_i$ the set of all type profiles $t = (t_1, \ldots, t_n)$.
ui is a type-dependent payoff function
ui : A1 × · · · × An × Ti → R
Given a proﬁle of actions (a1, . . . , an) ∈ A and a type ti ∈ Ti, we
write ui(a1, . . . , an; ti) to denote the corresponding payoff.
A pure strategy of player i is a function $s_i : T_i \to A_i$. As before, we
denote by $S_i$ the set of all pure strategies of player i, and by
$S = \prod_{i=1}^{n} S_i$ the set of all pure strategy profiles.
Dominant Strategies
A pure strategy $s_i$ very weakly dominates $s_i'$ if for every $t_i \in T_i$
the following holds: For all $a_{-i} \in A_{-i}$ we have
\[ u_i(s_i(t_i), a_{-i}; t_i) \geq u_i(s_i'(t_i), a_{-i}; t_i) \]
A pure strategy $s_i$ weakly dominates $s_i'$ if for every $t_i \in T_i$
the following holds: For all $a_{-i} \in A_{-i}$ we have
\[ u_i(s_i(t_i), a_{-i}; t_i) \geq u_i(s_i'(t_i), a_{-i}; t_i) \]
and the inequality is strict for at least one $a_{-i}$.
(Such $a_{-i}$ may be different for different $t_i$.)
A pure strategy $s_i$ strictly dominates $s_i'$ if for every $t_i \in T_i$
the following holds: For all $a_{-i} \in A_{-i}$ we have
\[ u_i(s_i(t_i), a_{-i}; t_i) > u_i(s_i'(t_i), a_{-i}; t_i) \]
Deﬁnition 71
si is (very weakly, weakly, strictly) dominant if it (very weakly, weakly,
strictly, resp.) dominates all other pure strategies.
Nash Equilibrium
In order to generalize Nash equilibria to incomplete information
games, we use the following notation: Given a pure strategy proﬁle
(s1, . . . , sn) ∈ S and a type proﬁle (t1, . . . , tn) ∈ T, for every player i
write
s−i(t−i) = (s1(t1), . . . , si−1(ti−1), si+1(ti+1), . . . , sn(tn))
Deﬁnition 72
A strategy proﬁle s = (s1, . . . , sn) ∈ S is an ex-post-Nash equilibrium if
for every t1, . . . , tn we have that (s1(t1), . . . , sn(tn)) is a Nash
equilibrium in the strategic-form game deﬁned by the ti’s.
Formally, s = (s1, . . . , sn) ∈ S is an ex-post-Nash equilibrium if for all
i ∈ N and all t1, . . . , tn and all ai ∈ Ai :
ui(s1(t1), . . . , sn(tn); ti) ≥ ui(ai, s−i(t−i); ti)
Example: Single-Item Sealed-Bid Auctions
Consider single-item sealed-bid auctions as strict incomplete
information games: G = (N, (Bi)i∈N , (Vi)i∈N , (ui)i∈N) where
The set of players N is the set of bidders
Bi = [0, ∞) where each action bi ∈ Bi corresponds to the bid bi
Vi = [0, ∞) where each type vi ∈ Vi corresponds to the private
value vi
Let vi ∈ Vi be the type of player i (i.e. his private value), then
given an action proﬁle b = (b1, . . . , bn) (i.e. bids) we deﬁne
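First Price:
\[ u_i(b; v_i) = \begin{cases} v_i - b_i & \text{if } b_i > \max_{j \neq i} b_j \\ 0 & \text{otherwise.} \end{cases} \]
Second Price:
\[ u_i(b; v_i) = \begin{cases} v_i - \max_{j \neq i} b_j & \text{if } b_i > \max_{j \neq i} b_j \\ 0 & \text{otherwise.} \end{cases} \]
Note that if there is a tie at the top (i.e., there are $k \neq \ell$ such that
$b_k = b_\ell = \max_j b_j$), then all players get 0.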
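The two payoff functions, including the rule that a tie at the top gives everybody 0, can be written down directly. The following is a minimal sketch; the function name `payoff` and the string argument `rule` are assumptions introduced for illustration.

```python
def payoff(i, bids, value, rule):
    """Payoff u_i(b; v_i) of bidder i whose private value is `value`.

    rule is "first" or "second". Bidder i wins only if his bid is
    strictly above every other bid; otherwise (including a tie at the
    top) his payoff is 0. Requires at least two bidders.
    """
    highest_other = max(b for j, b in enumerate(bids) if j != i)
    if bids[i] <= highest_other:
        return 0.0
    price = bids[i] if rule == "first" else highest_other
    return value - price


bids = [10, 7, 3]
u_first = payoff(0, bids, 12, "first")    # wins, pays own bid 10
u_second = payoff(0, bids, 12, "second")  # wins, pays runner-up bid 7
```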
Are there dominant strategies? Are there ex-post-Nash equilibria?
Second-Price Auction
For every i, we denote by vi the pure strategy si for player i deﬁned by
si(vi) = vi.
Intuitively, such a strategy is truth telling, which means that the player bids his
own private value truthfully.
Theorem 73
Assume the Second-Price Auction. Then for every player i we have
that vi is a weakly dominant strategy. Also, v is the unique
ex-post-Nash equilibrium.
Proof. Let us fix a private value vi and a bid bi ∈ Bi such that bi ≠ vi.
We show that for all bids of opponents b−i ∈ B−i :
ui(vi, b−i; vi) ≥ ui(bi, b−i; vi)
with the strict inequality for at least one b−i.
Intuitively, assume that player i bids bi against b−i and compare his payoff
with the payoff he obtains by playing vi against b−i.
There are two cases to consider: bi < vi and bi > vi.
Second-Price Auction (Cont.)
Case $b_i < v_i$: We distinguish three sub-cases depending on $b_{-i}$.
A. If $b_i > \max_{j \neq i} b_j$, then
\[ u_i(b_i, b_{-i}; v_i) = v_i - \max_{j \neq i} b_j = u_i(v_i, b_{-i}; v_i) \]
Intuitively, player i wins and pays the price $\max_{j \neq i} b_j < b_i$. However, when
bidding $v_i$, player i wins and pays $\max_{j \neq i} b_j$ as well.
B. If there is $k \neq i$ such that $b_k > \max_{j \neq k} b_j$, then
\[ u_i(b_i, b_{-i}; v_i) = 0 \leq u_i(v_i, b_{-i}; v_i) \]
Moreover, if $b_i < b_k < v_i$, then we get the strict inequality
\[ u_i(b_i, b_{-i}; v_i) = 0 < v_i - b_k = u_i(v_i, b_{-i}; v_i) \]
Intuitively, if another player k wins, then player i gets 0 and increasing $b_i$
to $v_i$ does not hurt. Moreover, if $b_i < b_k < v_i$, then increasing $b_i$ to $v_i$
strictly increases the payoff of player i.
C. If there are $k \neq \ell$ such that $b_k = b_\ell = \max_j b_j$, then
\[ u_i(b_i, b_{-i}; v_i) = 0 \leq u_i(v_i, b_{-i}; v_i) \]
Intuitively, there is a tie in $(b_i, b_{-i})$ and hence all players get 0.
Second-Price Auction (Cont.)
Case $b_i > v_i$: We distinguish four sub-cases depending on $b_{-i}$.
A. If $b_i > \max_{j \neq i} b_j > v_i$, then
\[ u_i(b_i, b_{-i}; v_i) = v_i - \max_{j \neq i} b_j < 0 = u_i(v_i, b_{-i}; v_i) \]
So in this case the inequality is strict.
B. If $b_i > v_i \geq \max_{j \neq i} b_j$, then
\[ u_i(b_i, b_{-i}; v_i) = v_i - \max_{j \neq i} b_j = u_i(v_i, b_{-i}; v_i) \]
Note that this case also covers $v_i = \max_{j \neq i} b_j$ where decreasing $b_i$ to $v_i$
causes a tie with zero payoff for player i.
C. If there is $k \neq i$ such that $b_k > \max_{j \neq k} b_j > v_i$, then
\[ u_i(b_i, b_{-i}; v_i) = 0 = u_i(v_i, b_{-i}; v_i) \]
D. If there are $k \neq k'$ such that $b_k = b_{k'} = \max_j b_j > v_i$, then
\[ u_i(b_i, b_{-i}; v_i) = 0 = u_i(v_i, b_{-i}; v_i) \]
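The case analysis of Theorem 73 can be sanity-checked by brute force on a finite grid of bids. This is only an illustration, not a replacement for the proof; the helper names, the number of opponents, and the grids are assumptions chosen for the sketch. Opponent bids use half steps so that a bid strictly between two neighbouring integer bids is available.

```python
from itertools import product


def u_second(own_bid, other_bids, value):
    """Second-price payoff; a tie at the top gives 0."""
    top = max(other_bids)
    return value - top if own_bid > top else 0


def truthful_weakly_dominates(value, alt_bid, opponent_grid, n_others=2):
    """Check on a finite grid that bidding `value` is never worse than
    `alt_bid` and strictly better against at least one opponent profile."""
    never_worse, sometimes_better = True, False
    for others in product(opponent_grid, repeat=n_others):
        truthful = u_second(value, others, value)
        deviating = u_second(alt_bid, others, value)
        if truthful < deviating:
            never_worse = False
        if truthful > deviating:
            sometimes_better = True
    return never_worse and sometimes_better


own_grid = range(0, 11)                    # values and alternative bids
opponent_grid = [k / 2 for k in range(22)]  # 0, 0.5, ..., 10.5
all_dominated = all(
    truthful_weakly_dominates(v, b, opponent_grid)
    for v in own_grid for b in own_grid if b != v
)
```

If `all_dominated` is true, then on this grid truthful bidding weakly dominates every other bid for every private value, as the theorem predicts.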
First-Price Auction
Consider the First-Price Auction.
Here the highest bidder wins and pays his bid.
Let us impose the (reasonable) assumption that no player bids more
than his private value.
Question: Are there any dominant strategies?
Answer: No, to obtain a contradiction, assume that si is a very
weakly dominant strategy.
Intuitively, if player i wins against some bids of his opponents, then his bid is
strictly higher than bids of all his opponents. Thus he may slightly decrement
his bid and still win with a better payoff.
Formally, assume that all opponents bid 0, i.e., bj = 0 for all j ≠ i, and
consider vi > 0.
If si(vi) > 0, then
ui(si(vi), b−i; vi) = vi − si(vi) < vi − si(vi)/2 = ui(si(vi)/2, b−i; vi)
If si(vi) = 0, then
ui(si(vi), b−i; vi) = 0 < vi/2 = ui(vi/2, b−i; vi)
In both cases some other bid does strictly better than si(vi). Hence, si
cannot be very weakly dominant (and thus neither weakly nor strictly dominant).
First-Price Auction (Cont.)
Question: Is there an ex-post Nash equilibrium (in pure strategies)?
Answer: No. To obtain a contradiction, assume that $(s_1, \ldots, s_n)$ is
an ex-post Nash equilibrium.
If there are $v_1, \ldots, v_n$ such that some player i wins, i.e., his bid $s_i(v_i)$
satisfies $s_i(v_i) > \max_{j \neq i} s_j(v_j)$, then
\[ u_i(s_i(v_i), s_{-i}(v_{-i}); v_i) = v_i - s_i(v_i) < v_i - (s_i(v_i) - \varepsilon) = u_i(s_i(v_i) - \varepsilon, s_{-i}(v_{-i}); v_i) \]
for $\varepsilon > 0$ small enough to satisfy $s_i(v_i) - \varepsilon > \max_{j \neq i} s_j(v_j)$
(i.e., player i may help himself by decreasing the bid a bit).
Otherwise, assume that for no $v_1, \ldots, v_n$ there is a winner (this itself is a bit
weird). Consider $0 < v_1 < \cdots < v_n$. Since there is no winner, there are
two players i, j such that $i < j$ satisfying
\[ s_j(v_j) = s_i(v_i) \geq \max_{\ell} s_\ell(v_\ell) \]
But then, due to our assumption that no player bids more than his private value,
$s_j(v_j) = s_i(v_i) \leq v_i < v_j$ and thus
\[ u_j(s_j(v_j), s_{-j}(v_{-j}); v_j) = 0 < v_j - (s_j(v_j) + \varepsilon) = u_j(s_j(v_j) + \varepsilon, s_{-j}(v_{-j}); v_j) \]
for $\varepsilon > 0$ small enough to satisfy $s_j(v_j) + \varepsilon < v_j$
(i.e., player j can help himself by increasing his bid a bit).
Summary
Second Price Auction:
There is an ex-post Nash equilibrium in weakly dominant
strategies
It is incentive compatible (players are self-motivated to bid
their private values)
First Price Auction:
There are neither dominant strategies, nor ex-post Nash
equilibria
Question: Can we modify the model in such a way that First
Price Auction has a solution?
Answer: Yes, give the players at least some information about
private values of other players.
Bayesian Games
A Bayesian game is a tuple G = (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N , P) where
(N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N) is a strict incomplete information
game and P is a distribution on types, i.e.,
N = {1, . . . , n} is a set of players,
Ai is a set of actions available to player i,
Ti is a set of possible types of player i,
Recall that $T = \prod_{i=1}^{n} T_i$ is the set of type profiles, and that
$A = \prod_{i=1}^{n} A_i$ is the set of action profiles.
ui is a type-dependent payoff function
ui : A1 × · · · × An × Ti → R
P is a (joint) probability distribution over T called common
prior.
Formally, P is a probability measure over an appropriate measurable
space on T. However, I will not go into measure theory and consider
only two special cases: finite T (in which case $P : T \to [0, 1]$ so that
$\sum_{t \in T} P(t) = 1$) and $T_i = \mathbb{R}$ for all i (in which case I assume that P is
determined by a (joint) density function p on $\mathbb{R}^n$).
Bayesian Games: Strategies & Payoffs
A play proceeds as follows:
First, a type proﬁle (t1, . . . , tn) ∈ T is randomly chosen
according to P.
Then each player i learns his type ti.
(It is common knowledge that every player knows his own type but not
the types of other players.)
Each player i chooses his action based on ti.
Each player receives his payoff ui(a1, . . . , an; ti).
A pure strategy for player i is a function si : Ti → Ai.
As before, we use S to denote the set of pure strategy proﬁles.
Properties
We assume that ui depends only on ti and not on t−i. This
is called private values model and can be used to model
auctions. This model can be extended to common values
by using ui(a1, . . . , an; t1, . . . , tn).
We assume the common prior P. This means that all
players have the same beliefs about the type proﬁle. This
assumption is rather strong. More general models allow
each player to have
his own individual beliefs about types,
his own beliefs about beliefs about types,
beliefs about beliefs about beliefs about types,
and so on
(we get an infinite hierarchy).
There is a generic result of Harsanyi saying that
the hierarchy is not necessary: It is possible to extend
the type space in such a way that each player’s "extended
type" describes his original type as well as all his beliefs.
(This does not mean that common prior sufﬁces.)
Example: Battle of Sexes
Assume that player 1 may suspect that player 2 is angry with him/her
(the choice is yours) but cannot be sure.
In other words, there are two types of player 2 giving two different
games.
Formally we have a Bayesian Game
G = (N, (Ai)i∈N , (Ti)i∈N , (ui)i∈N , P) where
N = {1, 2}
A1 = A2 = {F, O}
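$T_1 = \{t_1\}$ and $T_2 = \{t_2^1, t_2^2\}$
The payoffs are given by the following tables (player 1, of type $t_1$,
chooses the row; player 2 chooses the column):
Player 2 has type $t_2^1$:
        F       O
F     2, 1    0, 0
O     0, 0    1, 2
Player 2 has type $t_2^2$:
        F       O
F     2, 0    0, 2
O     0, 1    1, 0
$P(t_2^1) = P(t_2^2) = \frac{1}{2}$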
Example: Single-Item Sealed-Bid Auctions
Consider single-item sealed-bid auctions as Bayesian games:
G = (N, (Bi)i∈N , (Vi)i∈N , (ui)i∈N , P) where
The set of players N = {1, . . . , n} is the set of bidders
Bi = [0, ∞) where each action bi ∈ Bi corresponds to the bid
Vi = R where each type vi corresponds to the private value
Let vi ∈ Vi be the type of player i (i.e. his private value), then
given an action proﬁle b = (b1, . . . , bn) (i.e. bids) we deﬁne
First Price:
\[ u_i(b; v_i) = \begin{cases} v_i - b_i & \text{if } b_i > \max_{j \neq i} b_j \\ 0 & \text{otherwise.} \end{cases} \]
Second Price:
\[ u_i(b; v_i) = \begin{cases} v_i - \max_{j \neq i} b_j & \text{if } b_i > \max_{j \neq i} b_j \\ 0 & \text{otherwise.} \end{cases} \]
P is a probability distribution of the private values such that
$P(v \in [0, \infty)^n) = 1$. For example, we may (and will) assume that
each $v_i$ is chosen independently and uniformly from $[0, v_{\max}]$
where $v_{\max}$ is a given number. Then P is uniform on $[0, v_{\max}]^n$.
Finite-Type Bayesian Games: Payoffs
For now, let us assume that each player has only ﬁnitely many
types, i.e., T is ﬁnite.
Given a type profile $t = (t_1, \ldots, t_n)$, we denote by $P(t_{-i} \mid t_i)$
the conditional probability that the opponents of player i have
the type profile $t_{-i}$ conditioned on player i having $t_i$, i.e.,
\[ P(t_{-i} \mid t_i) := \frac{P(t_i, t_{-i})}{\sum_{t'_{-i} \in T_{-i}} P(t_i, t'_{-i})} \]
Intuitively, P(t−i | ti) is the maximum information player i may squeeze out of
P about possible types of other players once he learns his own type ti.
Given a pure strategy profile $s = (s_1, \ldots, s_n)$ and a type $t_i \in T_i$
of player i the expected payoff for player i is
\[ u_i(s; t_i) = \sum_{t_{-i} \in T_{-i}} P(t_{-i} \mid t_i) \cdot u_i(s_1(t_1), \ldots, s_n(t_n); t_i) \]
(this is the conditional expectation of $u_i$ assuming the type $t_i$ of player i)
Example: Battle of Sexes
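Player 1 (of type $t_1$) chooses the row; player 2 chooses the column.
Player 2 has type $t_2^1$:
        F       O
F     2, 1    0, 0
O     0, 0    1, 2
Player 2 has type $t_2^2$:
        F       O
F     2, 0    0, 2
O     0, 1    1, 0
$P(t_2^1) = P(t_2^2) = \frac{1}{2}$
Consider strategies $s_1$ of player 1 and $s_2$ of player 2 defined by
$s_1(t_1) = F$
$s_2(t_2^1) = F$ and $s_2(t_2^2) = O$
Then
$u_1(s_1, s_2; t_1) = \frac{1}{2} \cdot 2 + \frac{1}{2} \cdot 0 = 1$
$u_2(s_1, s_2; t_2^1) = 1$ and $u_2(s_1, s_2; t_2^2) = 2$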
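The expected payoffs of this example can be recomputed mechanically from the two payoff tables and the prior. The sketch below is one possible encoding; the dictionary layout and the type labels `"t2_1"` and `"t2_2"` are assumptions made for illustration.

```python
from fractions import Fraction

# payoffs[type of player 2][(action of player 1, action of player 2)] = (u1, u2)
payoffs = {
    "t2_1": {("F", "F"): (2, 1), ("F", "O"): (0, 0),
             ("O", "F"): (0, 0), ("O", "O"): (1, 2)},
    "t2_2": {("F", "F"): (2, 0), ("F", "O"): (0, 2),
             ("O", "F"): (0, 1), ("O", "O"): (1, 0)},
}
prior = {"t2_1": Fraction(1, 2), "t2_2": Fraction(1, 2)}

s1 = "F"                          # player 1 has a single type
s2 = {"t2_1": "F", "t2_2": "O"}   # action of player 2 for each of his types

# Player 1 does not know the type of player 2, so he averages over it.
u1 = sum(prior[t] * payoffs[t][(s1, s2[t])][0] for t in prior)

# Player 2 knows his own type and player 1 has only one type,
# so the conditional expectation reduces to a table lookup.
u2 = {t: payoffs[t][(s1, s2[t])][1] for t in prior}
```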
Inﬁnite-Type Bayesian Games: Payoffs
Now assume that for each player i we have $T_i = \mathbb{R}$ and thus that
$T = \mathbb{R}^n$. The concrete type is randomly chosen according to P;
denote by $\mathbf{t} = (\mathbf{t}_1, \ldots, \mathbf{t}_n)$ the corresponding random vector with
distribution P (each $\mathbf{t}_i$ is a random variable giving a type of player i).
Assume that the random vector $\mathbf{t}$ is absolutely continuous, which means that
there is a (joint) density function p such that for all rectangles
$R = [a_1, b_1] \times \cdots \times [a_n, b_n]$
\[ P[\mathbf{t} \in R] = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} p(t_1, \ldots, t_n) \, dt_n \cdots dt_1 \]
Let $p_i$ be the marginal density function of $\mathbf{t}_i$, i.e.,
\[ p_i(t_i) = \int_{T_{-i}} p(t_i, t_{-i}) \, dt_{-i} \]
The conditional density of $\mathbf{t}_{-i} = (\mathbf{t}_1, \ldots, \mathbf{t}_{i-1}, \mathbf{t}_{i+1}, \ldots, \mathbf{t}_n)$ conditioned
on $\mathbf{t}_i = t_i$, where $p_i(t_i) > 0$, is
\[ p(t_{-i} \mid t_i) = p(t) / p_i(t_i) \]
(Here $t = (t_1, \ldots, t_n)$ is a type profile.)
Inﬁnite-Type Bayesian Games: Payoffs
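Given a pure strategy profile $s = (s_1, \ldots, s_n)$ and a type $t_i \in T_i$ of
player i, the expected payoff for player i is
\[ u_i(s; t_i) = \int_{T_{-i}} u_i(s_1(t_1), \ldots, s_n(t_n); t_i) \, p(t_{-i} \mid t_i) \, dt_{-i} \]
Example: First-Price Auction
Consider the first-price auction as a Bayesian game where the types
of players are chosen uniformly and independently from $[0, v_{\max}]$.
Consider a pure strategy profile $v = (v_1/2, \ldots, v_n/2)$ (i.e., each player
i plays $v_i/2$). What is $u_i(v; v_i)$?
Player i wins iff every opponent j bids less than $v_i/2$, i.e., iff $v_j < v_i$,
which has probability $v_i / v_{\max}$ for each opponent independently. Hence
\begin{align*}
u_i(v; v_i) &= P(\text{player i wins}) \cdot v_i/2 + P(\text{player i loses}) \cdot 0 \\
&= P(\text{all players except i bid less than } v_i/2) \cdot v_i/2 \\
&= \left( \frac{v_i}{v_{\max}} \right)^{n-1} \cdot v_i/2 \\
&= \frac{v_i^n}{2 \, v_{\max}^{n-1}}
\end{align*}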
Risk Aversion
We assume that players maximize their expected payoff. Such
players are called risk neutral.
In general, there are three kinds of players that can be described
using the following experiment. A player can choose between two
possibilities: Either get $50 for sure, or get $100 with probability 1/2 and
$0 with probability 1/2.
A risk neutral person has no preference.
A risk averse person prefers the first alternative.
A risk seeking person prefers the second one.
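One common way to model the three attitudes (an assumption here; the slides only describe the behaviour) is through the shape of a utility function over money: linear for risk neutral, concave for risk averse, convex for risk seeking. The concrete functions below are illustrative choices, not taken from the lecture.

```python
import math


def expected_utility(utility, lottery):
    """lottery is a list of (probability, amount) pairs."""
    return sum(p * utility(x) for p, x in lottery)


sure_thing = [(1.0, 50)]
gamble = [(0.5, 100), (0.5, 0)]

utilities = {
    "risk neutral": lambda x: x,             # linear
    "risk averse": lambda x: math.sqrt(x),   # concave
    "risk seeking": lambda x: x ** 2,        # convex
}

preferences = {}
for name, u in utilities.items():
    sure = expected_utility(u, sure_thing)
    risky = expected_utility(u, gamble)
    if sure > risky:
        preferences[name] = "sure $50"
    elif sure < risky:
        preferences[name] = "gamble"
    else:
        preferences[name] = "indifferent"
```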
Dominance and Nash Equilibria
A pure strategy $s_i$ weakly dominates $s_i'$ if for every $t_i \in T_i$ the following
holds: For all $s_{-i} \in S_{-i}$ we have
\[ u_i(s_i, s_{-i}; t_i) \geq u_i(s_i', s_{-i}; t_i) \]
and the inequality is strict for at least one $s_{-i}$.
The other modes of dominance are defined analogously. Dominant strategies
are defined as usual.
Definition 74
A pure strategy profile $s = (s_1, \ldots, s_n) \in S$ in the Bayesian game is
a pure strategy Bayesian Nash equilibrium if for each player i and
each type $t_i \in T_i$ of player i and every strategy $s_i' \in S_i$ we have that
\[ u_i(s_i, s_{-i}; t_i) \geq u_i(s_i', s_{-i}; t_i) \]
Example: Battle of Sexes
Player 1 (of type $t_1$) chooses the row; player 2 chooses the column.
Player 2 has type $t_2^1$:
        F       O
F     2, 1    0, 0
O     0, 0    1, 2
Player 2 has type $t_2^2$:
        F       O
F     2, 0    0, 2
O     0, 1    1, 0
$P(t_2^1) = P(t_2^2) = \frac{1}{2}$
Use the following notation: (X, (Y, Z)) means that player 1 plays X ∈ {F, O},
and player 2 plays Y ∈ {F, O} if his/her type is $t_2^1$ and Z ∈ {F, O} otherwise.
Are there pure strategy Bayesian Nash equilibria?
(F, (F, O)) is a Bayesian NE.
Even though O is preferred by player 2, the outcome (O, O) cannot
occur with a positive probability in any BNE.
To ever meet at the opera, player 1 needs to play O.
The unique best response of player 2 to O is (O, F)
But (O, (O, F)) is not a BNE:
The expected payoff of player 1 at (O, (O, F)) is 1/2
The expected payoff of player 1 at (F, (O, F)) is 1
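Since the game is tiny, all eight pure strategy profiles can be enumerated and tested against Definition 74. In the sketch below the encoding of the game and all names are assumptions made for illustration. A deviating strategy of player 2 matters for a given type only through the action it prescribes for that type, so it suffices to check single-action deviations per type.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("F", "O")
# payoffs[type of player 2][(action of player 1, action of player 2)] = (u1, u2)
payoffs = {
    "t2_1": {("F", "F"): (2, 1), ("F", "O"): (0, 0),
             ("O", "F"): (0, 0), ("O", "O"): (1, 2)},
    "t2_2": {("F", "F"): (2, 0), ("F", "O"): (0, 2),
             ("O", "F"): (0, 1), ("O", "O"): (1, 0)},
}
prior = {"t2_1": Fraction(1, 2), "t2_2": Fraction(1, 2)}


def u1(a1, s2):
    """Expected payoff of player 1, averaging over the type of player 2."""
    return sum(prior[t] * payoffs[t][(a1, s2[t])][0] for t in prior)


def u2(a1, a2, t):
    """Payoff of player 2 of type t (player 1 has a single type)."""
    return payoffs[t][(a1, a2)][1]


def is_bne(a1, s2):
    best_for_1 = all(u1(a1, s2) >= u1(d, s2) for d in ACTIONS)
    best_for_2 = all(u2(a1, s2[t], t) >= u2(a1, d, t)
                     for t in prior for d in ACTIONS)
    return best_for_1 and best_for_2


equilibria = []
for a1, y, z in product(ACTIONS, repeat=3):
    if is_bne(a1, {"t2_1": y, "t2_2": z}):
        equilibria.append((a1, (y, z)))
```

The list `equilibria` uses the same (X, (Y, Z)) notation as the slide.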
Second Price Auction
Consider the second-price sealed-bid auction as a Bayesian
game where the types of players are chosen according to
an arbitrary distribution.
Proposition 4
In a second-price sealed-bid auction, with any probability
distribution P, the truth revealing proﬁle of bids, i.e.,
v = (v1, . . . , vn), is a weakly dominant strategy proﬁle.
Proof.
The exact same proof as for the strict incomplete information
games. Indeed, we do not need to assume that the players
have a common prior for this!
First Price Auction
Consider the first-price sealed-bid auction as a Bayesian game with
some prior distribution P.
Note that bidding truthfully does not have to be a dominant strategy.
For example, if player i knows that (with high probability) his value $v_i$
is much larger than $\max_{j \neq i} v_j$, he will not waste money and will bid less
than $v_i$.
So is there a pure strategy Bayesian Nash equilibrium?
Proposition 5
Assume that for all players i the type of player i is chosen
independently and uniformly from $[0, v_{\max}]$. Consider a pure strategy
profile $s = (s_1, \ldots, s_n)$ where $s_i(v_i) = \frac{n-1}{n} v_i$ for every player i and
every value $v_i$. Then s is a Bayesian Nash equilibrium.
Proof. We show that $s_i(v_i) = \frac{n-1}{n} v_i$ is the best response to $s_{-i}$ for all i.
Let us fix i and consider a pure strategy $s_i'$ of player i.
Fix $v_i$ and define $b_i = s_i'(v_i)$. We show (see the greenboard) that
$b_i = \frac{n-1}{n} v_i$ maximizes $u_i(b_i, s_{-i}; v_i)$. This holds for all $v_i$, and thus
$s_i' = s_i$ is the best response to $s_{-i}$.
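The maximization step of Proposition 5 can be checked numerically. The sketch below is not the greenboard proof; it assumes uniform values on [0, v_max], opponents who bid (n - 1)/n times their value, and a plain grid search over bids not exceeding the bidder's own value. All function names are hypothetical.

```python
def expected_payoff(bid, value, n, v_max=1.0):
    """Expected first-price payoff of a bidder with `value` who bids `bid`
    while each of the other n - 1 bidders bids (n - 1) / n times his own
    value, values being independent and uniform on [0, v_max]."""
    # An opponent's bid is below `bid` iff his value is below n * bid / (n - 1).
    p_single = min(1.0, n * bid / ((n - 1) * v_max))
    return (value - bid) * p_single ** (n - 1)


def best_bid(value, n, v_max=1.0, steps=10_000):
    """Grid search for the payoff-maximizing bid in [0, value]."""
    candidates = [value * k / steps for k in range(steps + 1)]
    return max(candidates, key=lambda b: expected_payoff(b, value, n, v_max))


n, value = 4, 0.8
numeric = best_bid(value, n)
claimed = (n - 1) / n * value
```

Up to the grid resolution, `numeric` and `claimed` should coincide.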
First Price Auction (Cont.)
More generally, assume only that the private values $v_i$ are identically
and independently distributed on $[v_{\min}, v_{\max}]$ (this is called
the independent private values model). Let F(x) be the cumulative
distribution function of the private value (for each player).
Let us restrict to strictly increasing strategies.
Note that this restriction is quite reasonable: intuitively it means that
the higher the private value, the higher the bid.
Then one may show that there is a symmetric Bayesian Nash
equilibrium $(s_1, \ldots, s_n)$ where each $s_i$ is defined by
\[ s_i(v_i) = v_i - \frac{\int_{v_{\min}}^{v_i} [F(x)]^{n-1} \, dx}{[F(v_i)]^{n-1}} \]
That is, in particular, for $v_i > v_{\min}$ the bid is always smaller than the private value.
For the uniform distribution on $[0, v_{\max}]$ the formula gives $s_i(v_i) = \frac{n-1}{n} v_i$,
in agreement with Proposition 5.
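As a rough numerical illustration, the sketch below evaluates the symmetric equilibrium bid in its standard form, v minus the integral of F(x)^(n-1) from v_min to v divided by F(v)^(n-1), using the midpoint rule. This form of the formula, the function name, and the choice of a uniform distribution for the check are assumptions made here.

```python
def equilibrium_bid(value, n, cdf, v_min, steps=10_000):
    """value - (integral of cdf(x)^(n-1) over [v_min, value]) / cdf(value)^(n-1),
    with the integral approximated by the midpoint rule."""
    if value <= v_min:
        return value
    h = (value - v_min) / steps
    integral = h * sum(cdf(v_min + (k + 0.5) * h) ** (n - 1)
                       for k in range(steps))
    return value - integral / cdf(value) ** (n - 1)


v_max = 2.0


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, v_max]."""
    return x / v_max


n, value = 3, 1.5
bid = equilibrium_bid(value, n, uniform_cdf, v_min=0.0)
uniform_prediction = (n - 1) / n * value
```

For the uniform case the numerical value should match the (n - 1)/n rule; other distributions can be tried by passing a different `cdf`.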
Expected Revenue
Consider the ﬁrst and second price sealed-bid auctions. For
simplicity, assume that the type of each player is chosen
independently and uniformly from [0, 1].
What is the expected revenue of the auctioneer from these two
auctions when the players play the corresponding Bayesian NE?
In the first-price auction, players bid $\frac{n-1}{n} v_i$. Thus the cumulative
distribution function of the revenue is
\[ F(x) = P\left(\max_j \frac{n-1}{n} v_j \leq x\right) = P\left(\max_j v_j \leq \frac{nx}{n-1}\right) = \left(\frac{nx}{n-1}\right)^n \]
for $0 \leq x \leq \frac{n-1}{n}$.
It is straightforward to show that then the expected maximum bid
in the first-price auction (i.e., the revenue) is $\frac{n-1}{n+1}$.
In the second-price auction, players bid $v_i$. However, the revenue
is the second largest value. Thus the cumulative distribution function of the
revenue is
\[ F(x) = P\left(\max_j v_j \leq x\right) + \sum_{i=1}^{n} P(v_i > x \text{ and for all } j \neq i,\ v_j \leq x) \]
Amazingly, this also gives the expectation $\frac{n-1}{n+1}$.
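The equality of the two expected revenues can also be observed empirically. The following Monte Carlo sketch assumes values uniform on [0, 1], equilibrium bidding as described above, and a fixed random seed; the function name and the number of rounds are arbitrary illustrative choices.

```python
import random


def average_revenues(n, rounds=100_000, seed=0):
    """Average revenue of the first-price and second-price auctions when
    the n bidders play the respective Bayesian Nash equilibria."""
    rng = random.Random(seed)
    first_total = second_total = 0.0
    for _ in range(rounds):
        values = sorted(rng.random() for _ in range(n))
        # first price: the highest bidder pays his own bid (n - 1)/n * value
        first_total += (n - 1) / n * values[-1]
        # second price: truthful bids, the winner pays the second highest value
        second_total += values[-2]
    return first_total / rounds, second_total / rounds


n = 5
first, second = average_revenues(n)
predicted = (n - 1) / (n + 1)
```

Both averages should be close to `predicted`, up to sampling noise.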
Revenue Equivalence (Cont.)
The result from the previous slide is a special case of a rather
general revenue equivalence theorem, ﬁrst proved by Vickrey
(1961) and then generalized by Myerson (1981).
Both Vickrey and Myerson were awarded the Nobel Prize in economics for their
contributions to auction theory.
Theorem 75 (Revenue Equivalence)
Assume that each of n risk-neutral players has independent
private values drawn from a common cumulative distribution
function F(x) which is continuous and strictly increasing on
an interval [vmin, vmax] (the probability of vi ∉ [vmin, vmax] is zero).
Then any efﬁcient auction mechanism in which any player with
value vmin has an expected payoff zero yields the same
expected revenue.
Here efﬁcient means that the auction has a symmetric and
increasing Bayesian Nash equilibrium and always allocates
the item to the player with the highest bid.