Repeated Games
• This week we examine the effect of repetition on strategic behavior in games with perfect information.
• If a game is played repeatedly by the same players, they may behave very differently than if the game is played just once (a one-shot game). Compare repeatedly borrowing a friend’s car with renting a car.
• Two types of repeated games:
– Finitely repeated: the game is played for a finite and known number of rounds, for example, 2 rounds/repetitions.
– Infinitely or indefinitely repeated: the game has no predetermined length; players act as though it will be played indefinitely, or it ends only with some probability.

Finitely Repeated Games
• Writing down the strategy space for repeated games is difficult, even if the game is repeated for just 2 rounds. For example, consider the strategies for the following 2x2 game, in which Row chooses U or D and Column chooses L or R, played just twice.
• For the row player:
– U1 or D1: two possible moves in round 1 (subscript 1).
– For each first-round history, pick whether to go U2 or D2. The histories are: (U1,L1), (U1,R1), (D1,L1), (D1,R1).
– A complete plan names a round-1 move and a round-2 move after each of the 4 histories, so Row has 2 x 2^4 = 32 strategies (8 if histories ruled out by Row’s own round-1 move are ignored), and the two rounds can unfold along 4 x 4 = 16 different paths of play!

Strategic Form of a 2-Round Finitely Repeated Game
• This quickly gets messy!
[Figure: diagram of the twice-repeated game showing the round-1 moves U1, D1, L1, R1 followed by the round-2 moves U2, D2, L2, R2.]

Finite Repetition of a Game with a Unique Equilibrium
• Fortunately, we may be able to determine how to play a finitely repeated game by looking at the equilibrium or equilibria in the one-shot or “stage game” version of the game.
• For example, consider a 2x2 game with a unique equilibrium, e.g. the Prisoner’s Dilemma, where the payoffs are years in prison, so higher numbers are worse.
• Does the equilibrium change if this game is played for just 2 rounds?

A Game with a Unique Equilibrium Played Finitely Many Times Always Has the Same Subgame Perfect Equilibrium Outcome
• To see this, apply backward induction to the finitely repeated game to obtain the subgame perfect Nash equilibrium (SPNE).
• In the last round, round 2, both players know that the game will not continue further. They will therefore both play their dominant strategy of Confess.
• Knowing that the round 2 outcome is (Confess, Confess), there is no benefit to playing Don’t Confess in round 1. Hence, both players play Confess in round 1 as well.
• As long as there is a known, finite end, there will be no change in the equilibrium outcome of a game with a unique equilibrium. This is also true for zero-sum or constant-sum games.

Finite Repetition of a Sequential Move Game
• Recall the incumbent-rival game:
• In the one-shot, sequential move game there is a unique subgame perfect equilibrium where the rival enters and the incumbent accommodates.
• Does finite repetition of this game change the equilibrium?
• Should it?

The Chain Store Paradox
• Selten (1978) proposed a finitely repeated version of the incumbent-rival (entry) game in which the incumbent firm is a monopolist with a chain of stores in 20 different locations.
• He imagined that in each location the chain store monopolist was challenged by a local rival firm, indexed by f=1,2,…,20.
• The game is played sequentially: firm 1 decides whether or not to enter at location 1, the chain store then decides whether to fight or accommodate, then firm 2 decides, and so on.
• Consider the last rival, firm 20. Since the incumbent gains nothing by fighting this last firm and does better by accommodating, he will accommodate, and firm 20 will therefore choose to enter. But if the incumbent will accommodate firm 20, there is nothing he gains from fighting firm 19, and so on back to firm 1.
• By backward induction, each firm f=1,2,…,20 chooses Enter and the incumbent always chooses Accommodate. This game-theoretic solution is what Selten calls the “induction hypothesis”.

What about Deterrence?
• Selten noted that while the induction argument is the logically correct, game-theoretic solution assuming rationality and common knowledge of the structure of the game, it does not seem empirically plausible. Why?
• Under the enter/accommodate equilibrium, the incumbent earns a payoff of 2x20 = 40. But perhaps he can do better. For instance, suppose the incumbent chooses to fight the first 15 rivals and accommodate the last 5.
• If this strategy is common knowledge, then the first 15 stay out and earn a payoff of 1 each, while the incumbent earns 5x15+2x5 = 85 > 40.
• Even if some of the first 15 rivals choose to enter anyway, say 2/5ths of them (6 rivals), the incumbent can still be better off; in that case he gets 5x(15-6)+2x5 = 55 > 40!
• This contradiction between the game-theoretic solution and an empirically plausible “deterrence hypothesis” is what Selten labeled the chain-store paradox.
• The paradox results from the game-theoretic assumption that all players presume one another to be perfectly rational and know (via common knowledge) the structure of the game. They are thus led to conclude that the incumbent will never, ever fight. By this standard, fighting would be an irrational move and would never be observed.

Finite Repetition of a Simultaneous Move Game with Multiple Equilibria: The Game of Chicken
• Consider 2 firms playing the following one-stage Chicken game.
• The two firms play the game N>1 times, where N is known. What are the possible subgame perfect equilibria?
• In the one-shot “stage game” there are 3 equilibria: Ab, Ba, and a mixed-strategy equilibrium in which Row plays A with probability ½ and Column plays a with probability ½, giving each firm an expected payoff of 2.

Games with Multiple Equilibria Played Finitely Many Times Have Many Subgame Perfect Equilibria
Some subgame perfect equilibria of the finitely repeated version of the stage game are:
1. Ba, Ba, ... N times
2. Ab, Ab, ... N times
3. Ab, Ba, Ab, Ba, ... N times
4. Aa, Ab, Ba (N=3 rounds)
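The three stage-game equilibria named above can be checked by brute force. The sketch below is illustrative only: the payoff matrix itself is not reproduced in the text, so it assumes Aa = (3, 3), Ab = (1, 4), Ba = (4, 1), which match the payoffs used in the averages in this section, and Bb = (0, 0), which is an inferred value chosen so that the mixed equilibrium gives each firm an expected payoff of 2. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed Chicken payoffs as (row payoff, column payoff).
# The Bb cell is inferred, not stated in the text.
PAYOFFS = {
    ("A", "a"): (3, 3),
    ("A", "b"): (1, 4),
    ("B", "a"): (4, 1),
    ("B", "b"): (0, 0),
}
ROW_MOVES = ("A", "B")
COL_MOVES = ("a", "b")


def pure_nash_equilibria():
    """Return every cell where neither firm gains by changing its own move."""
    equilibria = []
    for r in ROW_MOVES:
        for c in COL_MOVES:
            row_ok = all(PAYOFFS[(r, c)][0] >= PAYOFFS[(r2, c)][0] for r2 in ROW_MOVES)
            col_ok = all(PAYOFFS[(r, c)][1] >= PAYOFFS[(r, c2)][1] for c2 in COL_MOVES)
            if row_ok and col_ok:
                equilibria.append(r + c)
    return equilibria


def mixed_equilibrium():
    """Solve the two indifference conditions of the 2x2 game.

    Returns (p, q, row_value, col_value), where p is the probability that
    Row plays A and q is the probability that Column plays a.
    """
    # Row payoffs in cells Aa, Ab, Ba, Bb.
    r_aa, r_ab = PAYOFFS[("A", "a")][0], PAYOFFS[("A", "b")][0]
    r_ba, r_bb = PAYOFFS[("B", "a")][0], PAYOFFS[("B", "b")][0]
    # Column payoffs in cells Aa, Ab, Ba, Bb.
    c_aa, c_ab = PAYOFFS[("A", "a")][1], PAYOFFS[("A", "b")][1]
    c_ba, c_bb = PAYOFFS[("B", "a")][1], PAYOFFS[("B", "b")][1]

    # q makes Row indifferent: q*r_aa + (1-q)*r_ab = q*r_ba + (1-q)*r_bb.
    q = Fraction(r_bb - r_ab, r_aa - r_ab - r_ba + r_bb)
    # p makes Column indifferent: p*c_aa + (1-p)*c_ba = p*c_ab + (1-p)*c_bb.
    p = Fraction(c_bb - c_ba, c_aa - c_ba - c_ab + c_bb)

    row_value = q * r_aa + (1 - q) * r_ab
    col_value = p * c_aa + (1 - p) * c_ba
    return p, q, row_value, col_value


print(pure_nash_equilibria())
print(mixed_equilibrium())
```

Under these assumed payoffs the pure-strategy check returns only Ab and Ba, and the indifference conditions give a probability of ½ for each firm, in line with the text. A different Bb payoff would change the mixing probabilities but not the two pure equilibria, as long as Bb stays the worst cell for both firms.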
Strategies Supporting these Subgame Perfect Equilibria
1. Ba, Ba, ... Avg. Payoffs: (4, 1)
Row Firm first move: Play B. Second move: After every possible history play B.
Column Firm first move: Play a. Second move: After every possible history play a.
2. Ab, Ab, ... Avg. Payoffs: (1, 4)
Row Firm first move: Play A. Second move: After every possible history play A.
Column Firm first move: Play b. Second move: After every possible history play b.
3. Ab, Ba, Ab, Ba, ... Avg. Payoffs: (5/2, 5/2) when N is even
Row Firm first round move: Play A. Even rounds: After every possible history play B. Odd rounds: After every possible history play A.
Column Firm first round move: Play b. Even rounds: After every possible history play a. Odd rounds: After every possible history play b.

What About that 3-Round S.P. Equilibrium?
4. Aa, Ab, Ba (3 rounds only) can be supported by the strategies:
Row Firm first move: Play A. Second move:
– If history is (A,a) or (B,b) play A, and play B in round 3 unconditionally.
– If history is (A,b) play B, and play B in round 3 unconditionally.
– If history is (B,a) play A, and play A in round 3 unconditionally.
Column Firm first move: Play a. Second move:
– If history is (A,a) or (B,b) play b, and play a in round 3 unconditionally.
– If history is (A,b) play a, and play a in round 3 unconditionally.
– If history is (B,a) play b, and play b in round 3 unconditionally.
Avg. Payoff to Row = (3+1+4)/3 = Avg. Payoff to Column = (3+4+1)/3 = 2.67.
More generally, if N=101 then Aa played 99 times, followed by Ab, Ba, is also a subgame perfect equilibrium.

Why is this a Subgame Perfect Equilibrium?
• Because the strategies that generate Aa, Ab, Ba are each player’s best response to the other player’s strategy at each subgame.
• Consider the column player. Suppose he plays b in round 1, and Row sticks to the plan of A. The round 1 history is (A,b).
– According to Row’s strategy given a history of (A,b), Row will play B in round 2 and B in round 3.
– According to Column’s strategy given a history of (A,b), Column will play a in round 2 and a in round 3.
• The column player’s average payoff is then (4+1+1)/3 = 2. This is less than the average payoff of 2.67 that he earns in the subgame perfect equilibrium. Hence, the column player will not play b in the first round, given his own strategy and the row player’s strategy.
• A similar argument applies to the row firm.

Summary
• A repeated game is a special kind of game (in extensive or strategic form) where the same one-shot “stage” game is played over and over again.
• A finitely repeated game is one in which the game is played a fixed and known number of times.
• If a simultaneous move game has a unique Nash equilibrium, or a sequential move game has a unique subgame perfect Nash equilibrium, this equilibrium is also the unique subgame perfect equilibrium of the finitely repeated game.
• If a simultaneous move game has multiple Nash equilibria, then there are many subgame perfect equilibria of the finitely repeated game. Some of these involve the play of strategies that are collectively more profitable for players than the one-shot stage game Nash equilibria (e.g. Aa, Ab, Ba in the last game studied).

Infinitely Repeated Games
• Finitely repeated games are interesting, but relatively rare; how often do we really know for certain when a game we are playing will end? (Sometimes, but not often.)
• Some of the predictions for finitely repeated games do not hold up well in experimental tests:
– The unique subgame perfect equilibrium predictions for the finitely repeated ultimatum game or prisoner’s dilemma game (always confess) are not usually observed in all rounds of finitely repeated games.
• On the other hand, we routinely play many games that are indefinitely repeated (no known end). We call such games infinitely repeated games, and we now consider how to find subgame perfect equilibria in these games.
Discounting in Infinitely Repeated Games
• Recall from our earlier analysis of bargaining that players may discount payoffs received in the future using a constant discount factor, δ = 1/(1+r), where 0 < δ < 1.
– For example, if δ=.80, then a player values $1 received one period in the future as being equivalent to $0.80 right now (δ x $1). Why? Because the implicit one-period interest rate is r=.25, so $0.80 received right now and invested at the one-period rate r=.25 gives (1+.25) x $0.80 = $1 in the next period.
• Now consider an infinitely repeated game. Suppose that an outcome of this game is that a player receives $p in every future play (round) of the game.
• The value of this stream of payoffs right now is: $p(δ + δ^2 + δ^3 + ...)
• The exponential terms are due to compounding of interest.

Discounting in Infinitely Repeated Games, Cont.
• The infinite sum δ + δ^2 + δ^3 + ... converges to δ/(1-δ).
• Simple proof: Let x = δ + δ^2 + δ^3 + ... Notice that x = δ + δ(δ + δ^2 + δ^3 + ...) = δ + δx. Solve x = δ + δx for x: (1-δ)x = δ, so x = δ/(1-δ).
• Hence, the present discounted value of receiving $p in every future round is $p[δ/(1-δ)], or $pδ/(1-δ).
• Note further that, using the definition δ = 1/(1+r), δ/(1-δ) = [1/(1+r)]/[1-1/(1+r)] = 1/r, so the present value of the infinite sum can also be written as $p/r.
• That is, $pδ/(1-δ) = $p/r, since by definition δ = 1/(1+r).

The Prisoner’s Dilemma Game (Again!)
• Consider a new version of the prisoner’s dilemma game, where higher payoffs are now preferred to lower payoffs.
          C       D
  C     c,c     a,b
  D     b,a     d,d
C=cooperate (don’t confess), D=defect (confess)
• To make this a prisoner’s dilemma, we must have: b > c > d > a. We will use this example in what follows.
          C       D
  C     4, 4    0, 6
  D     6, 0    2, 2
Suppose the payoff numbers are in dollars.

Sustaining Cooperation in the Infinitely Repeated Prisoner’s Dilemma Game
• The outcome (C,C) forever, yielding payoffs (4,4), can be a subgame perfect equilibrium of the infinitely repeated prisoner’s dilemma game, provided that 1) the discount factor δ that both players use is sufficiently large and 2) each player uses some kind of contingent or trigger strategy. For example, the grim trigger strategy:
– First round: Play C.
– Second and later rounds: So long as the history of play has been (C,C) in every round, play C. Otherwise play D unconditionally and forever.
• Proof: Consider a player who follows a different strategy, playing C for a while and then playing D against a player who adheres to the grim trigger strategy.

Cooperation in the Infinitely Repeated Prisoner’s Dilemma Game, Continued
• Consider the infinitely repeated game starting from the round in which the “deviant” player first decides to defect. In this round the deviant earns $6, or $2 more than from C: $6-$4=$2.
• Since the deviant player chose D, the other player’s grim trigger strategy requires the other player to play D forever after, and so both will play D forever, a loss of $4-$2=$2 in all future rounds.
• The present discounted value of a loss of $2 in all future rounds is $2δ/(1-δ).
• So the player thinking about deviating must consider whether the immediate gain of 2 > 2δ/(1-δ), the present value of all future lost payoffs, that is, whether 2(1-δ) > 2δ, or 2 > 4δ, or 1/2 > δ.
• If ½ < δ < 1, the inequality does not hold, and so the player thinking about deviating is better off playing C forever.

Other Subgame Perfect Equilibria are Possible in the Repeated Prisoner’s Dilemma Game
• The “Folk theorem” of repeated games says that almost any outcome that on average yields the mutual defection payoff or better to both players can be sustained as a subgame perfect Nash equilibrium of the indefinitely repeated Prisoner’s Dilemma game.
[Figure: average payoffs in the repeated Prisoner’s Dilemma, with Row Player Avg. Payoff on one axis and Column Player Avg. Payoff on the other.]
– The set of subgame perfect Nash equilibria is the green area, as determined by average payoffs from all rounds played (for a large enough discount factor δ).
– The mutual defection-in-all-rounds equilibrium is marked on the figure.
– The efficient, mutual cooperation-in-all-rounds equilibrium outcome is at (4,4).
– The set of feasible payoffs is the union of the green and yellow regions.

Must We Use a Grim Trigger Strategy to Support Cooperation as a Subgame Perfect Equilibrium in the Infinitely Repeated PD?
• There are “nicer” strategies that will also support (C,C) as an equilibrium.
• Consider the “tit-for-tat” (TFT) strategy (row player version):
– First round: Play C.
– Second and later rounds: If the history from the last round is (C,C) or (D,C) play C. If the history from the last round is (C,D) or (D,D) play D.
• This strategy “says” play C initially and as long as the other player played C last round. If the other player played D last round, then play D this round. If the other player returns to playing C, play C at the next opportunity, else play D.
• TFT is forgiving, while grim trigger (GT) is not. Hence TFT is regarded as being “nicer.”

TFT Supports (C,C) forever as a NE in the Infinitely Repeated PD
• Proof. Suppose both players play TFT. Since the strategy specifies that both players start off playing C, and continue to play C so long as the history includes no defections, the history of play will be (C,C), (C,C), (C,C), ...
• Now suppose the row player (Player 1) considers deviating in one round only and then reverting to playing C in all further rounds, while the column player (Player 2) is assumed to play TFT.
• Player 1’s payoffs starting from the round in which he deviates are: 6, 0, 4, 4, 4, ... If he never deviated, he would have gotten the sequence of payoffs 4, 4, 4, 4, 4, ... The two sequences agree from the third round on, so the relevant comparison is whether 6 + 0δ > 4 + 4δ. The inequality holds if 2 > 4δ, or ½ > δ. So if ½ < δ < 1, the deviation is not profitable.
• There are no other profitable deviations if ½ < δ < 1.
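Both deviation arguments above can be checked numerically by simulating play and summing discounted payoffs. The sketch below uses the stage payoffs from this section (4 for mutual cooperation, 6 and 0 when one player defects on a cooperator, 2 for mutual defection). Everything else is an assumption made for illustration: the function names, the 500-round horizon used to approximate the infinite sum, and the two specific deviations tested (defect in every round against grim trigger; defect once and then cooperate against TFT).

```python
# Own payoff in the prisoner's dilemma, indexed by (own move, other's move).
PAYOFF = {("C", "C"): 4, ("C", "D"): 0, ("D", "C"): 6, ("D", "D"): 2}


def grim_trigger(own_history, other_history):
    """Play C until anyone has ever played D, then play D forever."""
    return "D" if "D" in own_history or "D" in other_history else "C"


def tit_for_tat(own_history, other_history):
    """Play C in the first round, then copy the opponent's previous move."""
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    """Hypothetical deviation tested against grim trigger."""
    return "D"


def defect_once_then_cooperate(own_history, other_history):
    """Hypothetical deviation tested against TFT: D first, C afterwards."""
    return "D" if not own_history else "C"


def discounted_value(player, opponent, delta, rounds=500):
    """Approximate the player's discounted payoff over a long finite horizon.

    The first simulated round is undiscounted, matching the text's
    comparison that starts in the round of the deviation.
    """
    own, other, total = [], [], 0.0
    for t in range(rounds):
        a = player(own, other)
        b = opponent(other, own)
        total += (delta ** t) * PAYOFF[(a, b)]
        own.append(a)
        other.append(b)
    return total


def deviation_gain(strategy, deviation, delta):
    """Payoff from deviating minus payoff from following the strategy."""
    return (discounted_value(deviation, strategy, delta)
            - discounted_value(strategy, strategy, delta))


for delta in (0.4, 0.6):
    print(delta,
          deviation_gain(grim_trigger, always_defect, delta),
          deviation_gain(tit_for_tat, defect_once_then_cooperate, delta))
```

For a discount factor below ½ both gains come out positive, and for one above ½ both come out negative, which is the threshold derived in the text. The check covers only these two deviations; it does not by itself rule out every other deviation.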
TFT as an Equilibrium Strategy is not Subgame Perfect
• To be subgame perfect, an equilibrium strategy must prescribe best responses after every possible history, even those with zero probability under the given strategy.
• Consider two TFT players, and suppose that the row player “accidentally” deviates to playing D for one round (“a zero probability event”) but then continues playing TFT as before.
• Starting with the round of the deviation, the history of play will look like this: (D,C), (C,D), (D,C), (C,D), ... Why? Just apply the TFT strategy.
• Consider the payoffs to the column player (Player 2) starting from round 2:
6 + 0δ + 6δ^2 + 0δ^3 + 6δ^4 + 0δ^5 + ... = 6 + 6(δ^2 + δ^4 + ...) = 6 + 6δ^2/(1-δ^2) = 6/(1-δ^2).

TFT is not Subgame Perfect, cont’d.
• If the column player instead played C in round 2 and then continued with TFT (a one-shot deviation), the history would become: (D,C), (C,C), (C,C), (C,C), ...
• In this case, the payoffs to the column player starting from round 2 would be:
4 + 4δ + 4δ^2 + 4δ^3 + ... = 4 + 4(δ + δ^2 + δ^3 + ...) = 4 + 4δ/(1-δ) = 4/(1-δ).
• The column player asks whether 6/(1-δ^2) > 4/(1-δ), which is equivalent to 4δ^2 - 6δ + 2 > 0, and this is false for any 1/2 < δ < 1.
• The column player reasons that it is better to deviate from TFT!

One-Shot Deviation Principle
• The above example is an illustration of the one-shot deviation principle.
• A strategy profile of an infinitely repeated game is an SPE if and only if there is no profitable one-shot deviation starting from any history.

Must We Discount Payoffs?
• Answer 1: How else can we distinguish between infinite sums of different constant payoff amounts?
• Answer 2: We don’t have to assume that players discount future payoffs. Instead, we can assume that there is some constant, known probability q, 0 < q < 1, that the game will continue from one round to the next. Assuming this probability is independent from one round to the next, the probability the game is still being played T rounds from right now is q^T.
– Hence, a payoff of $p in every future round of an infinitely repeated game with a constant probability q of continuing from one round to the next has a value right now that is equal to: $p(q + q^2 + q^3 + ...) = $p[q/(1-q)].
– This is similar to discounting of future payoffs, and equivalent if q=δ.

Play of a Prisoner’s Dilemma with an Indefinite End
• Let’s play the Prisoner’s Dilemma game studied today but with a probability q=.8 that the game continues from one round to the next.
• What this means is that at the end of each round the computer program draws a random number between 0 and 1. If this number is less than or equal to .80, the game continues with another round. Otherwise the game ends.
• We refer to the game with an indefinite number of repetitions of the stage game as a supergame.
• The expected number of rounds in the supergame is: 1 + q + q^2 + q^3 + ... = 1/(1-q) = 1/.2 = 5. In practice, you may play more or fewer than 5 rounds in the supergame: it just depends on the sequence of random draws.

Data from an Indefinitely Repeated Prisoner’s Dilemma Game with Fixed Pairings
• From Duffy and Ochs, Games and Economic Behavior, 2009.
[Figure: Fixed Pairings, 14 Subjects. Average cooperation frequency of 7 pairs (% X, Cooperate; scale 0 to 1) by round number, where round 1 corresponds to the start of a new game.]
• Discount factor δ=.90 = probability of continuation.
• The start of each new supergame is indicated by a vertical line at round 1.
• Cooperation rates start at 30% and increase to 80% over 10 supergames.
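The continuation rule for the classroom supergame (continue when the draw is at most q) can be simulated directly. The sketch below is a minimal illustration that assumes Python’s standard `random` module stands in for the computer program mentioned above; the helper names, the fixed seed, and the number of simulated supergames are all hypothetical choices. It compares the simulated average length with the closed form 1/(1-q) and evaluates $p[q/(1-q)] for the $4 cooperation payoff.

```python
import random


def supergame_length(q, rng):
    """Number of rounds played when each round is followed by another with probability q."""
    rounds = 1  # the first round is always played
    # Draw a number in [0, 1); continue while the draw is at most q.
    while rng.random() <= q:
        rounds += 1
    return rounds


def expected_length(q):
    """Closed form of 1 + q + q^2 + ... = 1/(1-q)."""
    return 1.0 / (1.0 - q)


def present_value(p, q):
    """Value today of $p in every future round: p*(q + q^2 + ...) = p*q/(1-q)."""
    return p * q / (1.0 - q)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
q = 0.8
lengths = [supergame_length(q, rng) for _ in range(100_000)]
simulated_mean = sum(lengths) / len(lengths)

print("simulated mean length:", simulated_mean)
print("closed-form length:", expected_length(q))
print("value of $4 in every future round:", present_value(4, q))
```

With q=.8 the closed form gives an expected length of 5 rounds, and the simulated mean should land close to it; individual supergames vary widely, which is the point made in the text about playing more or fewer than 5 rounds.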