Prisoner's Dilemma


Acting on a tip, the police searched the apartments of two men and found several of the articles reported stolen. This was enough evidence to convict the men of possession of stolen goods but not enough to convict them of burglary unless one or both confessed. The men were arrested and put in separate cells. They could not communicate with each other.

Wanting to get a confession, the prosecutor resorted to a stratagem. He said to each of the prisoners separately:

"The penalty for possession of stolen goods is one year in prison, for burglary five years. I don't have enough evidence to convict you of burglary unless I get a confession. If you both confess, your sentence will be reduced from five to three years. If you don't confess but your partner does, your partner will not be prosecuted at all. You, however, will be given the full five years on the strength of his confession. If neither of you confesses, then you will go to prison anyway for a year, the penalty for possessing stolen goods. Think it over.

Each prisoner thinks of the situation and reasons as follows: "I don't know whether my partner will confess or not, but I can anticipate what will happen in case he does or in case he does not. If he confesses and I don't, he goes free while I go to prison for five years. So clearly, if he confesses, I should also confess, since in that case, I get only three years. Suppose now he doesn't confess. Then if I don't confess, I go to prison for one year. But if I confess, I go free. So if he doesn't confess, I am better off confessing. So it is to my advantage to confess whether he confesses or not. Therefore I shall confess."

The other prisoner, being in the same situation, reasons the same way and comes to the same conclusion. So they both confess and both get convicted of burglary and go to prison for three years. Had they not confessed, they would have been convicted only for possessing stolen goods and would have gone to prison for only one year. We see that although it was clearly in the interest of each of them to confess, it was nevertheless in the interest of both not to confess.
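The prisoner's reasoning can be checked mechanically. The following is only a minimal sketch: the names CONFESS, SILENT, YEARS and best_reply are invented for illustration, and the sentences are those stated in the prosecutor's offer.

```python
# Years in prison for (my choice, partner's choice), taken from the prosecutor's offer.
# All names here are hypothetical, chosen only for this illustration.
CONFESS, SILENT = "confess", "silent"

YEARS = {
    (CONFESS, CONFESS): 3,  # both confess: reduced burglary sentence
    (CONFESS, SILENT): 0,   # I confess, he does not: I am not prosecuted
    (SILENT, CONFESS): 5,   # he confesses, I do not: full burglary sentence
    (SILENT, SILENT): 1,    # neither confesses: possession only
}

def best_reply(partner_choice):
    """Return the choice that gives me the shortest sentence, given the partner's choice."""
    return min((CONFESS, SILENT), key=lambda mine: YEARS[(mine, partner_choice)])

# Whatever the partner does, confessing gives the shorter sentence.
replies = {partner: best_reply(partner) for partner in (CONFESS, SILENT)}
print(replies)

# Yet both confessing is worse for each than both staying silent.
print("both confess:", YEARS[(CONFESS, CONFESS)], "years each")
print("both silent: ", YEARS[(SILENT, SILENT)], "year each")
```

Confessing is the best reply to either choice of the partner, which is exactly why both end up with three years instead of one.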

An individual is said to be rational if he takes into account the consequences of his actions and acts so as to assure consequences that are best for himself under the circumstances. In confessing, each of the prisoners was apparently rational, because the anticipated consequences of confessing seemed to be better than those of not confessing. They were individually rational. Nevertheless they were collectively not rational, because for both of them the consequences of confessing were worse than the consequences of not confessing would have been.

What does it take to act in a collectively rational manner? Let us see what would happen if the prisoners could have a talk. Then they could make an agreement - not to confess. This agreement would be to the advantage of both of them, because refusing to confess would result in a one year sentence, while if both confessed, they would both get three years. So let us suppose they made the agreement which would benefit both of them. However, the agreement does not necessarily resolve the dilemma. It only shifts it to another level: "Shall I keep the agreement or break it?" Each thinks:

"Will he or won't he keep the agreement? If he keeps it and refuses to confess, I would go scot free if I broke the agreement and confessed. If he breaks the agreement, I would be a fool to keep it, for if I kept it while he broke it, I would get the most severe sentence. So it is to my advantage to break the agreement."

The other's situation is the same, and he comes to the same conclusion. So what was the point of concluding the agreement in the first place?

There is one way the prisoners can escape from the dilemma, namely, by being trustworthy and trusting. Being trustworthy means keeping the agreement once it has been made. If both are trustworthy, they keep the agreement, which benefits both of them. Being trusting means assuming that the other will keep the agreement. We have seen that trust is as necessary as trustworthiness. For even if each is trustworthy (will not break the agreement to derive benefit from breaking it) but does not trust the other, i.e., does not believe that the other is trustworthy, then each will feel that he should break the agreement "in self defence," as it were, since to keep it while the other breaks it is to be stuck with the most severe sentence.

If both men are both trustworthy and trusting, there is no need for an explicit agreement in the first place. For if each trusts the other, each can be convinced that they will both do what is best for both of them. And if each is trustworthy, then neither will take advantage of the other's trust to gain at the expense of the other.

Situations in which two or more persons must make choices among alternatives and then face the consequences of these choices are called games. So-called games of strategy are clearly games of this sort. In chess, for example, players alternately choose among a number of possible moves. As a result of all these choices, one or the other player wins or else the game is a draw. In card games, players choose the card they are going to play next or the bid they are going to make in the light of what has already been bid or played. The outcome again depends on the choices made by the players. Associated with each outcome are payoffs, represented as numbers - positive (winnings) or negative (losses). In games of strategy played by two persons, the winnings of one are usually exactly balanced by the losses of the other. Thus the sum of the payoffs in every outcome is zero. For this reason, such games are called zerosum games. Some situations, of which we will speak below, are best represented by non-zerosum games, i.e., games in which the sums of the payoffs in the various outcomes are not necessarily equal. An example of such a game is shown in Figure 1.

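Figure 1. Prisoner's Dilemma. In each box, the first number is the payoff to Row, the second the payoff to Column.

          C2           D2
C1       1, 1       -10, 10
D1      10, -10      -1, -1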

In this game, each of the players makes only one choice. One of them chooses between the upper and lower horizontal rows, labeled C1 and D1 respectively. This player will be called Row. The other player, called Column, without knowing how Row has chosen, chooses either of the vertical columns, C2 or D2. Each pair of choices determines one of the four boxes. The two numbers in each box are the payoffs, the first one to Row, the second to Column. For instance, if the players choose C1 and C2 respectively, each wins 1 unit. If Row chooses C1 while Column chooses D2, Row loses 10 units (gets -10), while Column wins 10 units. If both choose D, both lose one unit.

In zerosum games, where whatever one player wins, the other must lose, the interests of the players are diametrically opposed. In non-zerosum games like the one represented in Figure 1, this is not necessarily the case. For instance, both players prefer C1C2 to D1D2.

Observe, however, that Row likes D1 better than C1 regardless of how Column chooses, since if Column chooses C2, Row gets 10 by choosing D1 but only 1 by choosing C1, whereas if Column chooses D2, Row gets -1 by choosing D1 but -10 by choosing C1.
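The same observation can be verified directly from the payoffs described for Figure 1. This is a sketch only; the dictionary PAYOFFS and the helper row_prefers_d are hypothetical names, and Column's payoff in the D1C2 box is taken to mirror Row's payoff in the C1D2 box.

```python
# Payoffs described for Figure 1, keyed by (Row's choice, Column's choice)
# and stored as (payoff to Row, payoff to Column). Names are illustrative only.
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}

def row_prefers_d(column_choice):
    """True if Row gets more from D than from C against this choice of Column."""
    return PAYOFFS[("D", column_choice)][0] > PAYOFFS[("C", column_choice)][0]

# D is better for Row whichever column is chosen.
d_dominates_for_row = all(row_prefers_d(c) for c in ("C", "D"))

# Nevertheless both players get more in the CC box than in the DD box.
both_prefer_cc = all(PAYOFFS[("C", "C")][i] > PAYOFFS[("D", "D")][i] for i in (0, 1))

# The payoff sums differ from box to box, so the game is not zerosum.
sums = {box: sum(pair) for box, pair in PAYOFFS.items()}

print(d_dominates_for_row, both_prefer_cc, sums)
```

By the symmetry of the table, the same check with the roles exchanged shows that Column likewise prefers D2 to C2.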

We see that this game has exactly the same structure, i.e., leads to the same sort of reasoning as Prisoner's Dilemma. For this reason, all games with this structure are called Prisoner's Dilemma in honour of the original anecdote, suggested by A.W. Tucker, to illustrate this structure. The choice C in Prisoner's Dilemma stands for "cooperation" (choosing to promote a common interest). The choice D stands for "defection" (withdrawal of cooperation).

Many real life situations in interpersonal relations, in business, and in international relations resemble Prisoner's Dilemma. Of course, Prisoner's Dilemma, which requires each player to make only one choice, is a drastic simplification of even the simplest real life situation. But it contains the main ingredient of this type of situation and is for that reason instructive.

Competition for Markets

Castor and Pollux are two firms selling the same product. Each wants as big a share of the market as it can get. Each can gain a competitive advantage by underselling the other firm. Reducing the situation to bare essentials, we suppose that each firm chooses between just two alternatives: selling the product at a low price or at a high price. If both sell at a high price, both make a profit of $1,000,000. If both sell at a low price, both suffer a loss of $1,000,000. If Castor sells at a low price, while Pollux sells at a high price, Castor captures the market and eventually makes $10,000,000, while Pollux, driven out of the market, goes bankrupt and loses $10,000,000. The result is reversed if Pollux undersells Castor. Taking $1,000,000 as our unit of payoff, we see that Figure 1 reflects these payoffs. Thus, the situation is an instance of Prisoner's Dilemma. If both pursue their individual interests, both lose. If they pursue their common interest, both win.
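The conversion to units can be spelled out as follows. This is a sketch under stated assumptions: Castor is cast as Row and Pollux as Column, a high price is identified with C and a low price with D, and the underselling firm's gain is taken as $10,000,000, which is what the match with Figure 1 requires. All names in the code are hypothetical.

```python
UNIT = 1_000_000  # one unit of payoff, in dollars

# Dollar outcomes keyed by (Castor's price, Pollux's price),
# stored as (Castor's result, Pollux's result).
DOLLARS = {
    ("high", "high"): (1_000_000, 1_000_000),
    ("high", "low"): (-10_000_000, 10_000_000),
    ("low", "high"): (10_000_000, -10_000_000),
    ("low", "low"): (-1_000_000, -1_000_000),
}

# Payoffs described for Figure 1, with Castor as Row and Pollux as Column.
FIGURE_1 = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}

# Assumption of this sketch: a high price is cooperation, a low price is defection.
LABEL = {"high": "C", "low": "D"}

in_units = {
    (LABEL[castor], LABEL[pollux]): (c_dollars // UNIT, p_dollars // UNIT)
    for (castor, pollux), (c_dollars, p_dollars) in DOLLARS.items()
}

print(in_units == FIGURE_1)
```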

The Arms Race

The superpowers have a choice of stopping their arms race or of continuing it. If they come to an agreement to stop it and keep the agreement, both are better off. If they continue the arms race, both are worse off. If one continues to arm, while the other does not (or disarms), the first superpower gains military superiority, which supposedly enables it to intimidate the weaker adversary. Again, as in the other versions of Prisoner's Dilemma, it seems to be in the individual interest of each "player" to do one thing (here to continue arming) although it is in the interest of both to stop the arms race.

In all versions of Prisoner's Dilemma, the resolution of the dilemma can be achieved if the two make an agreement to cooperate (sell at a high price, stop the arms race), resist the temptation to violate the agreement, and, above all, trust the other not to violate it. Unfortunately, in situations of keen competition and especially in international relations dominated by a struggle for power, trusting a competitor, a rival, or an adversary is regarded as a mark of "idealism," which supposedly can be indulged in in a "perfect" world but not in a world of hard "realities." We have seen, however, what happens in situations in which each participant is a "hard-headed realist," acting so as to safeguard his individual interest. Both lose.

Iterated Games

So far we have considered only games in which each player makes just one choice. Real life situations, like real games, usually involve sequences of choices. We will introduce sequential choices by having Prisoner's Dilemma played many times in succession. Each time, each player will choose between two alternatives, C or D. The outcome of these choices will be announced. The players will then go on to make the next choice, and so on.

This iterated game can still be represented by a game in which each player makes just one choice, but now the number of alternatives is much larger. It is, in fact, enormous, but we will consider only a few of these alternatives.

The alternatives in an iterated game are called strategies. A strategy is essentially a plan of action which specifies what a player will do on each of the successive plays. His choices on successive plays will in general depend on what has happened up to the play in question. The following statement represents a choice of strategy:

"I will begin with C. On any particular play I will look back on the choices of my co-player up to that play. If he has chosen C at least one third of the time up to that play, then I will choose C; otherwise D."

Another strategy might be:

"I will start by choosing D and will continue to play D as long as the co-player plays D. As soon as he plays C, however, I will play C on the next five plays (unless the game ends) and thereafter revert to D, repeating the pattern as long as the game lasts."

Some strategies may involve the use of a random device such as a tossed coin or a rolled die. For instance:

"After every play, I will roll a die. If the co-player has played C on the preceding play, then I will play C if the die shows '1' or "8"; otherwise D. If the co-player has played D, I will play C only if the die shows '6'. However, whatever happens on the first 99 plays I will play D on the 100th, thereafter repeating the pattern."

As one can see, strategies can be quite complex. They can also be quite simple, for instance: "I will always play C," or "I will always play D," or "I will alternate between C and D." In these strategies, the player's choices are independent of the co-player's.
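One way to make the notion of a strategy concrete is to write each one as a function of the history of play. The sketch below assumes a convention that is not in the text: a strategy receives the player's own past choices and the co-player's past choices (as sequences of "C" and "D") and returns the next choice. The function names are invented for illustration.

```python
def one_third_rule(mine, theirs):
    """Begin with C; then play C if the co-player has chosen C
    at least one third of the time so far, otherwise D."""
    if len(theirs) == 0:
        return "C"
    # 3 * (number of C's) >= (number of plays) avoids fractions.
    return "C" if 3 * theirs.count("C") >= len(theirs) else "D"

def always_c(mine, theirs):
    """Play C regardless of what has happened."""
    return "C"

def always_d(mine, theirs):
    """Play D regardless of what has happened."""
    return "D"

def alternate(mine, theirs):
    """Alternate between C and D, starting with C (the starting choice is an assumption)."""
    return "C" if len(mine) % 2 == 0 else "D"

# The co-player chose C once in three plays: exactly one third, so C.
print(one_third_rule("CCC", "CDD"))
# The co-player chose C once in four plays: less than one third, so D.
print(one_third_rule("CCCC", "CDDD"))
```

The last three functions never look at the co-player's history, which is what it means for a player's choices to be independent of the co-player's.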

If each player chooses a strategy, the course of the iterated game is completely determined. In fact, these strategies can be written as computer programmes, and two computers so programmed can play the iterated game. The result of such an iteration will be a cumulated payoff to each player (or programme). An interesting question now arises: what is a good programme for playing an iterated Prisoner's Dilemma? The quality of the programme can be naturally evaluated by the scores it achieves when paired with other programmes.
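A short sketch of how two such programmes might be made to play each other is given below. It assumes the payoffs described for Figure 1 and the convention that a strategy is a function of the two histories; the function play, its parameter plays, and the two sample strategies are hypothetical names chosen for this illustration.

```python
# Payoffs described for Figure 1: (payoff to first player, payoff to second player).
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}

def always_c(mine, theirs):
    return "C"

def always_d(mine, theirs):
    return "D"

def play(strategy_a, strategy_b, plays):
    """Play the iterated game and return the cumulated payoff to each player."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(plays):
        # Both choices are made before either is revealed to the other player.
        choice_a = strategy_a(history_a, history_b)
        choice_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFFS[(choice_a, choice_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(choice_a)
        history_b.append(choice_b)
    return score_a, score_b

print(play(always_c, always_d, plays=10))  # (-100, 100)
print(play(always_d, always_d, plays=10))  # (-10, -10)
print(play(always_c, always_c, plays=10))  # (10, 10)
```

The three results already show the tension: the programme that always plays D does best against an unconditional cooperator, but two such programmes paired with each other both lose.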

In 1979, Robert Axelrod, a professor of political science at the University of Michigan, arranged a contest. Interested persons were invited to submit programmes for playing Prisoner's Dilemma iterated 200 times. Each programme submitted would be paired with every other, including itself. The cumulated payoffs of each such encounter would be added, and the programme achieving the largest total payoff would be declared the winner of the contest.

Fourteen programmes were submitted in that contest. Some were quite complicated.

It is clear that the way to get a higher score than the co-player in iterated Prisoner's Dilemma is to play more D's than he does. This is because only when one plays D, while the co-player plays C, does one get a bigger payoff on that play. Apparently, several of the contestants tried to design strategies which would be likely to produce more D's when coupled with other strategies. In fact, a strategy consisting only of D's would be sure to play at least as many D's as any other, in general more. However, if most strategies were of this sort, they would all get low scores when matched against each other, because they would produce many DD outcomes, in which the payoffs are low. So it seems advisable to design a strategy that somehow would entice the co-player to play C and then take advantage of it by playing D. The complexities of many of the strategies submitted to this contest could probably be traced to attempts to design stratagems (strategies based on clever ploys).

The strategy that won the contest was the simplest of those submitted. It started with C and thereafter imitated the co-player: whenever the co-player played C, it played C on the next play; otherwise D. In the round robin tournament in which every submitted strategy was matched with every other (including itself), this strategy, called TIT FOR TAT, got the highest total score.
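The round robin procedure can be sketched in a few lines. What follows is an illustration, not a reconstruction of the contest: it uses the payoffs described for Figure 1 (the payoffs used in the actual contest are not stated here), and a hypothetical field of three entries. The strategy called grudger (play C until the co-player plays D once, then D forever) and all function names are assumptions introduced for the example.

```python
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}

def tit_for_tat(mine, theirs):
    """Start with C; thereafter repeat the co-player's previous choice."""
    return "C" if not theirs else theirs[-1]

def always_d(mine, theirs):
    return "D"

def grudger(mine, theirs):
    """Hypothetical entry: play C until the co-player has played D once, then always D."""
    return "D" if "D" in theirs else "C"

def play(strategy_a, strategy_b, plays=200):
    """Return the cumulated payoffs of one encounter of the given length."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(plays):
        a = strategy_a(history_a, history_b)
        b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFFS[(a, b)]
        score_a, score_b = score_a + payoff_a, score_b + payoff_b
        history_a.append(a)
        history_b.append(b)
    return score_a, score_b

def round_robin(field, plays=200):
    """Pair every entry with every other and with itself; return total payoffs by name."""
    totals = {s.__name__: 0 for s in field}
    for i, first in enumerate(field):
        for second in field[i:]:
            score_first, score_second = play(first, second, plays)
            totals[first.__name__] += score_first
            if second is not first:  # an entry paired with itself is credited once
                totals[second.__name__] += score_second
    return totals

totals = round_robin([tit_for_tat, grudger, always_d])
print(totals)
```

In this small hypothetical field, the programme that always plays D outscores every opponent it meets, yet it finishes with the lowest total. The ranking in any such tournament depends on which programmes happen to be in the field, so nothing here should be read as a statement about the scores in the actual contest.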

These results were published along with all the programmes submitted. Invitations were issued to another contest under essentially the same conditions. This time 63 programmes were submitted from six different countries. TIT FOR TAT was submitted again (by the same person) and again got the highest score.

That an exceedingly simple strategy got a higher score than all the sophisticated ones may have seemed surprising. But the really surprising result was that TIT FOR TAT did not beat a single strategy with which it was paired. It either got the same score or a lower score. How, then, could it win the contest? The answer is clear. Recall that all the strategies had to play against all. The "clever" ones designed to beat other strategies may have beaten them, but they, in turn, were also beaten by others. In this way, they reduced each other's scores. TIT FOR TAT cannot get more D's than any strategy it is paired with (since it starts with C and plays D only after the co-player has played D). But it cannot lose by more than one play. The "clever" strategies can lose more when matched with equally "clever" ones including themselves. (TIT FOR TAT matched with itself plays 100% C, so that both it and its "alter ego" get a high score.)
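The claim that TIT FOR TAT never wins an encounter, and never loses by more than one play, can be checked against arbitrary sequences of choices by the co-player. The sketch below assumes the payoffs described for Figure 1, under which one unanswered D by the co-player is worth a gap of 20 units. The function names, the seed, and the number of trials are arbitrary choices made for the illustration.

```python
import random

PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}

def tit_for_tat_moves(co_player_moves):
    """TIT FOR TAT's choices against a given sequence of co-player choices:
    C on the first play, then a copy of the co-player's previous choice."""
    return ["C"] + list(co_player_moves[:-1])

def score_gap(co_player_moves):
    """Co-player's cumulated payoff minus TIT FOR TAT's cumulated payoff."""
    tft_moves = tit_for_tat_moves(co_player_moves)
    pairs = list(zip(tft_moves, co_player_moves))
    tft_score = sum(PAYOFFS[pair][0] for pair in pairs)
    co_player_score = sum(PAYOFFS[pair][1] for pair in pairs)
    return co_player_score - tft_score

rng = random.Random(0)  # arbitrary seed, fixed only so that runs are repeatable
gaps = set()
for _ in range(1000):
    co_player = [rng.choice("CD") for _ in range(200)]
    gaps.add(score_gap(co_player))

# Every gap is either 0 (a tie) or 20 (the co-player's final D went unanswered).
print(sorted(gaps))
```

The gap is never negative, so TIT FOR TAT never comes out ahead, and it is never larger than the value of a single unanswered D, because every D by the co-player except possibly the last is repaid on the following play.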

The lesson drawn from this experiment sounds like a paradox: in weakness there is strength. TIT FOR TAT seems "weak" because it can't beat any other strategy. But the "strong" ones beat each other, and it comes out the winner.

The relevance of this lesson to the present international situation should be evident. Conventional wisdom has it that the stronger a country is militarily, the more "secure" it is. This simplistic idea is what drives arms races. If being stronger than B makes A secure, then being stronger than A makes B secure. As a result, each country tries to be stronger, and as the other grows stronger, each gets less secure. Today the destructive power of the global arsenal is thousands of times greater than it was forty years ago. Hardly anyone would argue that everyone is thousands of times more secure than forty years ago.

It is argued that the present dangerous situation has arisen because there is a lack of "trust" among nations, particularly among adversaries. That is correct, as far as it goes. Game models like Prisoner's Dilemma are instructive because they point up precisely why lack of trust leads to outcomes that are bad for both sides. Lack of trust is not the whole story. The main trouble lies in the mistaken belief that "rational" choices are those that seem to be in one's own interest. The notion of "national interest" is based on this idea. As we have seen, the choice of D in Prisoner's Dilemma seems eminently rational, since it leads to a payoff that is larger than the payoff associated with C regardless of how the co-player chooses. Yet the reasoning leads to an outcome that is bad for both.

"We shall require a substantially new manner of thinking," wrote Einstein, "if mankind is to survive."

Prisoner's Dilemma provides a simple but dramatic demonstration of the sort of thinking that must be changed in the nuclear age: the manner of thinking that continues to identify rationality with pursuit of self interest. Entrenched in the world of power politics, this manner of thinking now threatens the human race with extinction.