Let’s say we want to identify effective strategies for multi-agent games like the Iterated Prisoner’s Dilemma or more complex environments (like the kind of environments in Melting Pot). Then tournaments are a natural approach: let people submit strategies, and then play all these strategies against each other in a round-robin tournament.
There are lots of examples of such tournaments: Most famously, Axelrod ran two competitions for the iterated Prisoner’s Dilemma, identifying tit for tat as a good strategy. Less famously, there’s Alex Mennen’s open-source Prisoner’s Dilemma tournament, and my own little “open-prompt” Prisoner’s Dilemma tournament.
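To make the setup concrete, here's a minimal sketch of such a round-robin tournament in Python. The payoffs, strategies, and match length are illustrative placeholders, not taken from any of the tournaments mentioned above:

```python
import itertools

# Standard Prisoner's Dilemma payoffs (row player's payoff); purely illustrative.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def always_cooperate(my_history, their_history):
    return "C"

def play_match(strategy_a, strategy_b, rounds=100):
    """Play one iterated match and return the two players' total payoffs."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

def round_robin(strategies):
    """Every strategy plays every other strategy once; return total utilities."""
    totals = {name: 0 for name in strategies}
    for (name_a, strat_a), (name_b, strat_b) in itertools.combinations(strategies.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals

strategies = {"TitForTat": tit_for_tat, "AlwaysDefect": always_defect, "AlwaysCooperate": always_cooperate}
print(sorted(round_robin(strategies).items(), key=lambda kv: -kv[1]))
```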
But there’s a question of what to do with these results and in particular how to reward participants for their strategies’ success. The most principled approach would be to reward participants in direct proportion to the utility they obtain. If your strategy earns 100 units of utility across interactions with all opponents, you get $100.
Unfortunately, this principled scheme is not very practical.1 Instead, I think tournaments will usually give ranking-based rewards. Alex Mennen and I gave a monetary reward to the top-ranked strategy in our respective tournaments. Axelrod didn't hand out monetary rewards, but he dedicated quite a lot of attention in his book to the top-ranking strategy (or strategies), crediting Anatol Rapoport for submitting it.
Unfortunately, ranking-based rewards come with a number of problems. One (known) problem is that if you only reward the top few (say, top three) programs, high-risk strategies might become more attractive than their expected utility would suggest. (Under such a scheme, it might be better to have a 10% chance of earning a utility of 100, which takes 1st place, and a 90% chance of earning a utility of 0, which takes last place, than to reliably earn a utility of 90 and place somewhere between 10th and 20th.)2
Here I want to focus on a different issue with ranking-based rewards: they create perverse incentives for spiteful or over-competitive behavior. Another participant's failure can be your gain if it makes them fall below you in the rankings. For example, imagine we're running an Iterated Prisoner's Dilemma tournament with the following twist: occasionally, players get an opportunity to inflict a large negative payoff (say, -100) on their opponent at no cost to themselves, in a way that's undetectable to the opponent (and thus unpunishable). In a normal utility-based framework, there's no reason to take this action. Under ranking-based rewards, however, you would always want to take this spiteful action: decreasing your opponent's total utility across the tournament might lower their ranking below yours and thus improve your position in the final standings. Perhaps even more troublingly, for small enough epsilon, you'd be willing to sacrifice epsilon units of your own reward just to inflict this large negative payoff on your opponent.3
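As a toy illustration (all numbers made up): suppose you trail your closest rival by 50 units of total utility, and a single spite opportunity lets you knock 100 units off their total at no cost to yourself:

```python
# Made-up tournament totals before the spite opportunity.
totals = {"you": 950, "rival": 1000}

def ranking(totals):
    return sorted(totals, key=totals.get, reverse=True)

print(ranking(totals))   # ['rival', 'you']: you finish second

totals["rival"] -= 100   # costless, undetectable spite action
print(ranking(totals))   # ['you', 'rival']: you now finish first
```

Your own utility is untouched, but under ranking-based rewards your payout improves.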
There is a relatively simple solution to this spite problem, however: split the participant pool in two, e.g., at random. Call the halves Group A and Group B. Instead of having every strategy play against every other strategy (round-robin), have each strategy from Group A play only against the strategies from Group B, and vice versa. Then rank the participants in Group A only among themselves, and do the same for Group B. In the end you obtain two rankings: a ranking of Group A by performance (utility obtained) against Group B, and a ranking of Group B by performance against Group A.
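Here's a minimal sketch of this two-group setup, reusing the play_match helper from the earlier round-robin sketch (the random 50/50 split and the scoring details are just placeholders):

```python
import random

def two_group_tournament(strategies, seed=0):
    """Split submissions into Group A and Group B; each strategy plays only the
    other group and is ranked only within its own group."""
    names = list(strategies)
    random.Random(seed).shuffle(names)
    group_a, group_b = names[: len(names) // 2], names[len(names) // 2:]

    totals = {name: 0 for name in names}
    for name_a in group_a:
        for name_b in group_b:
            score_a, score_b = play_match(strategies[name_a], strategies[name_b])
            totals[name_a] += score_a
            totals[name_b] += score_b

    # Two separate rankings: A against B, and B against A.
    rank_a = sorted(group_a, key=totals.get, reverse=True)
    rank_b = sorted(group_b, key=totals.get, reverse=True)
    return rank_a, rank_b
```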
This approach eliminates the spite incentive entirely: since you're never ranked against anyone you play against, you have no intrinsic interest in your opponents' utilities.
Further considerations: Of course, there's a cost to this: if we hold the set of submitted strategies fixed (and thereby ignore the problematic incentives of traditional round-robin), then in some sense the two-group tournament is less informative than the single-group round-robin tournament, because we evaluate each strategy against a smaller sample of opponent strategies. For instance, with 40 participants, a round-robin tournament gives us an estimate of each participant's expected utility based on 39 samples of opponent strategies. With the two-group split, each estimate is based on only 20 samples (from the same distribution) of opponent strategies. So we become worse at judging which submitted strategies are good or bad. (As the number of submissions increases, the downside of halving the number of samples shrinks; on the other hand, the spite problem that motivates the split is most pronounced when the number of submissions is small.)
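To put a rough number on this, under the simplifying assumption that each per-opponent utility is an i.i.d. draw from the same distribution, the standard error of the estimated expected utility scales as 1/√n:

```python
from math import sqrt

# Standard error of a sample mean scales as 1/sqrt(n) under i.i.d. sampling,
# so the estimate from 20 opponents is noisier than the one from 39 by roughly:
print(sqrt(39 / 20))  # ≈ 1.40
```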
A different possible concern is that outside observers will not abide by the “ranking only within group” rule. If all of the submitted strategies are published, then an observer could simply run a round-robin tournament on the submitted strategies and publish an alternate ranking. This alternate ranking might then confer prestige, and to optimize their position in the alternate ranking, participants are once again incentivized to submit spiteful strategies. To prevent this, one would have to make it difficult enough to reproduce the tournament. (In many cases, one might justifiably hope that nobody will bother publicizing an alternate ranking, though.)
Even if only the strategies' overall utilities are published (rather than the strategies themselves), a third-party observer might simply take the highest-scoring strategy across the two groups to be the "real winner". For small numbers of participants, the two groups' winners aren't directly comparable, because they faced different sets of opponents. But the distribution over opponents is the same, so as the participant pool grows, the comparison becomes valid. If outside parties naively aggregate across groups and the resulting comparisons confer prestige, participants are once more incentivized to submit spiteful programs so as to fare better in the aggregated ranking.
To combat this, the tournament organizers could try to publish more limited information about the different strategies’ scores.4
Related ideas: The spite-incentive problem and the population-split idea follow a common pattern in mechanism design: in many cases, your first idea for a mechanism gives participants bad incentives because of some specific downstream effect of their decisions. To fix the mechanism, you can remove that downstream effect, usually at some cost.
For instance, this is how you might get from a first- to a second-price auction:
- The most natural idea for an auction is to just ask participants (bidders / people interested in the item you’re looking to sell) what they’d be willing to pay for the item and sell to the highest bidder. How much do you then charge the highest bidder? Well, however much they’re willing to pay, of course! Or perhaps, however much they’re willing to pay minus $1 (since otherwise they wouldn’t gain anything from participating).
- But now you realize that participants aren’t necessarily incentivized to honestly report how much they’re willing to pay: by reporting a lower number they might get away with paying less for your item.
- Let’s say that you want bidders to report their willingness to pay honestly (e.g., because you want to make sure to sell to whoever wants the item most). Then to fix the incentive issue, you need to determine the price paid by the winner in a way that doesn’t depend on the winner’s bid, while still making sure that the price is at most the winner’s bid. One way of doing this is to charge the winner the second-highest bid (see the sketch below). (It could also be the third-highest bid or the average of all the other bids, but these numbers are of course lower than the second-highest bid and so result in less profit. If, in addition to the participants’ bids, you get a signal from a third party telling you that the minimum gap between bids is $20, then you could also charge the second-highest bid plus $20.) Of course, the cost is that the price we charge is not informed by the winning participant’s willingness to pay.
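A minimal sketch of this second-price rule (bidder names and bids are made up):

```python
def second_price_auction(bids):
    """bids: dict mapping bidder -> bid.
    The highest bidder wins but pays the second-highest bid, so the price
    does not depend on the winner's own bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

print(second_price_auction({"alice": 120, "bob": 90, "carol": 70}))  # ('alice', 90)
```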
The population-splitting idea can be discovered by an analogous line of thinking: to fix the spite incentive, you need to remove the influence of your opponents’ utilities on your ranking. The simplest way of doing this is to only play against opponents against whom you’re not subsequently ranked.
Of course, the details of the auction case are quite different from the present setting.
The most closely related idea that I’m aware of concerns eliminating similar bad incentives in peer review. In peer review, the same people often serve as reviewers and as authors. Conferences often select papers in part based on rankings of the papers (rather than just absolute scores), and in particular they might choose based on rankings within specific areas. Therefore, if you serve as a reviewer on a paper in your area, you might be incentivized to give the paper a lower score in order to improve the ranking of your own paper. The obvious idea for avoiding this problem is to make sure that a reviewer isn’t assigned to a paper that they’re competing against. For a brief introduction to this problem and solution, along with some references, see Shah (2022, Section “Dishonest Behavior”, Subsection “Lone Wolf”). I think the fundamental idea here is the same as the idea behind the above partition-based approach to cooperative AI competitions. Some of the details differ importantly, of course. For instance, each reviewer can only review a small number of papers anyway (whereas in the cooperative AI competition one can evaluate each program against all other programs).
Footnotes:
- In many cases, the rewards aren’t primarily monetary. Rather, a high ranking is rewarded with attention and prestige, and the organizers can hardly control how that attention and prestige are distributed. But even distributing monetary rewards in proportion to utility is impractical. For example, you would then effectively reward people merely for entering the competition at all. Requiring a submission fee to counter this has its own issues (such as presumably becoming subject to gambling regulations). ↩︎
- There is some literature on this phenomenon in the context of forecasting competitions. That is, in a forecasting competition with ranking-based rewards, forecasters typically aren’t incentivized to submit their predictions honestly, because they want to increase (and sometimes decrease!) the variance of their score. See, e.g., Lichtendahl and Winkler (2007) and Monroe et al. (2025). ↩︎
- How big is the spite issue? Well, for one this depends, of course, on the game that is being played. My guess is that in the simple Prisoner’s Dilemma games that most of the existing tournaments have focused on, these spiteful incentives are relatively insignificant. There just aren’t any opportunities to hurt opponents cheaply. But I’d imagine that in more complex games, all kinds of opportunities might come up, and so in particular there might be opportunities for cheap spite.
Second, the importance of spite depends on the number of participants in the tournament. In the extreme case, if there are only two participants, then the interaction between the two is fully zero-sum. As the number of participants grows, the benefits of harming a randomly selected opponent decrease, as a randomly selected opponent is unlikely to be a relevant rival.
The exact dynamics are complicated, however. For instance, if there are only two participants with a realistic chance at first place (perhaps due to technical implementation challenges), then encounters between these top contenders become effectively zero-sum from each player’s perspective – if they can “recognize” each other. ↩︎
- A natural first idea is to only publish relative scores for each group. For instance, in each group separately, subtract from each strategy’s utility the utility of the top strategy. This way the published top score in each group will be 0. This will work against a naive third party. But a more sophisticated third party might be able to infer the original scores if there are pairs of very similar strategies between the groups. For instance, in the Iterated Prisoner’s Dilemma, we might imagine that both groups will contain a DefectBot, i.e., a strategy of defecting all the time, regardless of the opponent’s actions. So in some sense this approach obscures too little information. For better protection, one might have to obscure information even more. The question of how to do this is similar to the questions studied in the Differential Privacy framework.
One radical approach would be to only publish the ranking and to never publish numeric scores. Presumably this makes it impossible to compare across groups. This has other downsides, though. If, say, our top-three strategies achieve similar scores, they deserve similar amounts of attention – they are similarly promising approaches to cooperation. If we publish scores, outside observers will be able to tell that this is the case. If we publish just the ranking, observers won’t know whether, say, the second-best strategy is competitive with the best or not. So we’re making it more difficult for outside observers to draw conclusions about the relative promise of different research directions. We might also worry that publishing only the ranking will worsen the incentives toward high-variance strategies, because the reward for coming in at a close second place is lower. ↩︎