(This post assumes some knowledge of the decision theory of Newcomb-like scenarios.)

One problem in the decision theory of Newcomb-like scenarios (i.e. the study of whether causal, evidential or some other decision theory is true) is that even the seemingly obvious basics are fiercely debated. Newcomb’s problem seems to be fundamental and the solution obvious (to both sides), and yet scholars disagree about its resolution. If we already fail at the basics, how can we ever settle this debate?

In this post, I propose a solution. Specifically, I will introduce a very plausible general principle that decision rules should abide by. One may argue that settling on powerful general rules (like the one I will propose) must be harder than settling single examples (like Newcomb’s problem). However, this is not universally the case. In decision theory in particular, we should expect general principles to be especially convincing, because a common defense of two-boxing in Newcomb’s scenario is that Newcomb’s problem is just a weird edge case in which rationality is punished. By introducing a general principle that CDT (or, perhaps, EDT) violates, we can show that the flaw is *general* rather than confined to one edge case.

Without further ado, the principle is: The decisions we make should not depend on the utilities assigned to outcomes that are impossible to occur. To me this principle seems obvious, and indeed it is consistent with expected value calculations in non-Newcomb-like scenarios: Imagine having to deterministically choose an action from some set *A*. (We will ignore mixed strategies.) The next state of the world is sampled from a set of states *S* via a distribution *P* and depends on the chosen action. We are also given a utility function *U*, which assigns values to pairs of a state and an action. Let *a* be an action and let *s* be a possible state. If *P*(*s*,*a*)=0 (or *P*(*s*|*a*)=0 or *P*(*s* given the causal implications of *a*)=0; we assume all of these to be equivalent in this non-Newcomb-like scenario), then it doesn’t matter what *U*(*s*,*a*) is, because in an expected value calculation, *U*(*s*,*a*) will always be multiplied by *P*(*s*,*a*)=0. That is to say, any expected value decision rule makes the same choice regardless of *U*(*s*,*a*). So, expected value decision rules abide by this principle at least in non-Newcomb-like scenarios.
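To make the zero-probability argument concrete, here is a minimal Python sketch. The action names, state names, probabilities and utilities are all made up for illustration; the only point is that the utility attached to a state with probability 0 never influences the choice.

```python
def expected_utility(action, states, prob, utility):
    """Sum of P(s | action) * U(s, action) over all states s."""
    return sum(prob[(s, action)] * utility[(s, action)] for s in states)


def best_action(actions, states, prob, utility):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, states, prob, utility))


# Hypothetical toy problem: state s2 is impossible if we choose a1.
ACTIONS = ["a1", "a2"]
STATES = ["s1", "s2"]
PROB = {
    ("s1", "a1"): 1.0, ("s2", "a1"): 0.0,
    ("s1", "a2"): 0.5, ("s2", "a2"): 0.5,
}
BASE_UTILITY = {
    ("s1", "a1"): 3, ("s2", "a1"): 0,
    ("s1", "a2"): 2, ("s2", "a2"): 2,
}


def choice_with_impossible_utility(value):
    """Return the chosen action after setting U(s2, a1), an impossible outcome, to `value`."""
    utility = dict(BASE_UTILITY)
    utility[("s2", "a1")] = value
    return best_action(ACTIONS, STATES, PROB, utility)


# Vary the utility of the impossible outcome and print the resulting choice.
for value in (-1000, 0, 1000):
    print(value, choice_with_impossible_utility(value))
```

Because the term for (*s2*, *a1*) is multiplied by 0, the printed choice is the same for every value tried.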

Let us now apply the principle to a Newcomb-like scenario, specifically to the prisoner’s dilemma played against an exact copy of yourself. Your actions are *C* (cooperation) and *D* (defection). Your opponent is the “environment” and can also choose between *C* and *D*. So, the possible outcomes are (*C*,*C*), (*C*,*D*), (*D*,*C*) and (*D*,*D*), where the first entry is your action and the second is your opponent’s. The probabilities *P*(*C*,*D*) and *P*(*D*,*C*) are both 0. Applied to this Newcomb-like scenario, the principle of the irrelevance of impossible outcomes states that our decision should only depend on the utilities of (*C*,*C*) and (*D*,*D*). Evidential decision theory behaves in accordance with this principle. (I leave it as an exercise to the reader to verify this.) Indeed, I suspect that it can be shown that EDT generally abides by the principle of the irrelevance of impossible outcomes. The choice of causal decision theory, on the other hand, *does* depend on the utilities of the impossible outcomes *U*(*D*,*C*) and *U*(*C*,*D*). Remember that in the prisoner’s dilemma the payoffs are such that *U*(*D*,*x*)>*U*(*C*,*x*) for any action *x* of the opponent, i.e. no matter the opponent’s choice it is always better to defect. This dominance is given as the justification for CDT’s decision to defect. But let us say we increase *U*(*C*,*D*) such that *U*(*C*,*D*)>*U*(*D*,*D*) and decrease *U*(*D*,*C*) such that *U*(*D*,*C*)<*U*(*C*,*C*). Of course, we must make these changes for the utility functions of both players so as to retain symmetry. After these changes, the dominance relationship is reversed: *U*(*C*,*x*)>*U*(*D*,*x*) for any action *x*. Of course, the new payoff matrix is not that of a prisoner’s dilemma anymore; the game is different in important ways. But when played against a copy, these differences do not seem significant, because we only changed the utilities of outcomes that were impossible to achieve anyway. Nevertheless, CDT would switch from *D* to *C* upon being presented with these changes, thus violating the principle of the irrelevance of impossible outcomes. This is a *systematic* flaw in CDT: Its decisions depend on the utility of outcomes that it can already know to be impossible.
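The comparison can be checked with a small Python sketch. Two things here are assumptions of the sketch rather than part of the argument above: the concrete payoff numbers, and the simplified model of CDT as an agent that holds a fixed credence `p_opp_c` that the opponent cooperates, independent of its own action. EDT is modeled by conditioning on the fact that an exact copy always mirrors the agent’s action.

```python
ACTIONS = ("C", "D")

# Hypothetical payoffs U(my_action, opponent_action) for a standard prisoner's dilemma.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Same game, but with the two impossible outcomes changed:
# U(C,D) raised above U(D,D), and U(D,C) lowered below U(C,C).
MODIFIED = {("C", "C"): 3, ("C", "D"): 2, ("D", "C"): 2, ("D", "D"): 1}


def edt_choice(utility):
    """EDT against an exact copy: P(opponent plays a | I play a) = 1."""
    return max(ACTIONS, key=lambda a: utility[(a, a)])


def cdt_choice(utility, p_opp_c=0.5):
    """Simplified CDT: the opponent's action is treated as causally
    independent of mine, with fixed credence p_opp_c that it is C."""
    def expected(a):
        return p_opp_c * utility[(a, "C")] + (1 - p_opp_c) * utility[(a, "D")]
    return max(ACTIONS, key=expected)


for name, game in (("original", PD), ("modified", MODIFIED)):
    print(name, "EDT:", edt_choice(game), "CDT:", cdt_choice(game))
```

In this sketch EDT’s choice only ever reads the entries (*C*,*C*) and (*D*,*D*), so it cannot change when the other two entries do. The simplified CDT agent follows whichever action dominates, so its choice flips between the two payoff tables for any credence `p_opp_c`.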

The principle of the irrelevance of impossible outcomes can be used beyond arguing against CDT. As you may remember from my post on updatelessness, sensible decision theories will precommit to give Omega the money in the counterfactual mugging thought experiment. (If you don’t remember or haven’t read that post in the first place, this is a good time to catch up, because the following thoughts are based on the ideas from the post.) Even EDT, which ignores the utility of impossible outcomes, would self-modify in this way. However, the decision theory resulting from such self-modification violates the principle of the irrelevance of impossible outcomes. Remember that in counterfactual mugging, you give in because this was a good idea to precommit to when you didn’t yet know how the coin came up. However, once you know that the coin came up the unfavorable way, the positive outcome, which gave you the motivation to precommit, has become impossible. Of course, you only give in to counterfactual mugging if the reward in this now impossible branch is sufficiently high. For example, there is no reason to precommit to give in if you lose money in both branches. This means that once you have become updateless, you violate the principle of the irrelevance of impossible outcomes: your decision in counterfactual mugging depends on the utility you assign to an outcome that cannot happen anymore.
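A rough Python sketch of this dependence follows. The fair coin, the dollar amounts, and the function names are assumptions for illustration; the sketch also assumes Omega pays the reward in the favorable branch exactly when the agent’s policy is to pay in the unfavorable one.

```python
def updateless_pays(cost, reward, p_favorable=0.5):
    """Policy chosen before the coin flip: compare the expected value of
    the policy 'pay when asked' with the policy 'refuse' (value 0)."""
    value_of_paying_policy = p_favorable * reward - (1 - p_favorable) * cost
    return value_of_paying_policy > 0


def updated_pays(cost, reward):
    """Choice made after seeing the unfavorable coin flip, ignoring the
    now-impossible branch: paying just loses `cost`. The `reward`
    argument is deliberately unused."""
    return -cost > 0


# Hypothetical numbers: Omega asks for 100 in the unfavorable branch.
for reward in (10000, 50, -20):
    print(reward, updateless_pays(100, reward), updated_pays(100, reward))
```

The updateless policy switches from paying to refusing as `reward` shrinks, even though, by the time the agent is asked to pay, the branch containing that reward can no longer occur. The updated agent’s answer never depends on `reward` at all.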

“The decisions we make should not depend on the utilities assigned to outcomes that are impossible to occur.” You have to define what you mean by impossible. In game theory there are outcomes that rational players will never reach (so it’s impossible that they will occur) that still affect the outcome of the game.


Thanks for your comment! It’s an excellent point. I would guess that “impossible” can only be defined relative to epistemic algorithms that assign something like probabilities* to all outcomes. Standard game theoretical analysis, as I understand it, usually does not do this. (This is also why it is not suited for the prisoner’s dilemma with a copy.) Once an agent does assign probabilities to what others will do, she doesn’t need game theory anymore, because she can simply maximize expected utility, right?

*There could be epistemic algorithms which only differentiate between possible and impossible outcomes. The irrelevance of impossible outcomes applies to them, too.


“Once an agent does assign probabilities to what others will do, she doesn’t need game theory anymore, because she can simply maximize expected utility, right?” Yes, although this is because to assign the probabilities you have to solve most of the hard game theory.


In this post http://lesswrong.com/lw/f37/naive_tdt_bayes_nets_and_counterfactual_mugging/ I argued, as an aside, for another principle: that it shouldn’t matter whether you were being simulated or whether anyone was simply predicting the result of that simulation.

Then (very roughly) CDT behaves as EDT/UDT if you assume that Newcomb is simulating you, because you can’t tell whether you’re the simulation or the “real” you. But this argument also argues for the counterfactual mugging, unlike yours.

Stuart Armstrong
