(This post assumes some knowledge of the decision theory of Newcomb-like scenarios.)
One problem in the decision theory of Newcomb-like scenarios (i.e. the study of whether causal, evidential or some other decision theory is true) is that even the seemingly obvious basics are fiercely debated. Newcomb’s problem seems fundamental, and each side finds its own solution obvious, yet scholars still disagree about which solution is correct. If we already fail at the basics, how can we ever settle this debate?
In this post, I propose a solution. Specifically, I will introduce a very plausible general principle that decision rules should abide by. One may argue that settling on powerful general rules (like the one I will propose) must be harder than settling single examples (like Newcomb’s problem). However, this is not universally the case. In decision theory in particular, we should expect general principles to be especially convincing, because a common defense of two-boxing in Newcomb’s scenario is that Newcomb’s problem is just a weird edge case in which rationality is punished. By introducing a general principle that CDT (or, perhaps, EDT) violates, we can show that the flaw is general rather than confined to one edge case.
Without further ado, the principle is: The decisions we make should not depend on the utilities assigned to outcomes that cannot occur. To me this principle seems obvious, and indeed it is consistent with expected value calculations in non-Newcomb-like scenarios: Imagine having to deterministically choose an action from some set A. (We will ignore mixed strategies.) The next state of the world is sampled from a set of states S via a distribution P that depends on the chosen action. We are also given a utility function U, which assigns values to pairs of a state and an action. Let a be an action and let s be a possible state. If P(s,a) = 0 (or P(s|a)=0 or P(s given the causal implications of a)=0; we assume all of these to be equivalent in this non-Newcomb-like scenario), then it doesn’t matter what U(s,a) is, because in an expected value calculation, U(s,a) will always be multiplied by P(s,a)=0. That is to say, any expected value decision rule makes the same choice regardless of U(s,a). So, expected value decision rules abide by this principle at least in non-Newcomb-like scenarios.
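To make the expected value argument concrete, here is a minimal sketch in Python. The action names, state names, probabilities and utilities are all made up for illustration; the only feature that matters is that one state has probability 0 under one action. Varying the utility of that impossible pair leaves both the expected value and the chosen action unchanged.

```python
# Toy, hypothetical numbers: state "s3" cannot occur if we choose "a1".
ACTIONS = ["a1", "a2"]
STATES = ["s1", "s2", "s3"]

# P[(s, a)] is the probability of state s given action a.
P = {
    ("s1", "a1"): 0.5, ("s2", "a1"): 0.5, ("s3", "a1"): 0.0,
    ("s1", "a2"): 0.2, ("s2", "a2"): 0.3, ("s3", "a2"): 0.5,
}

# U[(s, a)] is the utility of ending up in state s after action a.
BASE_U = {
    ("s1", "a1"): 4, ("s2", "a1"): 2, ("s3", "a1"): 0,
    ("s1", "a2"): 1, ("s2", "a2"): 1, ("s3", "a2"): 3,
}


def expected_utility(action, utility):
    """Sum of P(s, a) * U(s, a) over all states s."""
    return sum(P[(s, action)] * utility[(s, action)] for s in STATES)


def best_action(utility):
    """Action with the highest expected utility."""
    return max(ACTIONS, key=lambda a: expected_utility(a, utility))


# Vary only the utility of the impossible pair ("s3", "a1").
choices = []
values_of_a1 = []
for impossible_utility in (-1000, 0, 1000):
    utility = dict(BASE_U)
    utility[("s3", "a1")] = impossible_utility
    choices.append(best_action(utility))
    values_of_a1.append(expected_utility("a1", utility))

print(choices)
print(values_of_a1)
```

Because the term for ("s3", "a1") is always multiplied by 0, the loop yields the same choice and the same expected value for "a1" in every iteration.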
Let us now apply the principle to a Newcomb-like scenario, specifically to the prisoner’s dilemma played against an exact copy of yourself. Your actions are C (cooperation) and D (defection). Your opponent is the “environment” and can also choose between C and D. So, writing your action first and your opponent’s second, the possible outcomes are (C,C), (C,D), (D,C) and (D,D). Because an exact copy always chooses what you choose, the probabilities P(C,D) and P(D,C) are both 0. Applied to this Newcomb-like scenario, the principle of the irrelevance of impossible outcomes states that our decision should only depend on the utilities of (C,C) and (D,D). Evidential decision theory behaves in accordance with this principle. (I leave it as an exercise to the reader to verify this.) Indeed, I suspect that it can be shown that EDT generally abides by the principle of the irrelevance of impossible outcomes. The choice of causal decision theory, on the other hand, does depend on the utilities of the impossible outcomes, U(D,C) and U(C,D). Remember that in the prisoner’s dilemma the payoffs are such that U(D,x)>U(C,x) for any action x of the opponent, i.e. no matter the opponent’s choice it is always better to defect. This dominance is given as the justification for CDT’s decision to defect. But let us say we increase U(C,D) such that U(C,D)>U(D,D) and decrease U(D,C) such that U(D,C)<U(C,C). Of course, we must make these changes for the utility functions of both players so as to retain symmetry. After these changes, the dominance relationship is reversed: U(C,x)>U(D,x) for any action x. Of course, the new payoff matrix is not that of a prisoner’s dilemma anymore; the game is different in important ways. But when played against a copy, these differences do not seem significant, because we only changed the utilities of outcomes that were impossible to achieve anyway. Nevertheless, CDT would switch from D to C upon being presented with these changes, thus violating the principle of the irrelevance of impossible outcomes.
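The comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payoff numbers are hypothetical, EDT is modeled as conditioning on the copy matching my action, and CDT is modeled as holding a fixed credence q that the opponent cooperates, whatever I do. Since dominance holds for every q, the CDT result does not depend on which q is picked.

```python
ACTIONS = ["C", "D"]

# U[(mine, theirs)]: my utility. Hypothetical prisoner's dilemma payoffs.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Same game, but only the impossible outcomes (C,D) and (D,C) are changed:
# U(C,D) is raised above U(D,D), U(D,C) is lowered below U(C,C).
MODIFIED = {("C", "C"): 3, ("C", "D"): 2, ("D", "C"): 2, ("D", "D"): 1}


def edt_choice(U):
    """EDT against an exact copy: conditional on my action, the copy matches it."""
    value = {a: U[(a, a)] for a in ACTIONS}
    return max(value, key=value.get)


def cdt_choice(U, q):
    """CDT with a fixed credence q that the opponent plays C (an assumed model)."""
    value = {a: q * U[(a, "C")] + (1 - q) * U[(a, "D")] for a in ACTIONS}
    return max(value, key=value.get)


edt_results = (edt_choice(PD), edt_choice(MODIFIED))
cdt_results = [(cdt_choice(PD, q), cdt_choice(MODIFIED, q)) for q in (0.0, 0.5, 1.0)]

print("EDT:", edt_results)
print("CDT:", cdt_results)
```

In this sketch EDT only ever reads U(C,C) and U(D,D), so its choice is the same in both games, while the CDT choice flips from D to C even though only the utilities of impossible outcomes were changed.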
This is a systematic flaw in CDT: Its decisions depend on the utility of outcomes that it can already know to be impossible.
The principle of the irrelevance of impossible outcomes can be used beyond arguing against CDT. As you may remember from my post on updatelessness, sensible decision theories will precommit to give Omega the money in the counterfactual mugging thought experiment. (If you don’t remember or haven’t read that post in the first place, this is a good time to catch up, because the following thoughts are based on the ideas from the post.) Even EDT, which ignores the utility of impossible outcomes, would self-modify in this way. However, the decision theory resulting from such self-modification violates the principle of the irrelevance of impossible outcomes. Remember that in counterfactual mugging, you give in because this was a good idea to precommit to when you didn’t yet know how the coin came up. However, once you know that the coin came up the unfavorable way, the positive outcome, which gave you the motivation to precommit, has become impossible. Of course, you only give in to counterfactual mugging if the reward in this now impossible branch is sufficiently high. For example, there is no reason to precommit to give in if you lose money in both branches. This means that once you have become updateless, you violate the principle of the irrelevance of impossible outcomes: your decision in counterfactual mugging depends on the utility you assign to an outcome that cannot happen anymore.
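A small Python sketch may help to see the dependence. It assumes a fair coin and uses made-up amounts for the payment and the reward; the function names are mine and purely illustrative. The updateless agent evaluates the policy of paying from the perspective before the coin flip, whereas an agent that has updated on the unfavorable outcome assigns probability 0 to the reward branch.

```python
def updateless_pays(reward, cost, p_favorable=0.5):
    """Evaluate the policy 'pay when asked' before the coin is flipped.

    reward: utility received in the favorable branch if the policy is to pay.
    cost: amount handed over in the unfavorable branch.
    p_favorable: assumed prior probability of the favorable branch.
    """
    ev_pay = p_favorable * reward + (1 - p_favorable) * (-cost)
    ev_refuse = 0.0
    return ev_pay > ev_refuse


def updated_pays(reward, cost):
    """Evaluate paying after learning that the coin came up unfavorably."""
    # The favorable branch now has probability 0, so reward drops out.
    ev_pay = 0.0 * reward + 1.0 * (-cost)
    ev_refuse = 0.0
    return ev_pay > ev_refuse


COST = 100  # hypothetical payment demanded by Omega
for reward in (10000, -50):  # high reward vs. losing money in both branches
    print(reward, updateless_pays(reward, COST), updated_pays(reward, COST))
```

The updateless decision changes with the reward, which at the time of paying belongs to an outcome that can no longer occur; the updated decision ignores it.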
Acknowledgment: This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).