[This post assumes knowledge of decision theory, as discussed in Eliezer Yudkowsky’s Timeless Decision Theory.]
One interesting feature of some decision theories that I used to be a bit confused about is “updatelessness”. A thought experiment suitable for explaining the concept is counterfactual mugging: “Omega [a being to be assumed a perfect predictor and absolutely trustworthy] appears and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But Omega also tells you that if the coin came up heads instead of tails, it’d give you $10000, but only if you’d agree to give it $100 if the coin came up tails.”
There are various variants of this experiment that seem to illustrate a similar concept, although they are not all structurally isomorphic. For example, Gary Drescher discusses Newcomb’s problem with transparent boxes in ch. 6.2 and retribution in ch. 7.3.1 of his book Good and Real. Another relevant example is Parfit’s hitchhiker.
Of course, you win by refusing to pay. To strengthen the intuition that this is the case, imagine that the whole world just consists of one instance of counterfactual mugging and that you already know for certain that the coin came up tails. (We will assume that there is no anthropic uncertainty about whether you are in a simulation used to predict whether you would give in to counterfactual mugging. That is, Omega used some (not necessarily fully reliable) way of figuring out what you’d do. For example, Omega may have created you in a way that implies giving in or not giving in to counterfactual mugging.) Instead of giving money, let’s say thousands of people will be burnt alive if you give in, while millions could have been saved if the coin had come up heads. Nothing else will be different as a result of that action. I don’t think there is any dispute over which choice maximizes expected utility for this agent.
The cause of dispute is that agents who give in to counterfactual mugging win in terms of expected value as judged from before learning the result of the coin toss. That is, prior to being told that the coin came up tails, an agent had better be one that gives in to counterfactual mugging. After all, this will give her 0.5*$10,000 – 0.5*$100 = $4,950 in expectation, compared to $0 for an agent who would refuse. So, there is a conflict between what the agent would rationally want her future self to choose and what is rational for her future self to do. (Another example of this is the absent-minded driver.) There is nothing particularly confusing about the existence of problems with such inconsistency.
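To make the two perspectives concrete, here is a minimal sketch of the arithmetic in Python. It assumes the payoffs from the thought experiment and a perfectly reliable Omega; the function and constant names are made up for illustration.

```python
# Payoffs taken from the thought experiment; all names are illustrative.
P_HEADS = 0.5
REWARD_IF_HEADS = 10_000  # paid on heads, but only to agents who would pay on tails
COST_IF_TAILS = 100       # paid on tails by agents who give in


def ex_ante_value(pays_on_tails: bool) -> float:
    """Expected payoff of a policy, judged before the coin toss is known."""
    heads_payoff = REWARD_IF_HEADS if pays_on_tails else 0
    tails_payoff = -COST_IF_TAILS if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_given_tails(pays_on_tails: bool) -> float:
    """Payoff of the same policy once the agent knows the coin came up tails."""
    return float(-COST_IF_TAILS if pays_on_tails else 0)


for pays in (True, False):
    print(pays, ex_ante_value(pays), value_given_tails(pays))
```

Judged ex ante, the paying policy comes out ahead; judged after learning "tails", the refusing policy does. The conflict described above is just the disagreement between these two functions.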
Because being an “updateless” agent, i.e. one that makes the choice based on how it would have wanted the choice to be prior to updating, is better for future instances of mugging, sensible decision theories would self-modify into being updateless with regard to all future information they receive. (Note that being updateless doesn’t mean that one doesn’t change one’s behavior based on new information, but that one goes through with the plans that one would have committed oneself to pursue before learning that information.) That is, an agent using a decision theory like (non-naive) evidential decision theory (EDT) would commit to giving in to counterfactual mugging and similar decision problems prior to learning that it ended up in the “losing branch”. However, if the EDT agent already knows that it is in the losing branch of counterfactual mugging and hasn’t thought about updatelessness yet, it wouldn’t give in, although it might (if it is smart enough) self-modify into being updateless in the future.
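One way to picture the difference is as two decision procedures that share the same expected-value calculation and differ only in which beliefs they feed into it. The following is a rough sketch under strong simplifying assumptions (a perfect predictor, a policy that is just "what to do if told tails"), not a rendering of any particular formal decision theory; all names are hypothetical.

```python
# Illustrative toy model; names and structure are assumptions, not a formal theory.
PRIOR = {"heads": 0.5, "tails": 0.5}
ACTIONS = ("pay", "refuse")  # what the agent does if told the coin came up tails


def payoff(coin: str, action_on_tails: str) -> int:
    """Payoff in one branch, assuming Omega predicts action_on_tails perfectly."""
    pays = action_on_tails == "pay"
    if coin == "heads":
        return 10_000 if pays else 0
    return -100 if pays else 0


def expected_value(action_on_tails: str, beliefs: dict) -> float:
    """Expected payoff of an action under a given distribution over coin outcomes."""
    return sum(p * payoff(coin, action_on_tails) for coin, p in beliefs.items())


def updating_choice(observed_coin: str) -> str:
    """Condition on the observation first, then maximize expected value."""
    posterior = {coin: 1.0 if coin == observed_coin else 0.0 for coin in PRIOR}
    return max(ACTIONS, key=lambda a: expected_value(a, posterior))


def updateless_choice() -> str:
    """Pick the plan that looks best from the prior, then carry it out regardless."""
    return max(ACTIONS, key=lambda a: expected_value(a, PRIOR))


print(updating_choice("tails"))  # the agent that updates on "tails"
print(updateless_choice())       # the agent that sticks to its prior plan
```

In this toy model the updating agent refuses once it has seen tails, while the updateless agent pays, because its plan was chosen while both branches still carried weight.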
One immediate consequence of the fact that updateless agents are better off is that one would want to program an AI to be updateless from the start. I guess it is in this sense that people like the researchers of the Machine Intelligence Research Institute consider updatelessness to be correct, despite the fact that it doesn’t maximize expected utility in counterfactual mugging.
But maybe updatelessness is not even needed explicitly if the decision theory can take over epistemics. Consider the EDT agent, to whom Omega explains counterfactual mugging. For simplicity’s sake, let us assume that Omega explains counterfactual mugging and only then states which way the coin came up. After the explanation, the EDT agent could precommit, but let’s assume it can’t do so. Now, Omega opens her mouth to tell the EDT agent how the coin came up. Usually, decision theories are not connected to epistemics, so upon Omega uttering the words “the coin came up heads/tails”, Bayesian updating would run its due course. And that’s the problem, since after Bayesian updating the agent will be tempted to refuse to give in, which is bad from the point of view of before learning which way the coin came up. To gain good evidence about Omega’s prediction of itself, the EDT agent may update in a different way to ensure that it would receive the money if the coin came up heads. For example, it could update towards the existence of both branches (which is basically equivalent to the updateless view of continuing to maintain the original position). Of course, self-modifying or just using some decision theory that has updatelessness built in is the much cleaner way to go.
Overall, this suggests a slightly different view of updatelessness. Updatelessness is not necessarily a property of decision theories. It is the natural thing to happen when you apply acausal decision theory to updating based on new information.
Acknowledgment: This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).