An information-theoretic variant of the adversarial offer to address randomization

[I assume the reader is familiar with Newcomb’s problem and causal decision theory. Some familiarity with basic theoretical computer science ideas also helps.]

The Adversarial Offer

In a recently (Open-Access-)published paper, I (together with my PhD advisor, Vince Conitzer) proposed the following Newcomb-like scenario as an argument against causal decision theory:

> Adversarial Offer: Two boxes, B1 and B2, are on offer. A (risk-neutral) buyer may purchase one or none of the boxes but not both. Each of the two boxes costs $1. Yesterday, the seller put $3 in each box that she predicted the buyer would not acquire. Both the seller and the buyer believe the seller’s prediction to be accurate with probability 0.75.

If the buyer buys one of the boxes, then the seller makes a profit in expectation of $1 – 0.25 * $3 = $0.25. Nonetheless, causal decision theory recommends buying a box. This is because at least one of the two boxes must contain $3, so that the average box contains at least $1.50. It follows that the causal decision theorist must assign an expected causal utility of at least $1.50 to (at least) one of the boxes. Since $1.50 exceeds the cost of $1, causal decision theory recommends buying one of the boxes. This seems undesirable. So we should reject causal decision theory.

The randomization response

One of the obvious responses to the Adversarial Offer is that the agent might randomize. In the paper, we discuss this topic at length in Section IV.1 and in the subsection on ratificationism in Section IV.4. If you haven’t thought much about randomization in Newcomb-like problems before, it probably makes sense to first check out the paper and only then continue reading here, since the paper makes more straightforward points.

The information-theoretic variant

I now give a new variant of the Adversarial Offer, which deals with the randomization objection in a novel and very interesting way. Specifically, the unique feature of this variant is that CDT correctly assesses randomizing to be a bad idea. Unfortunately, it is quite a bit more complicated than the Adversarial Offer from the paper.

Imagine that the buyer is some computer program that has access to a true random number generator (TRNG). Imagine also that the buyer’s source code plus all its data (memories) has a size of, say, 1GB and that the seller knows that it has (at most) this size. If the buyer wants to buy a box, then she will have to pay $1 as usual, but instead of submitting a single choice between buying box 1 and buying box 2, she has to submit 1TB worth of choices. That is, she has to submit a sequence of 2^43 (=8796093022208) bits, each encoding a choice between the boxes.

If the buyer buys and thus submits some such string of bits w, the seller will do the following. First, the seller determines whether there exists any deterministic 1GB program that outputs w. (This is undecidable. We can fix this if the seller knows the buyer’s computational constraints. For example, if the seller knows that the buyer can do less than 1 exaFLOP worth of computation, then the seller could instead determine only whether there is a 1GB program that produces w with at most 1 exaFLOP worth of computation. This is decidable (albeit not very tractable).) If there is no such program, then the seller knows that the buyer randomized. The buyer then receives no box while the seller keeps the $1 payment. The buyer is told in advance that this is what the seller will do. Note that for this step, the seller doesn’t use her knowledge of the buyer’s source code other than its length (and its computational constraints).

If there is at least one 1GB program that outputs w deterministically, then the seller forgets w again. She then picks an index i of w at random. She predicts w[i] based on what she knows about the buyer and based on w[1…i-1], i.e., on all the bits of w preceding i. Call the prediction w’[i]. She fills the box based on her prediction w’[i] and the buyer receives (in return for the $1) the box specified by w[i].
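
A minimal, heavily scaled-down sketch of this procedure (my own illustration, not the paper’s protocol): zlib compressibility stands in for “w has a short deterministic source”, and a majority-of-prefix rule stands in for the seller’s predictor, which in the actual scenario would also use the buyer’s source code.

import random
import zlib

def looks_deterministic(w_bits, ratio=0.9):
    # Toy stand-in for "some short program outputs w": pack the bits into
    # bytes and ask whether they compress well.
    packed = bytes(
        sum(bit << k for k, bit in enumerate(w_bits[j:j + 8]))
        for j in range(0, len(w_bits), 8)
    )
    return len(zlib.compress(packed)) < ratio * len(packed)

def predict_bit(prefix):
    # Toy stand-in for the seller's prediction w'[i] from w[1...i-1]
    # (a real seller would also use her model of the buyer's source code).
    if not prefix:
        return 0
    return 1 if 2 * sum(prefix) >= len(prefix) else 0

def seller(w_bits, price=1, prize=3):
    """Return the buyer's net payoff under the scaled-down protocol."""
    # Step 1: if w does not look like the output of a short deterministic
    # program, keep the price and hand over no box.
    if not looks_deterministic(w_bits):
        return -price
    # Step 2: forget w, sample an index i, predict w[i] from the preceding
    # bits alone, and put the prize in the box the prediction says will NOT
    # be taken. The buyer profits exactly when the prediction is wrong.
    i = random.randrange(len(w_bits))
    return -price + (prize if w_bits[i] != predict_bit(w_bits[:i]) else 0)

With these toy stand-ins, a submission of, say, 8000 fair coin flips is usually rejected (payoff −1), while the alternating submission [0, 1] * 4000 passes the check and receives a box.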

Why the information-theoretic variant is interesting

The scenario is interesting because of the following three facts (which I will later argue to hold):

  1. The seller makes a profit off agents who try to buy boxes, regardless of whether they do so using randomization or not.
  2. CDT and related theories (such as ratificationism) assess randomizing to be worse than not buying any box.
  3. CDT will recommend buying a box (mostly deterministically).

I’m not aware of any other scenarios with these properties. Specifically, the novelty is item 2. (Our paper offers a scenario that has the other two properties.) The complications of this scenario – letting the agent submit a TB worth of choices to then determine whether they are random – are all introduced to achieve item 2 (while preserving the other items).

In the following, I want to argue in more detail for these three points and for the claim that the scenario can be set up at all (which I will do under claim 1).

1. For this part, we need to show two things:

A) Intuitively, if the agent submits a string of bits that uses substantially more than 1GB worth of randomness, then he is extremely likely to receive no box at all.

B) Intuitively, if the agent uses only about 1GB or less worth of randomness, then the seller – using the buyer’s source code – will likely be able to predict w[i] with high accuracy based on w[1…i-1].

I don’t want to argue too rigorously for either of these, but below I’ll give intuitions and some sketches of the information-theoretic arguments that one would need to give to make them more rigorous.

A) The very simple point here is that if you create, say, a 2GB bitstring w where each bit is determined by a fair coin flip, then it is very unlikely that there exists a program that deterministically outputs w. After all, there are many more ways to fill 2GB of bits than there are 1GB programs (about 2^(2^33) as many). From this one may be tempted to conclude that if the agent determines, say, 2GB of the TB of choices by flipping coins, he is likely to receive no box. But this argument is incomplete, because there are other ways to use coin flips. For example, the buyer might use the following policy: Flip 2GB worth of coins. If they all come up heads, always take box B1. Otherwise follow some given deterministic procedure.
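
To make the counting in the simple argument explicit (using the convention from above that 1GB = 2^33 bits and 2GB = 2^34 bits): if 2GB of the submission are fair coin flips copied into w verbatim, then w is uniform over \(2^{2^{34}}\) equally likely values, while there are fewer than \(2^{2^{33}+1}\) programs of at most 1GB, so

\[
\Pr\big[\text{some program of at most } 2^{33} \text{ bits outputs } w\big] \;\le\; \frac{2^{2^{33}+1}}{2^{2^{34}}} \;=\; 2^{-(2^{33}-1)}.
\]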

To make the argument rigorous, I think we need to state the claim information-theoretically. But even this is a bit tricky. For example, it is not a problem per se for w to have high entropy if most of the entropy comes from a small part of the distribution. (For example, if with probability 0.01 the buyer randomizes all choices, and with the remaining probability always chooses Box 1, then the entropy of w is roughly 0.01 * 1TB = 10GB, but the buyer is still likely to receive a box.) So I think we’d need to make a more complicated claim, of the sort: if there is no substantial part (say, >p) of the distribution over w that has less than, say, 2GB of entropy, then with high probability (>1-p), the agent will receive no box.
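
To check the arithmetic in the parenthetical example above (using 1TB = 2^43 bits and 1GB = 2^33 bits, as earlier in the post):

TB_BITS, GB_BITS = 2**43, 2**33
p_randomize = 0.01
# With probability 0.01 all 2**43 choices are fair coin flips (2**43 bits of
# entropy in that branch); otherwise w is a fixed string (0 bits). Conditioning
# on the branch gives a lower bound on H(w):
h_lower_bound = p_randomize * TB_BITS
print(h_lower_bound / GB_BITS, "GB")  # ~10 GB, yet a box is received with probability ~0.99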

B) Again, we can make a simple but incomplete argument: If of the 1TB of choices, only, say, 2GB are determined by random coin flips, then a randomly sampled bit is likely to be predictable from the agent’s source code. But again, the problem is that the random coin flips can be used in other ways. For example, the buyer might use a deterministic procedure to determine w (say, w=01010101…), but then randomly generate a number n (with any number n chosen with probability 2^-n, for instance), then randomly sample n indices j of w and flip w[j] for each of them. This may have relatively low entropy. But now the seller cannot perfectly predict w[i] given w[1…i-1] for any i.

Again, I think a rigorous argument requires information theory. In particular, we can use the fact that H(w) = H(w[0])+H(w[1]|w[0])+H(w[2]|w[0,1])+…, where H denotes entropy. If H(w) is less than, say, 2GB, then the average of H(w[i]|w[1…i-1]) must be at most 2GB/1TB = 1/500. From this, it follows immediately that with high probability, w[i] can be predicted with high accuracy given w[1…i-1].
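
One way to spell out that last step (a sketch; the 0.1-bit threshold is an arbitrary choice of mine):

\[
H(w) \le 2^{34}\ \text{bits}
\quad\Longrightarrow\quad
\frac{1}{2^{43}} \sum_{i} H\big(w[i] \mid w[1\ldots i-1]\big) \;\le\; \frac{2^{34}}{2^{43}} \;=\; 2^{-9} \approx \frac{1}{500}\ \text{bits}.
\]

By Markov’s inequality, the fraction of indices i with \(H(w[i] \mid w[1\ldots i-1]) \ge 0.1\) bits is then at most \((1/500)/0.1 = 1/50\). For a binary variable, the binary entropy function satisfies \(h(q) \ge 2\min(q, 1-q)\), so the best guess of \(w[i]\) given \(w[1\ldots i-1]\) errs with probability at most \(H(w[i] \mid w[1\ldots i-1])/2\). Hence, for a uniformly sampled index i, with probability at least about 49/50 the seller – who in principle knows the buyer’s distribution over w – can predict \(w[i]\) from \(w[1\ldots i-1]\) with accuracy at least 0.95.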

2. This is essential, but straightforward: Generating w at random causes the seller to determine that w was generated at random. Therefore, CDT (accurately) assesses randomly generating w to have an expected utility near $-1.

3. Finally, I want to argue that CDT will recommend buying a box. For this, we only need to argue that CDT prefers some method of submitting w over not buying a box. So consider the following procedure: First, assign beliefs over the seller’s prediction w’[0] of the first bit. Since there are only two possible boxes, for at least one of the boxes j, it is the case that P(w’[0]=j) ≤ ½, where P refers to the probability assigned by the buyer. Let w[0] = j. We now repeat this inductively. That is, for each i, given the w[1…i-1] that we have already constructed, the buyer sets w[i]=k s.t. P(w’[i]=k | w[1…i-1]) ≤ ½.

What’s the causal expected utility of submitting w thus constructed? Well, for one, because the procedure is deterministic (if ties are broken deterministically), the buyer can be confident that she will receive a box at all. Moreover, for each i, the buyer thinks that if i is sampled by the seller for the purpose of determining which box to give to the buyer, then the buyer will in causal expectation receive $1.50, because the seller will predict the wrong box, i.e. w’[i] ≠ w[i], with probability at least ½.
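
A sketch of this construction (my own illustration; seller_prediction_prob is a hypothetical stand-in for the buyer’s credences about the seller’s prediction at each position):

def cdt_submission(seller_prediction_prob, n_bits):
    # Build w deterministically, bit by bit: at each position pick the box
    # that the buyer believes the seller predicts with probability <= 1/2
    # (ties broken deterministically).
    w = []
    for _ in range(n_bits):
        p_predicts_1 = seller_prediction_prob(w)  # buyer's credence that w'[i] = 1
        w.append(1 if p_predicts_1 <= 0.5 else 0)
    return w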

UDT is “updateless” about its utility function

Updateless decision theory (UDT) (or some variant thereof) seems to be widely accepted among MIRI researchers and LessWrong users as the best current approach to decision theory. In this short post, I outline one potential implication of being completely updateless. My intention is not to refute UDT, but to show that:

  1. It is not clear how updateless one might want to be, as this could have unforeseen consequences.
  2. If one endorses UDT, one should also endorse superrational cooperation on a very deep level.

My argument is simple, and draws on the idea of multiverse-wide superrational cooperation (MSR), which is a form of acausal trade between agents with correlated decision algorithms. Thinking about MSR instead of general acausal trade has the advantage that it seems conceptually easier, while the conclusions gained should hold in the general case as well. Nevertheless, I am very uncertain and expect the reality of acausal cooperation between AIs to look different from the picture I draw in this post.

Suppose humans have created a friendly AI with a CEV utility function and UDT as its decision theory. This version of UDT has solved the problem of logical counterfactuals and algorithmic correlation, and can readily spot any correlated agent in the world. Such an AI will be inclined to trade acausally with other agents—agents in parts of the world it does not have causal access to. It will do so, for instance, to realize gains from comparative advantages under different empirical circumstances, and because pursuing any single value system runs into diminishing marginal returns.

For the trade implied by MSR, the AI does not have to simulate other agents and engage in some kind of Löbian bargain with them. Instead, the AI has to find out whether the agents’ decision algorithms are functionally equivalent to the AI’s decision algorithm, it has to find out about the agents’ utility functions, and it has to make sure the agents are in an empirical situation such that trade benefits both parties in expectation. (Of course, to do this, the AI might also have to perform a simulation.) The easiest trading step seems to be the one with all other agents using updateless decision theory and the same prior. In this context, it is possible to neglect many of the usual obstacles to acausal trade. These agents share everything except their utility function, so there will be little if any “friction”—as long as the compromise takes differences between utility functions into account, the correlation between the agents will be perfect. It would get more complicated if the versions of UDT diverged a bit, and if the priors were slightly different. (More on this later.) I assume here that the agents can find out about the other agents’ utility functions. Although these are logically determined by the prior, the agents might be logically uncertain, and calculating the distribution of utility functions of UDT agents might be computationally expensive. I will ignore this consideration here.

A possible approach to this trade is to effectively choose policies based on a weighted sum of the utility functions of all UDT agents in all the possible worlds contained in the AI’s prior (see Oesterheld 2017, section 2.8 for further details). Here, the weights will be assigned such that in expectation, all agents will have an incentive to pursue this sum of utility functions. It is not exactly clear how such weights will be calculated, but it is likely that all agents will adopt the same weights, and it seems clear that once this weighting is done based on the prior, it won’t change after finding out which of the possible worlds from the prior is actual (Oesterheld 2017, section 2.8.6). If all agents adopt the policy of always pursuing a sum of their utility functions, the expected marginal additional goal fulfillment for all AIs at any point in the future will be highest. The agents will act according to the “greatest good for the greatest number.” Any individual agent won’t know whether they will benefit in reality, but that is irrelevant from the updateless perspective. This becomes clear if we compare the situation to thought experiments like the Counterfactual Mugging. Even if in the actual world, the AI cannot benefit from engaging in the compromise, then it was still worth it from the prior viewpoint, since (given sufficient weight in the sum of utility functions) the AI would have stood to gain even more in another, non-actual world.
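
As a minimal sketch of the aggregation step (my own illustration; the names and the representation of utility functions are assumptions, not anything from the cited sources):

def compromise_utility(weighted_utilities):
    # weighted_utilities: list of (weight, utility_function) pairs, one per
    # UDT agent found anywhere in the prior. The weights are fixed once, based
    # on the prior, and are not revised after learning which world is actual.
    def u(outcome):
        return sum(weight * utility(outcome) for weight, utility in weighted_utilities)
    return u

Every agent would then select policies by the expected value of this single function under the shared prior, rather than by its own original utility function.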

If the agents are also logically updateless, this reduces the information the weights of the agents’ utility functions are based on. There are probably many logical implications about aspects of the trade that could be drawn from an empirical prior and the utility functions—e.g., that the trade will benefit only the most common utility functions, or that some values won’t be pursued by anyone in practice—that might be one logical implication step away from a logical prior. If the AI is logically updateless, it will always perform the action that it would have committed to before it got to know about these implications. Of course, logical updatelessness is an unresolved issue, and its implications for MSR will depend on possible solutions to the problem.

In conclusion, in order to implement the MSR compromise, the AI will start looking for other UDT agents in all possible (and, possibly, impossible) worlds in its prior. It will find out about their utility functions and calculate a weighted sum over all of them. This is what I mean by the statement that UDT is “updateless” about its utility function: no matter what utility function it starts out with, its own function might still have negligible weight in the goals the UDT AI will pursue in practice. At this point, it becomes clear that it really matters what this prior looks like. What is the distribution of the utility functions of all UDT agents given the universal prior? There might be worlds less complex than the world humans live in—for instance, a cellular automaton, such as Rule 110 or Game of Life, with a relatively simple initial state—which still contain UDT agents. Given that these worlds might have a higher prior probability than the human world, they might get a higher weight in the compromise utility function. The AI might end up maximizing the goal functions of the agents in the simplest worlds.

Is updating on your existence a sin?

One of the features of UDT is that it does not even condition the prior on the agent’s own existence—when evaluating policies, UDT also considers their implications in worlds that do not contain an instantiation of the agent, even though by the time the agent thinks its first thought, it can be sure that these worlds do not exist. This might not be a problem if one assigns high weight to a modal realism/Tegmark Level 4 universe anyway. An observation can never distinguish between a world in which all worlds exist, and one in which only the world featuring the current observation exists. So if the measure of all the “single worlds” is small, then updating on existence won’t change much.

Suppose that this is not the case. Then there might be many worlds that can already be excluded as non-actual based on the fact that they don’t contain humans. Nevertheless, they might contain UDT agents with alien goals. This poses a difficult choice: Given UDT’s prior, the AI will still cooperate with agents living in non-actual (and impossible, if the AI is logically updateless) worlds. This is because given UDT’s prior, it could have been not humans, but these alien agents, that turned out actual—in which case they could have benefited humans in return. On the other hand, if the AI is allowed to condition on such information, then it loses in a kind of counterfactual prisoner’s dilemma:

  • Counterfactual prisoner’s dilemma: Omega has gained total control over one universe. In the pursuit of philosophy, Omega flips a fair coin to determine which of two agents she should create. If the coin comes up heads, Omega will create a paperclip maximizer. If it comes up tails, she creates a perfectly identical agent, but with one difference: the agent is a staple maximizer. After creating this agent, Omega hands it total control over the universe and lets it know about this procedure. There are gains from trade: producing both paperclips and staples creates 60% utility for both of the agents, while producing only one of those creates 100% for one of the agents. Hence, both agents would (in expectation) benefit from a joint precommitment to a compromise utility function, even if only one of the agents is actually created. What should the created agent do?

If the agents condition on their existence, then they will not gain as much in expectation as they could otherwise expect to gain before the coin flip (when neither of the agents existed). I have chosen this thought experiment because it is not confounded by the involvement of simulated agents, a factor which could lead to anthropic uncertainty and hence make the agents more updateless than they would otherwise be.

UDT agents with differing priors

What about UDT agents using differing priors? For simplicity, I suppose there are only two agents. I also assume that both agents have equal capacity to create utilons in their universes. (If this is not the case, the weights in the resulting compromise utility function have to be adjusted.) Suppose both agents start out with the same prior, but update it on their own existence—i.e., they both exclude any worlds that don’t contain an instantiation of themselves. This posterior is then used to select policies. Agent B can’t benefit from any cooperative actions by agent A in a world that only exists in agent A’s posterior. Conversely, agent A also can’t benefit from agent B in worlds that agent A doesn’t think could be actual anymore. So the UDT policy will recommend pursuing a compromise function only in worlds lying in the intersection of both agents’ posteriors. If either agent updates that they are in a world to which the other agent assigns approximately zero probability, then they won’t cooperate.

More generally, if both agents know which world is actual, and this is a world which they both inhabit, then it doesn’t matter which prior they used to select their policies. (Of course, this world must have nonzero probability in both of their priors; otherwise they wouldn’t ever update that said world is actual.) From the prior perspective, for agent A, every sacrificed utilon in this world is weighted by its prior measure of the world. Every gained utilon from agent B is also weighted by the same prior measure. So there is no friction in this compromise—if both agents decide between an action a which gives themselves d utilons, and an action b which gives the other agent c utilons, then either agent will prefer option b iff c multiplied by this agent’s prior measure of the world is greater than d multiplied by the same prior measure, so iff c is greater than d. Given that there is a way to normalize both agents’ utility functions, pursuing a sum of those utility functions seems optimal.

We can even expand this to the case wherein the two agents have any differing priors with a nonempty intersection between the corresponding sets of possible worlds. In expectation, the policy that says: “if any world outside the intersection is actual: don’t compromise; if any world from the intersection is actual: do the standard UDT compromise, but use the posterior distribution in which all worlds outside the intersection have zero probability for policy selection” seems best. When evaluating this policy, both agents can weight both utilons sacrificed for others, as well as utilons gained from others, in any of the worlds from the intersection by the measure of the entire intersection in their own respective priors. This again creates a symmetrical situation with a 1:1 trade ratio between utilons sacrificed and gained.

Another case to consider is one in which the agents also distribute the relative weights among the worlds in the intersection differently. I think that this does not lead to asymmetries (in the sense that conditional on some of the worlds being actual, one agent stands to gain and lose more than the other agent). Suppose agent A has 30% on world S1, and 20% on world S2. Agent B, on the other hand, has 10% on world S1 and 20% on world S2. If both agents follow the policy of pursuing the sum of utility functions, given that they find themselves in either of the two shared worlds, then, ceteris paribus, both will in expectation benefit to an equal degree. For instance, let c1 (c2) be the amount of utilons either agent can create for the other agent in world S1 (S2), and d1 (d2) the respective amount agents can create for themselves. Then agent A gets either 0.3×c1+0.2×c2 or 0.3×d1+0.2×d2, while B chooses between 0.1×c1+0.2×c2 and 0.1×d1+0.2×d2. Here, it’s not the case that A prefers cooperating iff B prefers cooperating. But assuming that in expectation, c1 = c2 as well as d1 = d2, this leads to a situation where both prefer cooperation iff c1 > d1. It follows that just pursuing a sum of both agents’ utility functions is, in expectation, optimal for both agents.
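
With assumed numbers for c and d (and c1 = c2, d1 = d2, as in the text), the comparison looks like this:

# Agent A's credences in the shared worlds S1, S2 are 0.3 and 0.2;
# agent B's are 0.1 and 0.2. c1, c2: utilons one agent can create for the
# other in S1, S2; d1, d2: utilons it can create for itself instead.
def prefers_cooperation(p1, p2, c1, c2, d1, d2):
    return p1 * c1 + p2 * c2 > p1 * d1 + p2 * d2

c1 = c2 = 5.0  # assumed values
d1 = d2 = 3.0
print(prefers_cooperation(0.3, 0.2, c1, c2, d1, d2))  # agent A: True
print(prefers_cooperation(0.1, 0.2, c1, c2, d1, d2))  # agent B: True
# With c1 = c2 and d1 = d2, both conditions reduce to "cooperate iff c1 > d1".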

Lastly, consider a combination of non-identical priors with empirical uncertainty. For UDT, empirical uncertainty between worlds translates into anthropic uncertainty about which of the possible worlds the agent inhabits. In this case, as expected, there is “friction”. For example, suppose agent A assigns p to the intersection of the worlds in both agents’ priors, while agent B assigns p/q. Before they find out whether one of the worlds from the intersection or some other world is actual, the situation is the following: B can benefit from A’s cooperation in only p/q of the worlds. A can benefit in p of the worlds from B, but for everything A does, this will only mean p/q as much to agent B. Now each agent can again either create d utilons for themselves, or perform a cooperative action that gives c utilons to the other agent in the world where the action is performed. Given uncertainty about which world is actual, if both agents choose cooperation, agent A receives c×p utilons in expectation, while agent B receives c×p/q utilons in expectation. Defection gives both agents d utilons. So for cooperation to be worth it, c×p and c×p/q both have to be greater than d. If this is the case, then if p is unequal to p/q, both agents’ gains from trade are still not equal. This appears to be a bargaining problem that doesn’t solve as easily as the examples from above.
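
A numerical illustration with assumed values for p, q, c, and d:

# Agent A assigns p = 0.4 to the intersection, agent B assigns p/q = 0.2
# (q = 2). Cooperating gives the other agent c = 10 utilons in the world
# where the action is performed; defecting gives oneself d = 1.
p, q, c, d = 0.4, 2, 10, 1
gain_A = c * p - d      # 3.0: A's expected gain from mutual cooperation over defection
gain_B = c * p / q - d  # 1.0: B's expected gain
print(gain_A, gain_B)   # trade is worthwhile for both, but the gains are unequal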

Conclusion

I actually endorse the conclusion that humans should cooperate with all correlating agents. Although humans’ decision algorithms might not correlate with as many other agents as those of superhuman AIs would, and humans might not be able to compromise as efficiently, humans should nevertheless pursue some multiverse-wide sum of values. What I’m uncertain about is how far updatelessness should go. For instance, it is not clear to me which empirical and logical evidence humans should and shouldn’t take into account when selecting policies. If an AI does not start out with the knowledge that humans possess but instead uses the universal prior, then it might perform actions that seem irrational given human knowledge. Even if observations are logically inconsistent with the existence of a fellow cooperation partner (i.e., in the updated distribution, the cooperation partner’s world has zero probability), UDT might still cooperate with and possibly adopt that partner’s values. At this point, I doubt that everyone would still agree with the claim that UDT always achieves the highest utility.

Acknowledgements

I thank Caspar Oesterheld, Max Daniel, Lukas Gloor, and David Althaus for helpful comments on a draft of this post, and Adrian Rorheim for copy editing.

Anthropic uncertainty in the Evidential Blackmail

I’m currently writing a piece on anthropic uncertainty in Newcomb problems. The idea is that whenever someone simulates us to predict our actions, this leads us to have anthropic uncertainty about whether we’re in this simulation or not. (If we knew whether we were in the real world or in the simulation, then the simulation wouldn’t fulfill its purpose anymore.) This kind of reasoning changes quite a lot about the answers that decision theories give in predictive dilemmas. It makes their reasoning “more updateless”, since they reason from a more impartial stance: a stance from which they don’t yet know their exact position in the thought experiment.

This topic isn’t new, but it hasn’t been discussed in-depth before. As far as I am aware, it has been brought up on LessWrong by gRR and in two blog posts by Stuart Armstrong. Outside LessWrong, there is a post by Scott Aaronson, and one by Andrew Critch. The idea is also mentioned in passing by Neal (2006, p. 13). Are there any other sources and discussions of it that I have overlooked?

In this post, I examine what the assumption that predictions or simulations lead to anthropic uncertainty implies for the Evidential Blackmail (also XOR Blackmail), a problem which is often presented as a counter-example to evidential decision theory (EDT) (Cf. Soares & Fallenstein, 2015, p. 5; Soares & Levinstein, 2017, pp. 3–4). A similar problem has been introduced as “Yankees vs. Red Sox” by Arntzenius (2008), and discussed by Ahmed and Price (2012). I would be very grateful for any kind of feedback on my post.

We could formalize the blackmailer’s procedure in the Evidential Blackmail something like this:

def blackmailer():
    # Simulate what you would do upon receiving the letter.
    your_action = your_policy("letter")
    # Predict whether the stock will retain its value ("retain") or crash ("fall").
    stock = predict_stock()
    if stock == "retain" and your_action == "pay":
        return "letter"
    elif stock == "fall" and your_action == "not pay":
        return "letter"
    else:
        return "no letter"

Let p denote the probability P(retain) with which our stock retains its value a. The blackmailer asks us for an amount of money b, where 0<b<a. The ex ante expected utilities are now:

EU(pay) = P(letter|pay) * (a – b) + P(no letter & retain|pay) * a = p (a – b),

EU(not pay) = P(no letter & retain|not pay) * a = p a.

According to the problem description, P(no letter & retain|pay) is 0, and P(no letter & retain|not pay) is p.1 As long as we don’t know whether a letter has been sent or not (even if it might already be on its way to us), committing to not paying gives us only information about whether the letter has been sent, not about our stock, so we should commit not to pay.
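
A quick numerical check of these ex ante values, with assumed numbers for p, a, and b:

# Stock worth a = 100 with probability p = 0.5; the blackmailer demands b = 10.
p, a, b = 0.5, 100, 10
eu_pay     = p * (a - b)  # 45.0: under a paying policy, the letter arrives exactly when the stock retains its value
eu_not_pay = p * a        # 50.0: under a refusing policy, the letter arrives exactly when the stock falls
print(eu_pay, eu_not_pay) # committing not to pay is better ex ante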

Now for the situation in which we have already received the letter. (All of the following probabilities will be conditioned on “letter”.) We don’t know whether we’re in the simulation or not. But what we do if we’re in the simulation can actually change our probability that we’re in the simulation in the first place. Note that the blackmailer has to simulate us exactly once in any case, regardless of whether our stock goes down or not. So if we are in the simulation and we receive the letter, P(retain|pay) is still equal to P(retain|not pay): neither paying nor not paying gives us any evidence about whether our stock retains its value or not, conditional on being in the simulation. But if we are in the simulation, we can influence whether the blackmailer sends us a letter in the real world. In the simulation, our action determines whether the real-world letter is sent in the case where we keep our money or in the case where we lose it.

Let’s begin by calculating EDT’s expected utility of not paying. We will lose all money for certain if we’re in the real world and don’t pay, so we only consider the case where we’re in the simulation:

EU(not pay) = P(sim & retain|not pay) * a.

For both SSA and SIA, if our stock doesn’t go down and we don’t pay up, then we’re certain to be in the simulation: P(sim|retain, not pay) = 1, while we could be either simulated or real if our stock falls: P(sim|fall, not pay) = 1/2. Moreover, P(sim & retain|not pay) = P(sim|retain, not pay) * P(retain|not pay). Under SSA, P(retain|not pay) is just p.2 We hence get

EU_SSA(not pay) = P(sim|retain, not pay) * p * a = p a.

Our expected utility for paying is:

EU_SSA(pay) = P(sim & retain|pay) * (a – b) + P(not sim|pay) * (a – b)

= P(sim|retain, pay) * p * (a – b) + P(not sim|pay) * (a – b).

If we pay up and the stock retains its value, there is exactly one of us in the simulation and one of us in the real world, so P(sim|retain, pay) = 1/2, while we’re sure to be in the simulation for the scenario in which our stock falls: P(sim|fall, pay) = 1. Knowing both P(sim & retain|pay) and P(sim & fall|pay), we can calculate P(not sim|pay) = p/2. This gives us

EU_SSA(pay) = 1/2 * p * (a – b) + 1/2 * p * (a – b) = p (a – b).

Great, EDT + SSA seems to calculate exactly the same payoffs as all other decision theories – namely, that by paying the Blackmailer, one just loses the money one pays the blackmailer, but gains nothing.

For SIA probabilities, P(retain|letter) depends on whether we pay or don’t pay. If we pay, then there are (in expectation) 2 p observers in the “retain” world, while there are (1 – p) observers in the “fall” world. So our updated P(retain|letter, pay) should be (2 p)/(1 + p). If we don’t pay, it’s p/(2 – p) respectively. Using the above probabilities and Bayes’ theorem, we have P(sim|pay) = 1/(1 + p) and P(sim|not pay) = 1/(2 – p). Hence,

EU_SIA(not pay) = P(sim & retain|not pay) * a = (p a)/(2 – p),

and

EU_SIA(pay) = P(sim) * P(retain|sim) * (a – b) + P(not sim) * (a – b)

= (p (a – b))/(1 + p) + (p (a – b))/(1 + p)

= (2 p (a – b))/(1 + p).

It seems like paying the blackmailer would be better here than not paying, if p and b are sufficiently low.

Why doesn’t SIA give the ex ante expected utilities, as SSA does? Up until now I have just assumed correlated decision-making, so that the decisions of the simulated us will also be those of the real-world us (and of course the other way around – that’s how the blackmail works in the first place). The simulated copy hence also gets credited with the impact of our real copy. The problem is now that SIA thinks we’re more likely to be in worlds with more observers. So the worlds in which we have additional impact due to correlated decision-making get double-counted. In the world where we pay the blackmailer, there are two observers if the stock retains its value (probability p), while there is only one observer if it falls (probability 1 – p). If we don’t pay the blackmailer, there is only one observer in the retain case, and two observers in the fall case. SIA hence slightly favors paying the blackmailer, so as to make the retain-world more likely.

To remedy the problem of double-counting for EDT + SIA, we could use something along the lines of Stuart Armstrong’s Correlated Decision Principle (CDP). First, we aggregate the “EDT + SIA” expected utilities of all observers. Then, we divide this expected utility by the number of individuals who we are deciding for. For EU_CDP(pay), there is with probability 1 an observer in the simulation, and with probability p one in the real world. To get the aggregated expected utility, we thus have to multiply EU(pay) by (1 + p). Since we have decided for two individuals, we divide this EU by 2 and get EU_CDP(pay) = ((2 p (a – b))/(1 + p)) * 1/2 * (1 + p) = p (a – b).

For EU_CDP(not pay), it gets more complex: the number of individuals any observer is making a decision for is actually just 1 – namely, the observer in the simulation. The observer in the real world doesn’t get his expected utility from his own decision, but from influencing the other observer in the simulation. On the other hand, we multiply EU(not pay) by (2 – p), since there is one observer in the simulation with probability 1, and with probability (1 – p) there is another observer in the real world. Putting this together, we get EU_CDP(not pay) = ((p a)/(2 – p)) * (2 – p) = p a. So EDT + SIA + CDP arrives at the same payoffs as EDT + SSA, although it is admittedly a rather messy and informal approach.
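
These calculations can be checked numerically; here is a small script with assumed numbers (p = 0.5, a = 100, b = 10 are my choices, not from the text):

p, a, b = 0.5, 100, 10

eu_ssa_pay     = 0.5 * p * (a - b) + 0.5 * p * (a - b)  # = p (a - b)
eu_ssa_not_pay = 1.0 * p * a                            # = p a

eu_sia_pay     = 2 * p * (a - b) / (1 + p)
eu_sia_not_pay = p * a / (2 - p)

# CDP correction as applied above: rescale by the expected number of
# observers and divide by the number of individuals decided for.
eu_cdp_pay     = eu_sia_pay * (1 + p) / 2
eu_cdp_not_pay = eu_sia_not_pay * (2 - p)

print(eu_ssa_pay, eu_ssa_not_pay)  # 45.0, 50.0  -> EDT + SSA refuses to pay
print(eu_sia_pay, eu_sia_not_pay)  # 60.0, ~33.3 -> EDT + SIA would pay here
print(eu_cdp_pay, eu_cdp_not_pay)  # 45.0, 50.0  -> CDP recovers the ex ante answer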

I conclude that, when taking into account anthropic uncertainty, EDT doesn’t give in to the Evidential Blackmail. This is true for SSA and possibly also for SIA + CDP. Fortunately, at least for SSA, we have avoided any kind of anthropic funny business. Note that this is not some kind of dirty hack: if we grant the premise that simulations have to involve anthropic uncertainty, then by definition of the thought experiment – because there is necessarily a simulation involved in the Evidential Blackmail – EDT doesn’t actually pay the blackmailer. Of course, this still leaves open the question of whether we have anthropic uncertainty in all problems involving simulations, and hence whether my argument applies to all conceivable versions of the problem. Moreover, there are other anthropic problems, such as the one introduced by Conitzer (2015a), in which EDT + SSA is still exploitable (in the absence of a method to “bind themselves”).

Acknowledgement

I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.

1 P(no letter & retain|not pay) = P(no letter|retain, not pay) * P(retain|not pay) = 1 * P(retain|not pay) = P(retain|not pay, letter) * P(letter|not pay) + P(retain|not pay, no letter)* P(no letter|not pay) = p.

2 This becomes apparent if we compare the Evidential Blackmail to Sleeping Beauty. SSA is the “halfer position”, which means that after updating on being an observer (receiving the letter), we should still assign the prior probability p, regardless of how many observers there are in either of the two possible worlds.

3 The result that EDT and SIA lead to actions that are not optimal ex ante is also featured in several publications about anthropic problems, e.g., Arntzenius, 2002; Briggs, 2010; Conitzer, 2015b; Schwarz, 2015.


Ahmed, A., & Price, H. (2012). Arntzenius on ‘Why ain’cha rich?’. Erkenntnis. An International Journal of Analytic Philosophy, 77(1), 15–30.

Arntzenius, F. (2002). Reflections on Sleeping Beauty. Analysis, 62(1), 53–62.

Arntzenius, F. (2008). No Regrets, or: Edith Piaf Revamps Decision Theory. Erkenntnis. An International Journal of Analytic Philosophy, 68(2), 277–297.

Briggs, R. (2010). Putting a value on Beauty. Oxford Studies in Epistemology, 3, 3–34.

Conitzer, V. (2015a). A devastating example for the Halfer Rule. Philosophical Studies, 172(8), 1985–1992.

Conitzer, V. (2015b). Can rational choice guide us to correct de se beliefs? Synthese, 192(12), 4107–4119.

Neal, R. M. (2006, August 23). Puzzles of Anthropic Reasoning Resolved Using Full Non-indexical Conditioning. arXiv [math.ST]. Retrieved from http://arxiv.org/abs/math/0608592

Schwarz, W. (2015). Lost memories and useless coins: revisiting the absentminded driver. Synthese, 192(9), 3011–3036.

Soares, N., & Fallenstein, B. (2015, July 7). Toward Idealized Decision Theory. arXiv [cs.AI]. Retrieved from http://arxiv.org/abs/1507.01986

Soares, N., & Levinstein, B. (2017). Cheating Death in Damascus. Retrieved from https://intelligence.org/files/DeathInDamascus.pdf

“Betting on the Past” by Arif Ahmed

[This post assumes knowledge of decision theory, as discussed in Eliezer Yudkowsky’s Timeless Decision Theory and in Arbital’s Introduction to Logical Decision Theory.]

I recently discovered an interesting thought experiment, “Betting on the Past” by Cambridge philosopher Arif Ahmed. It can be found in his book Evidence, Decision and Causality, which is an elaborate defense of Evidential Decision Theory (EDT). I believe that Betting on the Past may be used to money-pump non-EDT agents, refuting Causal Decision Theories (CDT), and potentially even ones that use logical conditioning, such as Timeless Decision Theory (TDT) or Updateless Decision Theory (UDT). At the very least, non-EDT decision theories are unlikely to win this bet. Moreover, no conspicuous perfect predicting powers, genetic influences, or manipulations of decision algorithms are required to make Betting on the Past work, and anyone can replicate the game at home. For these reasons, it might make a more compelling case in favor of EDT than the Coin Flip Creation, a problem I recently proposed in an attempt to defend EDT’s answers in medical Newcomb problems. In Ahmed’s thought experiment, Alice faces the following decision problem:

Betting on the Past: In my pocket (says Bob) I have a slip of paper on which is written a proposition P. You must choose between two bets. Bet 1 is a bet on P at 10:1 for a stake of one dollar. Bet 2 is a bet on P at 1:10 for a stake of ten dollars. So your pay-offs are as in [Figure 1]. Before you choose whether to take Bet 1 or Bet 2 I should tell you what P is. It is the proposition that the past state of the world was such as to cause you now to take Bet 2. [Ahmed 2014, p. 120]

Ahmed goes on to specify that Alice could indicate which bet she’ll take by either raising or lowering her hand. One can find a detailed discussion of the thought experiment’s implications, as well as a formal analysis of CDT’s and EDT’s decisions in Ahmed’s book. In the following, I want to outline a few key points.

Would CDT win in this problem? Alice is betting on a past state of the world. She can’t causally influence the past, and she’s uncertain whether the proposition is true or not. In either case, Bet 1 strictly dominates Bet 2: no matter which state the past is in, Bet 1 always yields a higher utility. For these reasons, causal decision theories would take Bet 1. Nevertheless, as soon as Alice comes to a definite decision, she updates on whether the proposition is true or false. If she’s a causal agent, she then finds out that she has lost: the past state of the world was such as to cause her to take Bet 1, so the proposition is false. If she had taken Bet 2, she would have found out that the proposition was correct, and she would have won, albeit a smaller amount than if she had won with Bet 1.
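
A minimal numerical rendering of this comparison (my own illustration), assuming for simplicity that conditioning on the choice settles P with certainty:

payoffs = {("bet1", True): 10, ("bet1", False): -1,
           ("bet2", True):  1, ("bet2", False): -10}

# CDT: P is causally independent of the choice, so for any fixed credence q
# in P, Bet 1 dominates Bet 2.
q = 0.3  # arbitrary credence
cdt = {bet: q * payoffs[(bet, True)] + (1 - q) * payoffs[(bet, False)]
       for bet in ("bet1", "bet2")}

# EDT: condition on the action. Taking Bet 2 is conclusive evidence that P
# is true; taking Bet 1, that it is false.
edt = {"bet1": payoffs[("bet1", False)], "bet2": payoffs[("bet2", True)]}

print(cdt)  # {'bet1': 2.3, 'bet2': -6.7}: Bet 1 looks better for every q
print(edt)  # {'bet1': -1, 'bet2': 1}: EDT takes Bet 2 and wins the bet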

Betting on the Past seems to qualify as a kind of Newcomb’s paradox; it seems to have an equivalent payoff matrix (Figure 1).

Figure 1: Betting on the past has a similar payoff matrix to Newcomb’s paradox

             P is true    P is false
Take Bet 1      10            -1
Take Bet 2       1           -10

Furthermore, its causal structure seems to resemble those of e.g. the Smoking Lesion or Solomon’s problem, indicating it as a kind of medical Newcomb problem. In medical Newcomb problems, a “Nature” node determines both the present state of the world (whether the agent is sick/will win the bet) and the agent’s decision (see Figure 2). In this regard, they differ from Newcomb’s original problem, where said node refers to the agent’s decision algorithm.

Figure 2: Betting on the past (left) has a similar causal structure to medical Newcomb problems (right).

One could object to Betting on the Past being a medical Newcomb problem, since the outcomes conditional on our actions here are certain, while e.g. in the Smoking Lesion, observing our actions only shifts our probabilities in degrees. I believe this shouldn’t make a crucial difference. On the one hand, we can conceive of absolutely certain medical Newcomb cases like the Coin Flip Creation. On the other hand, Newcomb’s original problem is often formalized with absolute certainties as well. I’d be surprised if the difference between probabilistic and certain reasoning mattered to decision theories. First, we can always approximate certainties to an arbitrarily high degree, and we might then ask why a negligible further increase in certainty should at some point suddenly change the recommended action completely. Secondly, we’re never really certain in the real world anyway, so if the two cases were different, this would render useless all thought experiments that rely on absolute certainties.

If Betting on the Past is indeed a kind of medical Newcomb problem, this would be an interesting conclusion. It would follow that if one prefers Bet 2, one should also one-box in medical Newcomb problems. And taking Bet 2 seems so obviously correct! I point this out because one-boxing in medical Newcomb problems is what EDT would do, and it is often put forward as both a counterexample to EDT and as the decision problem that separates EDT from Logical Decision Theories (LDT), such as TDT or UDT. (See e.g. Yudkowsky 2010, p.67)

Before we examine the case for EDT further, let’s take a closer look at what LDTs would do in Betting on the Past. As far as I understand, LDTs would take correlations with other decision algorithms into account, but they would ignore “retrocausality” (i.e., they would smoke in the Smoking Lesion, chew gum in the chewing gum problem, etc.). If there is a purely physical cause, then this causal node isn’t altered in the logical counterfactuals that an LDT agent reasons over. Perhaps if the bet were about the state of the world yesterday, LDT would still take Bet 2. Clearly, LDT’s algorithm already existed yesterday, and it can influence this algorithm’s output; so if it chooses Bet 2, it can change yesterday’s world and make the proposition true. But at some point, this reasoning has to break down. If we choose a more distant point in the past as a reference for Alice’s bet – maybe as far back as the birth of our universe – she’ll eventually be unable to exert any possible influence via logical counterfactuals. At some point, the correlation becomes a purely physical one. All she can do at that point is what opponents of evidential reasoning would call “managing the news” (Lewis, 1981) – she can merely try to go for the action that gives her the best Bayesian update.

So, do Logical Decision Theories get it wrong? I’m not sure about that; they come in different versions, and some haven’t yet been properly formalized, so it’s hard for me to judge. I can very well imagine that e.g. Proof-Based Decision Theory would take Bet 2, since it could prove P to be either true or false, contingent on the action it would take. I would argue, though, that if a decision theory takes Bet 2 – and if I’m right about Betting on the Past being a medical Newcomb problem – then it appears it would also have to “one-box”, i.e. take the option recommended by EDT, in other medical Newcomb problems.

If all of this is true, it might imply that we don’t really need LDT’s logical conditioning and that EDT’s simple Bayesian conditioning on actions could suffice. The only remaining difference between LDT and EDT would then be EDT’s lack of updatelessness. What would an updateless version of EDT look like? Some progress on this front has already been made by Everitt, Leike, and Hutter 2015. Caspar Oesterheld and I hope to be able to say more about it soon ourselves.

Acknowledgement

I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.

Joyce’s Better Framing of Newcomb’s Problem

While I disagree with James M. Joyce on the correct solution to Newcomb’s problem, I agree with him that the standard framing of Newcomb’s problem (from Nozick 1969) can be improved upon. Indeed, I very much prefer the framing he gives in chapter 5.1 of The Foundations of Causal Decision Theory, which (according to Joyce) is originally due to JH Sobel:

Suppose there is a brilliant (and very rich) psychologist who knows you so well that he can predict your choices with a high degree of accuracy. One Monday as you are on the way to the bank he stops you, holds out a thousand dollar bill, and says: “You may take this if you like, but I must warn you that there is a catch. This past Friday I made a prediction about what your decision would be. I deposited $1,000,000 into your bank account on that day if I thought you would refuse my offer, but I deposited nothing if I thought you would accept. The money is already either in the bank or not, and nothing you now do can change the fact. Do you want the extra $1,000?” You have seen the psychologist carry out this experiment on two hundred people, one hundred of whom took the cash and one hundred of whom did not, and he correctly forecast all but one choice. There is no magic in this. He does not, for instance, have a crystal ball that allows him to “foresee” what you choose. All his predictions were made solely on the basis of knowledge of facts about the history of the world up to Friday. He may know that you have a gene that predetermines your choice, or he may base his conclusions on a detailed study of your childhood, your responses to Rorschach tests, or whatever. The main point is that you now have no causal influence over what he did on Friday; his prediction is a fixed part of the fabric of the past. Do you want the money?

I prefer this over the standard framing because people can remember the offer and the balance of their bank account better than box 1 and box 2. For some reason, I also find it easier to explain this thought experiment without referring to the thought experiment itself in the middle of the explanation. So, now whenever I describe Newcomb’s problem, I start with Sobel’s rather than Nozick’s version.

Of course, someone who wants to explore decision theory more deeply also needs to learn about the standard version, if only because people sometimes use “one-boxing” and “two-boxing” (the options in Newcomb’s original problem) to denote the analogous choices in other thought experiments. (Even if there are no boxes in these other thought experiments!) But luckily it does not take more than a few sentences to describe the original Newcomb problem based on Sobel’s version. You only need to explain that Newcomb’s problem replaces your bank account with an opaque box whose content you always keep; and puts the offer into a second, transparent box. And then the question is whether you stick with one box or go home with both.