Anthropic uncertainty in the Evidential Blackmail

I’m currently writing a piece on anthropic uncertainty in Newcomb problems. The idea is that whenever someone simulates us to predict our actions, we become anthropically uncertain about whether we are in the simulation or not. (If we could tell whether we were in the real world or in the simulation, the simulation would no longer fulfill its purpose.) This kind of reasoning changes quite a lot about the answers that decision theories give in predictive dilemmas. It makes their reasoning “more updateless”, since they reason from a more impartial stance: one from which they don’t yet know their exact position in the thought experiment.

This topic isn’t new, but it hasn’t been discussed in depth before. As far as I am aware, it has been brought up on LessWrong by gRR and in two blog posts by Stuart Armstrong. Outside LessWrong, there is a post by Scott Aaronson, and one by Andrew Critch. The idea is also mentioned in passing by Neal (2006, p. 13). Are there any other sources and discussions of it that I have overlooked?

In this post, I examine what the assumption that predictions or simulations lead to anthropic uncertainty implies for the Evidential Blackmail (also known as XOR Blackmail), a problem which is often presented as a counter-example to evidential decision theory (EDT) (cf. Soares & Fallenstein, 2015, p. 5; Soares & Levinstein, 2017, pp. 3–4). A similar problem has been introduced as “Yankees vs. Red Sox” by Arntzenius (2008), and discussed by Ahmed and Price (2012). I would be very grateful for any kind of feedback on my post.

We could formalize the blackmailer’s procedure in the Evidential Blackmail something like this:

    def blackmailer():
        # Simulate how you respond to receiving the letter.
        your_action = your_policy("letter")
        # Predict the stock once and reuse the result in both branches.
        stock = predict_stock()
        if stock == "retain" and your_action == "pay":
            return "letter"
        elif stock == "fall" and your_action == "not pay":
            return "letter"
        else:
            return "no letter"

Let p denote the probability P(retain) with which our stock retains its value a. The blackmailer asks us for an amount of money b, where 0 < b < a. The ex ante expected utilities are now:

EU(pay) = P(letter|pay) * (a – b) + P(no letter & retain|pay) * a = p (a – b),

EU(not pay) = P(no letter & retain|not pay) * a = p a.

According to the problem description, P(no letter & retain|pay) is 0, and P(no letter & retain|not pay) is p.¹ As long as we don’t know whether a letter has been sent or not (even if it might already be on its way to us), committing to not paying gives us only information about whether the letter has been sent, not about our stock, so we should commit not to pay.
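As a sanity check on the ex ante values, the following sketch enumerates the two stock outcomes under the blackmailer’s rule. The function names and the example numbers (p = 1/4, a = 100, b = 10) are illustrative assumptions and not part of the problem statement.

```python
from fractions import Fraction


def blackmailer(stock, action):
    """Letter rule: a letter is sent iff (retain and pay) or (fall and not pay)."""
    if stock == "retain" and action == "pay":
        return "letter"
    if stock == "fall" and action == "not pay":
        return "letter"
    return "no letter"


def ex_ante_eu(action, p, a, b):
    """Expected utility of committing to `action` before knowing about any letter."""
    p = Fraction(p)
    eu = Fraction(0)
    for stock, prob in (("retain", p), ("fall", 1 - p)):
        # The stock is worth a if it retains its value and 0 if it falls.
        value = a if stock == "retain" else 0
        # We only hand over b if a letter arrives and our policy is to pay.
        if blackmailer(stock, action) == "letter" and action == "pay":
            value -= b
        eu += prob * value
    return eu


# Illustrative numbers only.
p, a, b = Fraction(1, 4), 100, 10
print(ex_ante_eu("pay", p, a, b))      # 45/2, i.e. p (a - b)
print(ex_ante_eu("not pay", p, a, b))  # 25, i.e. p a
```

With these numbers, committing not to pay comes out ahead by exactly p b.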

Now for the situation in which we have already received the letter. (All of the following probabilities will be conditioned on “letter”.) We don’t know whether we’re in the simulation or not. But what we do if we’re in the simulation can actually change our probability that we’re in the simulation in the first place. Note that the blackmailer has to simulate us once in any case, regardless of whether our stock goes down or not. So if we are in the simulation and we receive the letter, P(retain|pay) is still equal to P(retain|not pay): neither paying nor not paying gives us any evidence about whether our stock retains its value or not, conditional on being in the simulation. But if we are in the simulation, we can influence whether the blackmailer sends us a letter in the real world. In the simulation, our action determines whether the real-world letter arrives in the case where our stock retains its value or in the case where it falls.

Let’s begin by calculating EDT’s expected utility of not paying. We will lose all money for certain if we’re in the real world and don’t pay, so we only consider the case where we’re in the simulation:

EU(not pay) = P(sim & retain|not pay) * a.

For both SSA and SIA, if our stock doesn’t go down and we don’t pay up, then we’re certain to be in the simulation: P(sim|retain, not pay) = 1, while we could be either simulated or real if our stock falls: P(sim|fall, not pay) = 1/2. Moreover, P(sim & retain|not pay) = P(retain|sim, not pay) * P(sim|not pay) = P(sim|retain, not pay) * P(retain|not pay). Under SSA, P(retain|not pay) is just the prior p.² We hence get

EU_SSA(not pay) = P(sim|retain, not pay) * p * a = p a.

Our expected utility for paying is:

EU_SSA(pay) = P(sim & retain|pay) * (a – b) + P(not sim|pay) * (a – b)

= P(sim|retain, pay) * p * (a – b) + P(not sim|pay) * (a – b).

If we pay up and the stock retains its value, there is exactly one of us in the simulation and one of us in the real world, so P(sim|retain, pay) = 1/2, while we’re sure to be in the simulation for the scenario in which our stock falls: P(sim|fall, pay) = 1. Knowing both P(sim & retain|pay) and P(sim & fall|pay), we can calculate P(not sim|pay) = p/2. This gives us

EU_SSA(pay) = 1/2 * p * (a – b) + 1/2 * p * (a – b) = p (a – b).

Great, EDT + SSA seems to calculate exactly the same payoffs as all other decision theories: by paying the blackmailer, one loses the money paid and gains nothing in return.
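The SSA calculation can be retraced numerically. This is a minimal sketch under the assumptions stated in the text (the probability of “retain” stays at the prior p, and the conditional probabilities of being simulated are as given); the function name and the example values are hypothetical.

```python
from fractions import Fraction


def eu_ssa(action, p, a, b):
    """EDT + SSA expected utility after receiving the letter."""
    p = Fraction(p)
    if action == "not pay":
        # If the stock retains its value and we don't pay, only the simulated
        # copy ever sees a letter, so P(sim | retain, not pay) = 1.
        p_sim_given_retain = Fraction(1)
        # The real-world copy loses everything, so only sim & retain counts.
        return p_sim_given_retain * p * a
    # action == "pay"
    p_sim_given_retain = Fraction(1, 2)  # one simulated and one real copy
    p_sim_given_fall = Fraction(1)       # no real-world letter if the stock falls
    p_sim_and_retain = p_sim_given_retain * p
    p_sim_and_fall = p_sim_given_fall * (1 - p)
    p_not_sim = 1 - p_sim_and_retain - p_sim_and_fall  # equals p / 2
    return p_sim_and_retain * (a - b) + p_not_sim * (a - b)


# Illustrative numbers only.
print(eu_ssa("pay", Fraction(1, 4), 100, 10))      # 45/2
print(eu_ssa("not pay", Fraction(1, 4), 100, 10))  # 25
```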

For SIA probabilities, P(retain|letter) depends on whether we pay or don’t pay. If we pay, then there are (in expectation) 2 p observers in the “retain” world, while there are (1 – p) observers in the “fall” world. So our updated P(retain|letter, pay) should be (2 p)/(1 + p). If we don’t pay, it’s p/(2 – p) respectively. Using the above probabilities and Bayes’ theorem, we have P(sim|pay) = 1/(1 + p) and P(sim|not pay) = 1/(2 – p). Hence,

EU_SIA(not pay) = P(sim & retain|not pay) * a = (p a)/(2 – p),

and

EU_SIA(pay) = P(sim|pay) * P(retain|sim, pay) * (a – b) + P(not sim|pay) * (a – b)

= (p (a – b))/(1 + p) + (p (a – b))/(1 + p)

= (2 p (a – b))/(1 + p).

Paying the blackmailer therefore comes out ahead of not paying whenever p and b are sufficiently low: comparing the two expressions, paying is better exactly when b < 3 a (1 – p)/(2 (2 – p)).
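To make the comparison concrete, here is a small sketch that computes both SIA expected utilities from the expected observer counts and solves for the payment b at which the two are equal. The function names and example values are my own illustrative choices; the break-even formula is simply a rearrangement of the two expressions above.

```python
from fractions import Fraction


def sia_expected_utilities(p, a, b):
    """Return (EU_SIA(pay), EU_SIA(not pay)) after receiving the letter."""
    p = Fraction(p)
    # Paying: one simulated observer for sure, plus a real one if the stock retains.
    observers_pay = 1 + p
    p_sim_and_retain_pay = p / observers_pay
    p_real_pay = p / observers_pay
    eu_pay = p_sim_and_retain_pay * (a - b) + p_real_pay * (a - b)
    # Not paying: one simulated observer for sure, plus a real one if the stock falls.
    observers_not_pay = 1 + (1 - p)
    p_sim_and_retain_not_pay = p / observers_not_pay
    eu_not_pay = p_sim_and_retain_not_pay * a
    return eu_pay, eu_not_pay


def break_even_payment(p, a):
    """Payment b at which EU_SIA(pay) equals EU_SIA(not pay)."""
    p = Fraction(p)
    return 3 * a * (1 - p) / (2 * (2 - p))


# Illustrative numbers only.
print(sia_expected_utilities(Fraction(1, 4), 100, 10))  # (Fraction(36, 1), Fraction(100, 7))
print(break_even_payment(Fraction(1, 4), 100))          # 450/7
```

With p = 1/4 and a = 100, EDT + SIA pays for any demand below roughly 64.3, even though paying is worse ex ante.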

Why doesn’t SIA give the ex ante expected utilities, as SSA does? Up until now I have just assumed correlated decision-making, so that the decisions of the simulated us will also be those of the real-world us (and of course the other way around – that’s how the blackmail works in the first place). The simulated us hence also gets attributed the impact of our real copy. The problem is now that SIA thinks we’re more likely to be in worlds with more observers. So the worlds in which we have additional impact due to correlated decision-making get double-counted. In the world where we pay the blackmailer, there are two observers for p, while there is only one observer for (1 – p). If we don’t pay the blackmailer, there is only one observer for p, and two observers for (1 – p). SIA hence slightly favors paying the blackmailer, to make the p-world more likely.

To remedy the problem of double-counting for EDT + SIA, we could use something along the lines of Stuart Armstrong’s Correlated Decision Principle (CDP). First, we aggregate the “EDT + SIA” expected utilities of all observers. Then, we divide this expected utility by the number of individuals for whom we are deciding. For EU_CDP(pay), there is with probability 1 an observer in the simulation, and with probability p one in the real world. To get the aggregated expected utility, we thus have to multiply EU_SIA(pay) by (1 + p). Since we have decided for two individuals, we divide this EU by 2 and get EU_CDP(pay) = ((2 p (a – b))/(1 + p)) * 1/2 * (1 + p) = p (a – b).

For EU_CDP(not pay), it gets more complex: the number of individuals any observer is making a decision for is actually just 1 – namely, the observer in the simulation. The observer in the real world doesn’t get his expected utility from his own decision, but from influencing the other observer in the simulation. On the other hand, we multiply EU(not pay) by (2 – p), since there is one observer in the simulation with probability 1, and with probability (1 – p) there is another observer in the real world. Putting this together, we get EU_CDP(not pay) = ((p a)/(2 – p)) * (2 – p) = p a. So EDT + SIA + CDP arrives at the same payoffs as EDT + SSA, although it is admittedly a rather messy and informal approach.
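The informal CDP adjustment can be written out as follows. This is only a sketch of the two steps described in the text (multiply the SIA expected utility by the expected number of observers, then divide by the number of individuals being decided for); it is not meant as a general implementation of Armstrong’s principle, and the function name and example values are hypothetical.

```python
from fractions import Fraction


def eu_cdp(action, p, a, b):
    """EDT + SIA expected utility with the informal CDP correction applied."""
    p = Fraction(p)
    if action == "pay":
        eu_sia = 2 * p * (a - b) / (1 + p)
        expected_observers = 1 + p  # simulation for sure, real world with probability p
        deciders = 2                # the decision is made for both copies
    else:
        eu_sia = p * a / (2 - p)
        expected_observers = 2 - p  # simulation for sure, real world with probability 1 - p
        deciders = 1                # only the simulated copy's decision carries the payoff
    # Aggregate over all observers, then divide by the number of deciders.
    return eu_sia * expected_observers / deciders


# Illustrative numbers only.
print(eu_cdp("pay", Fraction(1, 4), 100, 10))      # 45/2, i.e. p (a - b)
print(eu_cdp("not pay", Fraction(1, 4), 100, 10))  # 25, i.e. p a
```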

I conclude that, when taking into account anthropic uncertainty, EDT doesn’t give in to the Evidential Blackmail. This is true for SSA and possibly also for SIA + CDP. Fortunately, at least for SSA, we have avoided any kind of anthropic funny business. Note that this is not some kind of dirty hack: if we grant the premise that simulations have to involve anthropic uncertainty, then by the definition of the thought experiment (which necessarily involves a simulation), EDT doesn’t actually pay the blackmailer. Of course, this still leaves open the question of whether we have anthropic uncertainty in all problems involving simulations, and hence whether my argument applies to all conceivable versions of the problem. Moreover, there are other anthropic problems, such as the one introduced by Conitzer (2015a), in which agents using EDT + SSA are still exploitable (in the absence of a method to “bind themselves”).

Acknowledgement

I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.

² This becomes apparent if we compare the Evidential Blackmail to Sleeping Beauty. SSA is the “halfer position”, which means that after updating on being an observer (receiving the letter), we should still assign the prior probability p, regardless of how many observers there are in either of the two possible worlds.

³ The result that EDT and SIA lead to actions that are not optimal ex ante is also featured in several publications about anthropic problems, e.g., Arntzenius, 2002; Briggs, 2010; Conitzer, 2015b; Schwarz, 2015.

Ahmed, A., & Price, H. (2012). Arntzenius on ‘Why ain’cha rich?’. Erkenntnis. An International Journal of Analytic Philosophy, 77(1), 15–30.

Arntzenius, F. (2002). Reflections on Sleeping Beauty. Analysis, 62(1), 53–62.

Arntzenius, F. (2008). No Regrets, or: Edith Piaf Revamps Decision Theory. Erkenntnis. An International Journal of Analytic Philosophy, 68(2), 277–297.

Briggs, R. (2010). Putting a value on Beauty. Oxford Studies in Epistemology, 3, 3–34.

Conitzer, V. (2015a). A devastating example for the Halfer Rule. Philosophical Studies, 172(8), 1985–1992.

Conitzer, V. (2015b). Can rational choice guide us to correct de se beliefs? Synthese, 192(12), 4107–4119.

Neal, R. M. (2006, August 23). Puzzles of Anthropic Reasoning Resolved Using Full Non-indexical Conditioning. arXiv [math.ST]. Retrieved from http://arxiv.org/abs/math/0608592

Schwarz, W. (2015). Lost memories and useless coins: revisiting the absentminded driver. Synthese, 192(9), 3011–3036.

Soares, N., & Fallenstein, B. (2015, July 7). Toward Idealized Decision Theory. arXiv [cs.AI]. Retrieved from http://arxiv.org/abs/1507.01986

Soares, N., & Levinstein, B. (2017). Cheating Death in Damascus. Retrieved from https://intelligence.org/files/DeathInDamascus.pdf

3 thoughts on “Anthropic uncertainty in the Evidential Blackmail”

abram demski

Another important post on this sort of thing:

https://agentfoundations.org/item?id=853

May 21, 2017 at 4:45

abramdemski

Another important post on this sort of thing is “In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy” by Jessica Taylor, on the Intelligent Agent Foundations forum. (Not including a link because I’m afraid of being sucked up by spam vacuums.)

May 21, 2017 at 4:47