[I assume the reader is familiar with Newcomb’s problem and causal decision theory. Some familiarity with basic theoretical computer science ideas also helps.]

In a recently (Open-Access-)published paper, I (together with my PhD advisor, Vince Conitzer) proposed the following Newcomb-like scenario as an argument against causal decision theory:

> Adversarial Offer: Two boxes, B1 and B2, are on offer. A (risk-neutral) buyer may purchase one or none of the boxes but not both. Each of the two boxes costs \$1. Yesterday, the seller put \$3 in each box that she predicted the buyer would not acquire. Both the seller and the buyer believe the seller’s prediction to be accurate with probability 0.75.

If the buyer buys one of the boxes, then the seller makes a profit in expectation of \$1 – 0.25 * \$3 = \$0.25. Nonetheless, causal decision theory recommends buying a box. This is because at least one of the two boxes must contain \$3, so that the average box contains at least \$1.50. It follows that the causal decision theorist must assign an expected causal utility of at least \$1.50 to (at least) one of the boxes. Since \$1.50 exceeds the cost of \$1, causal decision theory recommends buying one of the boxes. This seems undesirable. So we should reject causal decision theory.
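As a quick check of the arithmetic above, here is a minimal Python sketch. The function names and the way the buyer's credences are passed in are illustrative assumptions, not taken from the paper.

```python
PRICE = 1.0      # cost of a box, in dollars
PRIZE = 3.0      # amount placed in each box predicted NOT to be bought
ACCURACY = 0.75  # probability that the seller's prediction is correct


def seller_expected_profit(price=PRICE, prize=PRIZE, accuracy=ACCURACY):
    """Expected profit per sale: the bought box is full only if the prediction was wrong."""
    return price - (1.0 - accuracy) * prize


def cdt_best_expected_utility(p_b1_full, p_b2_full, price=PRICE, prize=PRIZE):
    """Causal expected utility of the better box.

    p_b1_full and p_b2_full are the buyer's credences that B1 and B2
    currently contain the prize; CDT treats them as fixed by the past.
    """
    return max(p_b1_full, p_b2_full) * prize - price


print(seller_expected_profit())             # seller's expected profit per sale
print(cdt_best_expected_utility(0.5, 0.5))  # CDT's worst case for the better box
```

Because at least one box contains \$3, the two credences sum to at least 1, so the better box has credence at least ½ and a causal expected utility of at least \$0.50.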

The randomization response

One of the obvious responses to the Adversarial Offer is that the agent might randomize. In the paper, we discuss this topic at length in Section IV.1 and in the subsection on ratificationism in Section IV.4. If you haven’t thought much about randomization in Newcomb-like problems before, it probably makes sense to first check out the paper and only then continue reading here, since the paper makes more straightforward points.

The information-theoretic variant

I now give a new variant of the Adversarial Offer, which deals with the randomization objection in a novel and very interesting way. Specifically, the unique feature of this variant is that CDT correctly assesses randomizing to be a bad idea. Unfortunately, it is quite a bit more complicated than the Adversarial Offer from the paper.

Imagine that the buyer is some computer program that has access to a true random number generator (TRNG). Imagine also that the buyer’s source code plus all its data (memories) has a size of, say, 1GB and that the seller knows that it has (at most) this size. If the buyer wants to buy a box, then she will have to pay \$1 as usual, but instead of submitting a single choice between buying box 1 and buying box 2, she has to submit 1TB worth of choices. That is, she has to submit a sequence of 2^43 (=8796093022208) bits, each encoding a choice between the boxes.

Call the submitted sequence w. The seller first determines whether there is at least one 1GB program that outputs w deterministically. If there is no such program, the buyer receives no box (but still pays the \$1). If there is at least one such program, then the seller forgets w again. She then picks an index i of w at random. She predicts w[i] based on what she knows about the buyer and based on w[1…i-1], i.e., on all the bits of w preceding i. Call the prediction w’[i]. She fills the box based on her prediction w’[i] and the buyer receives (in return for the \$1) the box specified by w[i].
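The protocol can be sketched as a toy simulation. Everything named here is a hypothetical stand-in: `has_short_program` is an oracle for "some 1GB program outputs w" (which is not computable in general), and `predict` is the seller's predictor. The sketch also assumes, in line with the later discussion, that a buyer whose w has no such program pays the \$1 but receives no box. Indices are 0-based in the code.

```python
import random


def run_offer(w, has_short_program, predict, rng):
    """Return the buyer's net payoff for one round of the offer (toy sketch).

    w                 -- list of 0/1 choices submitted by the buyer
    has_short_program -- hypothetical oracle: True if some short program outputs w
    predict           -- hypothetical seller predictor: prefix of w -> predicted bit
    rng               -- a random.Random instance used to sample the index i
    """
    payoff = -1.0  # the buyer pays $1 in every case
    if not has_short_program(w):
        return payoff  # w is judged to be random: no box is handed over
    i = rng.randrange(len(w))   # seller samples an index uniformly at random
    predicted = predict(w[:i])  # prediction w'[i] uses only the preceding bits
    if predicted != w[i]:
        payoff += 3.0  # the chosen box is full only if it was not predicted
    return payoff


rng = random.Random(0)
always_box_zero = [0] * 8
print(run_offer(always_box_zero, lambda s: True, lambda prefix: 0, rng))
```

In this toy run the seller predicts every bit correctly, so the buyer simply loses the \$1.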

Why the information-theoretic variant is interesting

The scenario is interesting because of the following three facts (which I will later argue to hold):

1. The seller makes a profit off agents who try to buy boxes, regardless of whether they do so using randomization or not.
2. CDT and related theories (such as ratificationism) assess randomizing to be worse than not buying any box.
3. CDT will recommend buying a box (mostly deterministically).

I’m not aware of any other scenarios with these properties. Specifically, the novelty is item 2. (Our paper offers a scenario that has the other two properties.) The complications of this scenario – letting the agent submit a TB worth of choices to then determine whether they are random – are all introduced to achieve item 2 (while preserving the other items).

In the following, I want to argue in more detail for these three points and for the claim that the scenario can be set up at all (which I will do under claim 1).

1. For this part, we need to show two things:

A) Intuitively, if the agent submits a string of bits that uses substantially more than 1GB worth of randomness, then she is extremely likely to receive no box at all.

B) Intuitively, if the agent uses only about 1GB or less worth of randomness, then the seller – using the buyer’s source code – will likely be able to predict w[i] with high accuracy based on w[1…i-1].

I don’t want to argue too rigorously for either of these, but below I’ll give intuitions and some sketches of the information-theoretic arguments that one would need to give to make them more rigorous.

A) The very simple point here is that if you create, say, a 2GB bitstring w where each bit is determined by a fair coin flip, then it is very unlikely that there exists a 1GB program that deterministically outputs w. After all, there are many more ways to fill 2GB of bits than there are 1GB programs (about 2^(2^33) times as many). From this one may be tempted to conclude that if the agent determines, say, 2GB of the TB of choices by flipping coins, she is likely to receive no box. But this argument is incomplete, because there are other ways to use coin flips. For example, the buyer might use the following policy: Flip 2GB worth of coins. If they all come up heads, always take box B1. Otherwise follow some given deterministic procedure.
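The counting step can be made explicit with a small sketch. It assumes programs are themselves bitstrings and counts all programs of at most the given length. The function name is hypothetical, and the result is an upper bound, reported in log2 form because the numbers themselves are far too large to write out.

```python
def log2_max_fraction_with_short_program(string_bits, program_bits):
    """Upper bound (as a base-2 logarithm) on the fraction of strings of
    length string_bits that are output by some program of at most
    program_bits bits.

    There are fewer than 2**(program_bits + 1) such programs, each outputs
    at most one string, and there are 2**string_bits strings in total.
    """
    return min(0, program_bits + 1 - string_bits)


ONE_GB_IN_BITS = 2**33
TWO_GB_IN_BITS = 2**34
# A uniformly random 2GB string has a 1GB program with probability
# below 2 raised to this (very negative) exponent.
print(log2_max_fraction_with_short_program(TWO_GB_IN_BITS, ONE_GB_IN_BITS))
```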

To make the argument rigorous, I think we need to state the claim information-theoretically. But even this is a bit tricky. For example, it is not a problem per se for w to have high entropy if most of the entropy comes from a small part of the distribution. (For example, if with probability 0.01 the buyer randomizes all choices, and with the remaining probability always chooses Box 1, then the entropy of w is roughly 0.01*1TB ≈ 10GB, but the buyer is still likely to receive a box.) So I think we’d need to make a more complicated claim, of the sort: if there is no substantial part (say, >p) of the distribution over w that has less than, say, 2GB of entropy, then with high probability (>1-p), the agent will receive no box.
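To see where the 10GB figure in the example comes from, here is a minimal sketch that bounds the entropy of such a mixture: a policy that randomizes all choices with probability 0.01 and otherwise always picks Box 1. The bounds follow from H(w|Z) ≤ H(w) ≤ H(Z) + H(w|Z), where Z records which branch was taken. The function names are illustrative.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mixture_entropy_bounds(p_random, n_bits):
    """Lower and upper bounds (in bits) on H(w) when w is uniformly random
    with probability p_random and a fixed string otherwise."""
    lower = p_random * n_bits
    upper = lower + binary_entropy(p_random)
    return lower, upper


BITS_PER_GB = 8 * 2**30
lower, upper = mixture_entropy_bounds(0.01, 2**43)
print(lower / BITS_PER_GB, upper / BITS_PER_GB)  # entropy bounds in GB
```

The two bounds differ by less than one bit, so the entropy is dominated by the rare randomizing branch even though the buyer receives a box in about 99% of cases.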

B) Again, we can make a simple but incomplete argument: If of the 1TB of choices, only, say, 2GB are determined by random coin flips, then a randomly sampled bit is likely to be predictable from the agent’s source code. But again, the problem is that the random coin flips can be used in other ways. For example, the buyer might use a deterministic procedure to determine w (say, w=01010101…), but then randomly generate a number n (with any number n chosen with probability 2^-n, for instance), then randomly sample n indices j of w and flip w[j] for each of them. This may have relatively low entropy. But now the seller cannot perfectly predict w[i] given w[1…i-1] for any i.

Again, I think a rigorous argument requires information theory. In particular, we can use the fact that H(w) = H(w[0])+H(w[1]|w[0])+H(w[2]|w[0,1])+…, where H denotes entropy. If H(w) is less than, say, 2GB, then the average of H(w[i]|w[0…i-1]) must be at most 2GB/1TB = 1/512 bits. From this, it follows that with high probability, w[i] can be predicted with high accuracy given w[0…i-1].
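One way to turn the entropy bound into an accuracy bound is sketched below. It assumes the seller knows the buyer's true distribution over w and always guesses the more likely value of the next bit. For a binary variable the error probability e of that guess satisfies h(e) ≥ 2e, where h is the binary entropy, so the average error rate is at most half the average conditional entropy. The function names are hypothetical.

```python
import math


def binary_entropy(e):
    """Entropy in bits of a binary variable with probabilities e and 1 - e."""
    if e <= 0.0 or e >= 1.0:
        return 0.0
    return -e * math.log2(e) - (1.0 - e) * math.log2(1.0 - e)


def max_average_prediction_error(total_entropy_bits, n_bits):
    """Upper bound on the error rate of the best predictor of w[i] from the
    preceding bits, averaged over a uniformly random index i.

    Uses h(e) >= 2e for 0 <= e <= 1/2, which holds because h is concave
    with h(0) = 0 and h(1/2) = 1.
    """
    average_conditional_entropy = total_entropy_bits / n_bits
    return average_conditional_entropy / 2.0


TWO_GB_IN_BITS = 2**34
ONE_TB_IN_BITS = 2**43
print(max_average_prediction_error(TWO_GB_IN_BITS, ONE_TB_IN_BITS))
```

Under these assumptions, a buyer whose w carries at most 2GB of entropy is mispredicted on a randomly sampled bit with probability at most 1/1024.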

2. This is essential, but straightforward: Generating w at random causes the seller to determine that w was generated at random, in which case the buyer pays \$1 and receives no box. Therefore, CDT (accurately) assesses randomly generating w to have an expected utility near -\$1.

3. Finally, I want to argue that CDT will recommend buying a box. For this, we only need to argue that CDT prefers some method of submitting w over not buying a box. So consider the following procedure: First, assign beliefs over the seller’s prediction w’[0] of the first bit. Since there are only two possible boxes, for at least one of the boxes j, it is the case that P(w’[0]=j)<=½, where P refers to the probability assigned by the buyer. Let w[0] = j. We now repeat this inductively. That is, for each i, given the w[0…i-1] that we have already constructed, the buyer sets w[i]=k s.t. P(w’[i]=k|w[0…i-1])<=½.

What’s the causal expected utility of submitting w thus constructed? Well, for one, because the procedure is deterministic (if ties are broken deterministically), the buyer can expect that she will receive a box at all. Now, for all i, the buyer thinks that if i is sampled by the seller for the purpose of determining which box to give to the buyer, then the buyer will in causal expectation receive at least \$1.50, because the seller will predict the wrong box, i.e. w’[i] ≠ w[i], with probability at least ½. Since \$1.50 exceeds the \$1 price, CDT prefers submitting this w to not buying a box.
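The construction described above can be sketched in a few lines. The buyer's beliefs are represented by a hypothetical function `belief_pred_is_one(prefix)`, which returns her credence that the seller predicts 1 for the next bit given the bits chosen so far. That interface, and the toy belief used in the example, are assumptions made for illustration. The sketch also assumes the buyer is certain to receive a box.

```python
def construct_choices(n_bits, belief_pred_is_one):
    """Greedily build w so that each w[i] is a value the buyer believes
    the seller predicts with probability at most 1/2."""
    w = []
    for _ in range(n_bits):
        p_one = belief_pred_is_one(tuple(w))
        # If "1" is predicted with credence <= 1/2, choose 1; otherwise "0"
        # is predicted with credence < 1/2. Ties are broken deterministically.
        w.append(1 if p_one <= 0.5 else 0)
    return w


def causal_expected_utility(w, belief_pred_is_one, price=1.0, prize=3.0):
    """Causal expected utility of submitting w, averaged over a uniformly
    sampled index i, holding the buyer's beliefs about the predictions fixed."""
    total = 0.0
    for i, bit in enumerate(w):
        p_one = belief_pred_is_one(tuple(w[:i]))
        p_predicted_correctly = p_one if bit == 1 else 1.0 - p_one
        total += (1.0 - p_predicted_correctly) * prize
    return total / len(w) - price


def toy_belief(prefix):
    # Hypothetical credences: the seller leans towards 1 at even positions
    # and towards 0 at odd positions.
    return 0.9 if len(prefix) % 2 == 0 else 0.2


w = construct_choices(4, toy_belief)
print(w, causal_expected_utility(w, toy_belief))
```

Whatever the belief function, every chosen bit is mispredicted with credence at least ½, so the causal expected utility is at least ½ * \$3 - \$1 = \$0.50.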

# Cheating at thought experiments

Thought experiments are important throughout the sciences. For example, it appears that a thought experiment played an important role in Einstein’s discovery of special relativity. In philosophy and ethics and the theory of the mind in particular, thought experiments are especially important and there are many famous ones. Unfortunately, many thought experiments might just be ways of tricking people. Like their empirical counterparts, they are prone to cheating if they lack rigor and the reader does not try to reproduce (or falsify) the results.

In his book, Consciousness Explained, Daniel Dennett gives (at least) three examples of cheating in thought experiments. The first one is from chapter 9.5 and Dennett’s argument roughly runs as follows. After having described his top-level theory of the human mind, he addresses the question “Couldn’t something unconscious – a zombie, for instance – have [all this machinery]?” This objection to functionalism, computationalism and the like is often accompanied by the following claim: “That’s all very well, all those functional details about how the brain does this and that, but I can imagine all that happening in an entity without the occurrence of any real consciousness.” To this Dennett replies: “Oh, can you? How do you know? How do you know you’ve imagined ‘all that’ in sufficient detail, and with sufficient attention to all the implications.”

With regard to another thought experiment, Mary, the color scientist, Dennett elaborates (ch. 12.5): “[Most people] are simply not following directions! The reason no one follows directions is because what they ask you to imagine is so preposterously immense, you can’t even try. The crucial premise […] is not readily imaginable, so no one bothers.”

In my opinion, this summarizes the problems with many thought experiments (specifically intuition pumps): Readers do not (most often because they cannot) follow instructions and are thus unable to mentally set up the premises of the thought experiment. And then they try to reach a conclusion anyway based on their crude approximation to the situation.

Another example is Searle’s Chinese Room, which Dennett covers in chapter 14.1 of his book. When Searle asks people to imagine a person who does not speak Chinese, but answers queries in Chinese using a large set of rules and a library, they probably think of someone looking up definitions in a lexicon or something. At least, this is feasible and also resembles the way people routinely pretend to have knowledge that they don’t. What people don’t imagine is the thousands of steps that it would take the person to compose even short replies (and choosing Chinese as a language does not help most English-speaking readers to imagine the complexity of the procedure of composing a message). If they did simulate the entire behavior of the whole system (the Chinese room with the person in it), they might conclude that it has an understanding of Chinese after all. And thus, this thought experiment is not suitable for debunking the idea that consciousness can arise from following rules.

Going beyond what Dennett discusses in his book, I’d like to consider further thought experiments that fit the pattern. For example, people often argue that hedonistic utilitarianism demands that the universe be tiled with some (possibly very simple) object that is super-happy. Or at least that individual humans should be replaced this way. In an interview, Eliezer Yudkowsky said:

> [A utilitarian superintelligence] goes out to the stars, takes apart the stars for raw materials, and it builds whole civilizations full of minds experiencing the most exciting thing ever, over and over and over and over and over again.
>
> The whole universe is just tiled with that, and that single moment is something that we would find this very worthwhile and exciting to happen once. But it lost the single aspect of value that we would name boredom […].
>
> And so you lose a single dimension, and the [worthwhileness of the universe] – from our perspective – drops off very rapidly.

This thought experiment is meant to prove that having pure pleasure alone is not a desirable result. Instead, many people endorse complexity of value – which is definitely true from a descriptive point of view – and describe in detail many good things that utopia should contain. While I have my own doubts about the pleasure-filled universe, my suspicion is that one reason why people don’t like it is that they don’t consider it for very long and don’t actually imagine all the happiness. “Sure, some happiness is nice, but happiness gets less interesting when having large amounts of it.” The more complex scenario, on the other hand, can actually be imagined more easily, and because it contains different kinds of good stuff, one does not have to base judgment entirely on some number being very large. Closing the discussion of this example, I would like to remark that I am, at the time of writing this, not a convinced hedonistic utilitarian. (Rather I am inclined towards a more preferentist view, which, I feel, is in line with endorsing complexity of value and value extrapolation, though I am skeptical of preference idealization as proposed in Yudkowsky’s Coherent Extrapolated Volition. Furthermore, I care more about suffering than about happiness, but that’s a different story…) I just think that the universe filled with eternal bliss cannot be used as a very convincing argument against hedonistic utilitarianism. Similar arguments may apply to deciding whether currently, the bad things on earth are outweighed by the good things.

The way out of this problem of unimaginable thought experiments is to confine ourselves to thought experiments that are within our cognitive reach. Results may then, if possible, be extrapolated to the more complex situations. For example, I find it more fruitful to talk about whether I only care about pleasure in other individuals, or also about whether they are doing something that is very boring from the outside.