A few weeks ago, I wrote about the BPB problem and how it poses a problem for classical/non-logical decision theories. In my post, I briefly mentioned a behaviorist approach to BPB, only to immediately discard it:
One might think that one could map between physical processes and algorithms on a pragmatic or functional basis. That is, one could say that a physical process A implements a program p to the extent that the results of A correlate with the output of p. I think this idea goes into the right direction and we will later see an implementation of this pragmatic approach that does away with naturalized induction. However, it feels inappropriate as a solution to BPB. The main problem is that two processes can correlate in their output without having similar subjective experiences. For instance, it is easy to show that Merge sort and Insertion sort have the same output for any given input, even though they have very different “subjective experiences”.
Since writing the post I became more optimistic about this approach because the counterarguments I mentioned aren’t particularly persuasive. The core of the idea is the following: Let A and B be parameterless algorithms1. We’ll say that A and B are equivalent if we believe that A outputs x iff B outputs x. In the context of BPB, your current decision is an algorithm A and we’ll say B is an instance or implementation of A/you iff A and B are equivalent. In the following sections, I will discuss this approach in more detail.
You still need interpretations
The definition only solves one part of the BPB problem: specifying equivalence between algorithms. This would solve BPB if all agents were bots (rather than parts of a bot or collections of bots) in Soares and Fallenstein’s Botworld 1.0. But in a world without any Cartesian boundaries, one still has to map parts of the environment to parameterless algorithms. This could, for instance, be a function from histories of the world onto the output set of the algorithm. For example, if one’s set of possible world models is a set of cellular automata (CA) with various different initial conditions and one’s notion of an algorithm is something operating on natural numbers, then such an interpretation i would be a function from CA histories to the set of natural numbers. Relative to i, a CA with initial conditions contains an instance of algorithm A if A outputs x <=> i(H)=x, where H is a random variable representing the history created by that CA. So, intuitively, i is reading A’s output off from a description the world. For example, it may look at the physical signals sent by a robot’s microprocessor to a motor and convert these into the output alphabet of A. E.g., it may convert a signal that causes a robot’s wheels to spin to something like “forward”. Every interpretation i is a separate instance of A.
Since we still need interpretations, we still have the problem of “joke interpretations” (Drescher 2006, sect. 2.3; also see this Brian Tomasik essay and references therein). In particular, you could have an interpretation i that does most of the work, so that the equivalence of A and i(H) is the result of i rather than the CA doing something resembling A.
I don’t think it’s necessarily a problem that an EDT agent might optimize its action too much for the possibility of being a joke instantiation, because it gives all its copies in a world equal weight no matter which copy it believes to be. As an example, imagine that there is a possible world in which joke interpretations lead to you to identify with a rock. If the rock’s “behavior” does have a significant influence on the world and the output of your algorithm correlates strongly with it, then I see no problem with taking the rock into account. At least, that is what EDT would do anyway if it has a regular copy in that world.2 If the rock has little impact on the world, EDT wouldn’t care much about the possibility of being the rock. In fact, if the world also contains a strongly correlated non-instance3 of you that faces a real decision problem, then the rock joke interpretation would merely lead you to optimize for the action of that non-copy.
If you allow all joke interpretations, then you would view yourself in all worlds. Thus, the view may have similar implications as the l-zombie view where the joke interpretations serve as the l-zombies.4 Unless we’re trying to metaphysically justify the l-zombie view, this is not what we’re looking for. So, we may want to remove “joke interpretations” in some way. One idea could be to limit the interpretation’s computational power (Aaronson 2011, sect. 6). My understanding is that this is what people in CA theory use to define the notion of implementing an algorithm in a CA, see, e.g., Cook (2004, sect. 2). Another idea would be to include only interpretations that you yourself (or A itself) “can easily predict or understand”. Assuming that A doesn’t know its own output already, this means that i cannot do most of the work necessary to entangle A with i(H). (For a similar point, cf. Bishop 2004, sect. “Objection 1: Hofstadter, ‘This is not science’”.) For example, if i would just compute A without looking at H, then A couldn’t predict i very well if it cannot predict itself. If, on the other hand, i reads off the result of A from a computer screen in H, then A would be able to predict i’s behavior for every instance of H. Brian Tomasik lists a few more criteria to judge interpretations by.
In my original rejection of the behaviorist approach, I made an argument about two sorting algorithms which always compute the same result but have different “subjective experiences”. I assumed that a similar problem could occur when comparing two equivalent decision-making procedures with different subjective experiences. But now I actually think that the behaviorist approach nicely aligns with what one might call introspective discernibility of experiences.
Let’s say I’m an agent that has, as a component, a sorting algorithm. Now, a world model may contain an agent that is just like me except that it uses a different sorting algorithm. Does that agent count as an instantiation of me? Well, that depends on whether I can introspectively discern which sorting algorithm I use. If I can, then I could let my output depend on the content of the sorting algorithm. And if I do that, then the equivalence between me and that other agent breaks. E.g., if I decide to output an explanation of my sorting algorithm, then my output would explain, say, bubble sort, whereas the other algorithm’s output would explain, say, merge sort. If, on the other hand, I don’t have introspective access to my sorting algorithm, then the code of the sorting algorithm cannot affect my output. Thus, the behaviorist view would interpret the other agent as an instantiation of me (as long as, of course, it, too, doesn’t have introspective access to its sorting algorithm). This conforms with the intuition that which kind of sorting algorithm I use is not part of my subjective experience. I find this natural relation to introspective discernibility very appealing.
That said, things are complicated by the equivalence relation being subjective. If you already know what A and B output, then they are equivalent if their output is the same — even if it is “coincidentally” so, i.e., if they perform completely unrelated computations. Of course, a decision algorithm will rarely know its own output in advance. So, this extreme case is probably rare. However, it is plausible that an algorithm’s knowledge about its own behavior excludes some conditional policies. For example, consider a case like Conitzer’s (2016, 2017), in which copies of an EU-maximizing agent face different but symmetric information. Depending on what the agent knows about its algorithm, it may view all the copies as equivalent or not. If it has relatively little self-knowledge, it could reason that if it lets its action depend on the information, the copies’ behavior would diverge. With more self-knowledge, on the other hand, it could reason that, because it is an EU maximizer and because the copies are in symmetric situations, its action will be the same no matter the information received.5
The BPB problem resembles the problem of consciousness: the question “does some physical system implement my algorithm?” is similar to the question “does some physical system have the conscious experience that I am having?”. For now, I don’t want to go too much into the relation between the two problems. But if we suppose that the two problems are connected, we can draw from the philosophy of mind to discuss our approach to BPB.
In particular, I expect that a common objection to the behaviorist approach will be that most instantiations in the behaviorist sense are behavioral p-zombies. That is, their output behavior is equivalent to the algorithm’s but they compute the output in a different way, and in particular in a way that doesn’t seem to give rise to conscious (or subjective) experiences. While the behaviorist view may lead us to identify with such a p-zombie, we can be certain, so the argument goes, that we are not given that we have conscious experiences.
Some particular examples include:
- Lookup table-based agents
- Messed up causal structures, e.g. Paul Durham’s experiments with his whole brain emulation in Greg Egan’s novel Permutation City.
I personally don’t find these arguments particularly convincing because I favor Dennett’s and Brian Tomasik’s eliminativist view on consciousness. That said, it’s not clear whether eliminativism would imply anything other than relativism/anti-realism for the BPB problem (if we view BPB and philosophy of mind as sufficiently strongly related).
This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).
1. I use the word “algorithm” in a very broad sense. I don’t mean to imply Turing computability. In fact, I think any explicit formal specification of the form “f()=…” should work for the purpose of the present definition. Perhaps, even implicit specifications of the output would work. ↩
2. Of course, I see how someone would find this counterintuitive. However, I suspect that this is primarily because the rock example triggers absurdity heuristics and because it is hard to imagine a situation in which you believe that your decision algorithm is strongly correlated with whether, say, some rock causes an avalanche. ↩
3. Although the behaviorist view defines the instance-of-me property via correlation, there can still be correlated physical subsystems that are not viewed as an instance of me. In particular, if you strongly limit the set of allowed interpretations (see the next paragraph), then the potential relationship between your own and the system’s action may be too complicated to be expressed as A outputs x <=> i(H)=x. ↩
5. If this is undesirable, one may try to use logical counterfactuals to find out whether B also “would have” done the same as A if A had behaved differently. However, I’m very skeptical of logical counterfactuals in general. Cf. the “Counterfactual Robustness” section in Tomasik’s post. ↩