A behaviorist approach to building phenomenological bridges

A few weeks ago, I wrote about the problem of building phenomenological bridges (BPB) and how it poses a problem for classical/non-logical decision theories. In my post, I briefly mentioned a behaviorist approach to BPB, only to immediately discard it:

One might think that one could map between physical processes and algorithms on a pragmatic or functional basis. That is, one could say that a physical process A implements a program p to the extent that the results of A correlate with the output of p. I think this idea goes into the right direction and we will later see an implementation of this pragmatic approach that does away with naturalized induction. However, it feels inappropriate as a solution to BPB. The main problem is that two processes can correlate in their output without having similar subjective experiences. For instance, it is easy to show that Merge sort and Insertion sort have the same output for any given input, even though they have very different “subjective experiences”.

Since writing that post, I have become more optimistic about this approach because the counterarguments I mentioned aren’t particularly persuasive. The core of the idea is the following: Let A and B be parameterless algorithms[1]. We’ll say that A and B are equivalent if we believe that A outputs x iff B outputs x. In the context of BPB, your current decision is an algorithm A, and we’ll say that B is an instance or implementation of A (i.e., of you) iff A and B are equivalent. In the following sections, I will discuss this approach in more detail.
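As a minimal sketch of this definition, assume that our beliefs about the two algorithms are given as a joint probability table over pairs of outputs. The table format, the name `believed_equivalent`, and the tolerance are illustrative assumptions and not part of the definition itself.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical representation of beliefs: the probability we assign to each
# (output of A, output of B) pair.
JointBelief = Dict[Tuple[Hashable, Hashable], float]


def believed_equivalent(joint: JointBelief, tol: float = 1e-9) -> bool:
    """True if (almost) no probability mass lies on pairs where A and B differ."""
    mismatch = sum(p for (a, b), p in joint.items() if a != b)
    return mismatch <= tol


# We do not know the output yet, but we are sure that A and B agree.
copy_belief = {("left", "left"): 0.5, ("right", "right"): 0.5}

# B is believed to be an independent coin flip: outputs only sometimes coincide.
coin_belief = {
    ("left", "left"): 0.25, ("left", "right"): 0.25,
    ("right", "left"): 0.25, ("right", "right"): 0.25,
}

print(believed_equivalent(copy_belief))  # True
print(believed_equivalent(coin_belief))  # False
```

Note that the check depends only on the beliefs, not on how A and B compute their outputs, which is what makes the approach behaviorist.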

You still need interpretations

The definition only solves one part of the BPB problem: specifying equivalence between algorithms. This would solve BPB if all agents were bots (rather than parts of a bot or collections of bots) in Soares and Fallenstein’s Botworld 1.0. But in a world without any Cartesian boundaries, one still has to map parts of the environment to parameterless algorithms. This could, for instance, be a function from histories of the world onto the output set of the algorithm. For example, if one’s set of possible world models is a set of cellular automata (CA) with various different initial conditions and one’s notion of an algorithm is something operating on natural numbers, then such an interpretation i would be a function from CA histories to the set of natural numbers. Relative to i, a CA with initial conditions contains an instance of algorithm A if A outputs x <=> i(H)=x, where H is a random variable representing the history created by that CA. So, intuitively, i is reading A’s output off from a description of the world. For example, it may look at the physical signals sent by a robot’s microprocessor to a motor and convert these into the output alphabet of A. E.g., it may convert a signal that causes a robot’s wheels to spin to something like “forward”. Every interpretation i is a separate instance of A.
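The following toy sketch illustrates an interpretation as a function from CA histories to natural numbers. It assumes a one-dimensional, two-state ("elementary") CA with wrap-around edges, and it simplifies the belief-based condition above to a check on a single, fully known history. The names `algorithm_a`, `interpretation`, and `is_instance` are hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, ...]
History = List[Row]


def step(row: Row, rule: int) -> Row:
    """One update of an elementary CA (Wolfram rule number) with wrap-around."""
    n = len(row)
    return tuple(
        (rule >> (row[(j - 1) % n] * 4 + row[j] * 2 + row[(j + 1) % n])) & 1
        for j in range(n)
    )


def run(initial: Row, rule: int, steps: int) -> History:
    """The history H: the initial row followed by `steps` updates."""
    history = [initial]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history


def algorithm_a() -> int:
    """Hypothetical parameterless algorithm A with a natural-number output."""
    return sum(range(4))  # 0 + 1 + 2 + 3 = 6


def interpretation(history: History) -> int:
    """Hypothetical interpretation i: read the final row as a binary number."""
    return int("".join(str(cell) for cell in history[-1]), 2)


def is_instance(a: Callable[[], int], i: Callable[[History], int],
                history: History) -> bool:
    """Simplified check of 'A outputs x <=> i(H) = x' for one known history."""
    return a() == i(history)


# Rule 51 flips every cell, so (1, 0, 0, 1) becomes (0, 1, 1, 0), i.e. binary 6.
print(is_instance(algorithm_a, interpretation, run((1, 0, 0, 1), 51, 1)))  # True
# After a second step the row flips back to (1, 0, 0, 1), i.e. binary 9.
print(is_instance(algorithm_a, interpretation, run((1, 0, 0, 1), 51, 2)))  # False
```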

Joke interpretations

Since we still need interpretations, we still have the problem of “joke interpretations” (Drescher 2006, sect. 2.3; also see this Brian Tomasik essay and references therein). In particular, you could have an interpretation i that does most of the work, so that the equivalence of A and i(H) is the result of i rather than the CA doing something resembling A.

I don’t think it’s necessarily a problem that an EDT agent might optimize its action too much for the possibility of being a joke instantiation, because it gives all its copies in a world equal weight no matter which copy it believes itself to be. As an example, imagine that there is a possible world in which joke interpretations lead you to identify with a rock. If the rock’s “behavior” does have a significant influence on the world and the output of your algorithm correlates strongly with it, then I see no problem with taking the rock into account. At least, that is what EDT would do anyway if it has a regular copy in that world.[2] If the rock has little impact on the world, EDT wouldn’t care much about the possibility of being the rock. In fact, if the world also contains a strongly correlated non-instance[3] of you that faces a real decision problem, then the rock joke interpretation would merely lead you to optimize for the action of that non-copy.

If you allow all joke interpretations, then you would view yourself in all worlds. Thus, the view may have implications similar to those of the l-zombie view, where the joke interpretations serve as the l-zombies.[4] Unless we’re trying to metaphysically justify the l-zombie view, this is not what we’re looking for. So, we may want to remove “joke interpretations” in some way. One idea could be to limit the interpretation’s computational power (Aaronson 2011, sect. 6). My understanding is that this is what people in CA theory use to define the notion of implementing an algorithm in a CA, see, e.g., Cook (2004, sect. 2). Another idea would be to include only interpretations that you yourself (or A itself) “can easily predict or understand”. Assuming that A doesn’t know its own output already, this means that i cannot do most of the work necessary to entangle A with i(H). (For a similar point, cf. Bishop 2004, sect. “Objection 1: Hofstadter, ‘This is not science’”.) For example, if i just computed A without looking at H, then A couldn’t predict i very well if it cannot predict itself. If, on the other hand, i reads off the result of A from a computer screen in H, then A would be able to predict i’s behavior for every instance of H. Brian Tomasik lists a few more criteria to judge interpretations by.
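To make the last contrast concrete, here is a toy sketch of a "screen-reading" interpretation next to the most extreme joke interpretation, one that ignores H and simply recomputes A. The helper `looks_at_history` is a crude proxy that I am assuming purely for illustration; it is not one of the criteria proposed above (limited computational power, or predictability by A).

```python
from typing import Callable, Sequence, Tuple

History = Tuple[int, ...]  # toy stand-in for a world history H


def algorithm_a() -> int:
    """Hypothetical parameterless algorithm A."""
    return 2 ** 3  # 8


def screen_reader(history: History) -> int:
    """Reads the last entry of H as if it were a screen showing a number."""
    return history[-1]


def joke_interpretation(history: History) -> int:
    """Ignores H entirely and does all the work itself by recomputing A."""
    return algorithm_a()


def looks_at_history(i: Callable[[History], int],
                     samples: Sequence[History]) -> bool:
    """Crude proxy (an assumption): does i's output vary across histories?"""
    return len({i(h) for h in samples}) > 1


samples = [(0, 0, 8), (0, 0, 3), (5, 1, 8)]

# Both interpretations agree with A on the first history ...
print(screen_reader(samples[0]) == algorithm_a())        # True
print(joke_interpretation(samples[0]) == algorithm_a())  # True
# ... but only the screen reader's output depends on what happens in H.
print(looks_at_history(screen_reader, samples))          # True
print(looks_at_history(joke_interpretation, samples))    # False
```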

Introspective discernibility

In my original rejection of the behaviorist approach, I made an argument about two sorting algorithms which always compute the same result but have different “subjective experiences”. I assumed that a similar problem could occur when comparing two equivalent decision-making procedures with different subjective experiences. But now I actually think that the behaviorist approach nicely aligns with what one might call introspective discernibility of experiences.

Let’s say I’m an agent that has, as a component, a sorting algorithm. Now, a world model may contain an agent that is just like me except that it uses a different sorting algorithm. Does that agent count as an instantiation of me? Well, that depends on whether I can introspectively discern which sorting algorithm I use. If I can, then I could let my output depend on the content of the sorting algorithm. And if I do that, then the equivalence between me and that other agent breaks. E.g., if I decide to output an explanation of my sorting algorithm, then my output would explain, say, bubble sort, whereas the other algorithm’s output would explain, say, merge sort. If, on the other hand, I don’t have introspective access to my sorting algorithm, then the code of the sorting algorithm cannot affect my output. Thus, the behaviorist view would interpret the other agent as an instantiation of me (as long as, of course, it, too, doesn’t have introspective access to its sorting algorithm). This conforms with the intuition that which kind of sorting algorithm I use is not part of my subjective experience. I find this natural relation to introspective discernibility very appealing.
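The sorting example can be sketched in code. Below, two parameterless agents differ only in their sorting component. The flag `introspective` is a hypothetical stand-in for introspective access: it lets the agent's output depend on which sorting algorithm it uses. All names are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def bubble_sort(xs: List[int]) -> List[int]:
    xs = list(xs)
    for end in range(len(xs) - 1, 0, -1):
        for j in range(end):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


def merge_sort(xs: List[int]) -> List[int]:
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right


def make_agent(sort_fn: Callable[[List[int]], List[int]],
               introspective: bool) -> Callable[[], Tuple]:
    """Build a parameterless agent that uses sort_fn as a component."""
    def agent() -> Tuple:
        best = sort_fn([3, 1, 2])[0]  # the decision only uses the sorted result
        if introspective:
            # With introspective access, the output can mention the component.
            return (best, sort_fn.__name__)
        return (best,)
    return agent


# Without introspective access, the two agents are behaviorally equivalent.
print(make_agent(bubble_sort, False)() == make_agent(merge_sort, False)())  # True
# With introspective access, the equivalence breaks.
print(make_agent(bubble_sort, True)() == make_agent(merge_sort, True)())    # False
```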

That said, things are complicated by the equivalence relation being subjective. If you already know what A and B output, then they are equivalent if their output is the same, even if it is “coincidentally” so, i.e., if they perform completely unrelated computations. Of course, a decision algorithm will rarely know its own output in advance. So, this extreme case is probably rare. However, it is plausible that an algorithm’s knowledge about its own behavior excludes some conditional policies. For example, consider a case like Conitzer’s (2016, 2017), in which copies of an EU-maximizing agent face different but symmetric information. Depending on what the agent knows about its algorithm, it may view all the copies as equivalent or not. If it has relatively little self-knowledge, it could reason that if it lets its action depend on the information, the copies’ behavior would diverge. With more self-knowledge, on the other hand, it could reason that, because it is an EU maximizer and because the copies are in symmetric situations, its action will be the same no matter the information received.[5]


The BPB problem resembles the problem of consciousness: the question “does some physical system implement my algorithm?” is similar to the question “does some physical system have the conscious experience that I am having?”. For now, I don’t want to go too much into the relation between the two problems. But if we suppose that the two problems are connected, we can draw from the philosophy of mind to discuss our approach to BPB.

In particular, I expect that a common objection to the behaviorist approach will be that most instantiations in the behaviorist sense are behavioral p-zombies. That is, their output behavior is equivalent to the algorithm’s, but they compute the output in a different way, and in particular in a way that doesn’t seem to give rise to conscious (or subjective) experiences. While the behaviorist view may lead us to identify with such a p-zombie, we can be certain, so the argument goes, that we are not such a p-zombie, given that we have conscious experiences.

Some particular examples include:

  • Lookup table-based agents
  • Messed up causal structures, e.g. Paul Durham’s experiments with his whole brain emulation in Greg Egan’s novel Permutation City.

I personally don’t find these arguments particularly convincing because I favor Dennett’s and Brian Tomasik’s eliminativist view on consciousness. That said, it’s not clear whether eliminativism would imply anything other than relativism/anti-realism for the BPB problem (if we view BPB and philosophy of mind as sufficiently strongly related).


This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

1. I use the word “algorithm” in a very broad sense. I don’t mean to imply Turing computability. In fact, I think any explicit formal specification of the form “f()=…” should work for the purpose of the present definition. Perhaps, even implicit specifications of the output would work. 

2. Of course, I see how someone would find this counterintuitive. However, I suspect that this is primarily because the rock example triggers absurdity heuristics and because it is hard to imagine a situation in which you believe that your decision algorithm is strongly correlated with whether, say, some rock causes an avalanche. 

3. Although the behaviorist view defines the instance-of-me property via correlation, there can still be correlated physical subsystems that are not viewed as an instance of me. In particular, if you strongly limit the set of allowed interpretations (see the next paragraph), then the potential relationship between your own and the system’s action may be too complicated to be expressed as A outputs x <=> i(H)=x.

4. I suspect that the two might differ in medical or “common cause” Newcomb-like problems like the coin flip creation problem.

5. If this is undesirable, one may try to use logical counterfactuals to find out whether B also “would have” done the same as A if A had behaved differently. However, I’m very skeptical of logical counterfactuals in general. Cf. the “Counterfactual Robustness” section in Tomasik’s post. 

11 thoughts on “A behaviorist approach to building phenomenological bridges”

  1. A generalization of the behaviorist approach would be the following: Instead of considering the logical dependences between your actions and things in the world, you consider dependences between things in the world and other variables in your source code whose values you do not yet know. (I came up with this idea in a conversation with Jonny.)


  2. In the section “Betting on the Past” of his 2016 review essay on Arif Ahmed’s Evidence, Decision and Causality ( https://www.pdcnet.org/pdc/bvdb.nsf/purchase?openform&fp=jphil&id=jphil_2016_0113_0004_0224_0232 ), James Joyce makes a point that could be construed as being based on the behaviorist approach to the BPB problem.

    Essentially, he argues that in “Betting on the Past” (discussed by Jonny in https://casparoesterheld.com/2017/02/06/betting-on-the-past-by-arif-ahmed/ ), one should ignore those versions of oneself whose actions one can already predict. He writes:
    “If Alice knows L then she also knows the news of her choice was ‘made’ long ago: she just will not learn what it is until she chooses. Given this, even proponents of EDT should admit that there is no fact of the matter about what Alice ought to do in D_L , and that R_2 is the best choice because it is superior to R_1 in every state where Alice can ‘make the news’ about what she chooses.”
    I assume the argument is more directly about free will. (Note that it doesn’t seem to apply in standard EDT.) But this free will argument can be directly cast as an argument from the behaviorist approach to BPB: ignore worlds in which you can already predict everything (including your own action), because a constant cannot be dependent on your action.


  3. Instead of dividing interpretations into allowed and forbidden ones, we can give each interpretation some weight (similar to a prior probability, although the weights don’t have to sum up to 1).


  4. If we limit interpretations to short or easy-to-compute ones, we could view the behaviorist approach to BPB as saying: My algorithm is implemented in the physical structure X to the extent that my algorithm is practically useful in predicting X. This is very similar to Daniel Dennett’s intentional stance ( https://en.wikipedia.org/wiki/Intentional_stance ).

    In essence Dennett argues that we ascribe beliefs, desires, etc. to systems insofar as this is useful for predicting the behavior of that system. In ch. 3 of his book “The Intentional Stance”, he even uses the example of algorithms implemented in the cellular automaton “(Game of) Life” ( https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life ). He writes:

    >One of the triumphs […] in the Life world is a universal Turing machine – a configuration whose behavior can be interpreted as the state-switching, symbol-reading, and symbol-writing of a simple computer, which can be “programmed” to compute any computable function. […]
    >Anyone who hypothesizes that some configuration in the Life world is such a Turing machine can predict its future state with precision, efficiency, and a modicum of risk. Adopt the “Turing machine stance” and, ignoring both the physics of the Life world and the design details of the machine; then translate the function’s output back into the symbol system of the Life world machine. That configuration of ONs and OFFs will soon appear, you can predict, provided that no stray gliders or other noisy debris collide with the Turing machine and destroy it or cause it to malfunction.


  5. A possible criticism of the behaviorist approach to BPB is that it only gives an answer for agents and not for purely passive observers. I can think of two possible responses. First, it is unclear whether this is a problem. Even without BPB, one could argue that anthropic probabilities only make sense in the context of agency. Only agency allows one to ask whether some method of assigning anthropic probabilities can be Dutch-booked, and only agency allows one to extract probabilities using bets. Without agents, one is merely left with a few conflicting intuitions about how anthropic probabilities should be assigned. Second, the behaviorist approach can be applied to a variety of actions, including “mental actions” such as loading a particular variable with a particular value. Thus, even an entity that doesn’t “act” in the conventional sense of the word can view itself as an agent choosing a probability according to a particular algorithm and then look for worlds with sets of particles that behave like this algorithm.


  6. I suppose one could also have an additional mapping between states of the physical system and inputs/observations. This would make it possible to ask the question whether a system would have counterfactually (upon being given another input) also behaved as the abstract algorithm.


    1. On p. 55f. of “Good and Real”, Gary L. Drescher argues that such counterfactual behavior should be relevant for whether a system should be regarded as an implementation of some algorithm.


  7. More sophisticated approaches may take runtime as an aspect of behavior into account. For example, if you are an algorithm that either takes a long time and then outputs 1 or takes a short time and then outputs 0, then you may only identify with interpretation functions that either let the CA run for a while and then output 1 or let the CA run for a short time and then output 0.

    This approach enables one to, e.g., differentiate between sorting algorithms of different complexity. I guess one could criticize it on the basis that runtime seems to be used as a proxy for the structure of the program. However, I would maintain that complexity is an aspect of behavior.


  8. Regarding the Giant-lookup-table-objection to the behaviorist approach:

    The giant lookup table agents that behave like humans (GLUT-humans) were conceived as an argument against behaviorism (and similar views) in the philosophy of mind. Some have proposed the following counterargument: How would one generate the entries in the GLUT-human? Presumably, one would have to expose that human to the corresponding situations. So, the human’s behavior cannot be generated without conscious experience after all. (See https://www.lesswrong.com/posts/k6EPphHiBH4WWYFCj/gazp-vs-glut for an example of this argument being made.)

    I believe this argument fails as a defense of behaviorism in the philosophy of mind. While it might be true that (other than by very unlikely coincidences) we cannot create a GLUT-human without human conscious experience, behaviorism would still give arguably incorrect judgments of whether some system is conscious. E.g., imagine that we took one human, let her face every possible situation once, and then created 100 behaviorally equivalent GLUTs based on our records. The behaviorist would see 101 conscious beings, while we would usually see only one. The behaviorist may therefore, e.g., have a different ethical view toward harming the GLUTs.

    I believe that the given argument is more successful as a defense of behaviorism as an approach to BPB. If the argument succeeds in establishing that a GLUT-agent can only be created using a “real agent”, then it does not matter much for one’s decisions whether one identifies with the GLUT-agent or only with the real agent. Therefore, even if we regard the judgment that the GLUT is implementing the agent’s algorithm as “wrong”, behaviorism about BPB stands. (Though perhaps the argument still hints at other counter-arguments that may succeed. E.g., the “different sorting algorithm” point is similar.)

    Another problem for the above defense of behaviorism in the philosophy of mind is that of randomly generated GLUTs. Assume that a randomly generated GLUT, by coincidence, behaved like a human. According to behaviorism, this GLUT would be conscious. The behaviorist view about BPB outlined in the post nicely takes care of these cases. Consider that the behaviorist approach identifies an algorithm with an aspect of a world model only if there is some reason (e.g., a proof) to suspect that the two are equivalent. But if the GLUT is really only behaviorally equivalent “by coincidence” that means exactly that without knowing how the GLUT turns out, there is no reason to suspect the equivalence. Thus, without finding out the content of the GLUT, an agent using the behaviorist approach to BPB will not identify with that GLUT. But once the agent knows the content of the GLUT, the agent will lack “control via logical correlation” over that GLUT.

    This is an example which illustrates how different the “solution criteria” for BPB and the mind-body problem are. Behaviorism may make weird judgments about which agents implement which algorithms, but when used as a solution to the BPB problem, this often leads to reasonable behavior anyway. But in the mind-body problem, weird judgments about which physical entity implements which algorithm already suffice to discard a view.

