Wireheading

Some of my readers may have heard of the concept of wireheading:

Wireheading is the artificial stimulation of the brain to experience pleasure, usually through the direct stimulation of an individual’s brain’s reward or pleasure center with electrical current. It can also be used in a more expanded sense, to refer to any kind of method that produces a form of counterfeit utility by directly maximizing a good feeling, but that fails to realize what we value.

From my experience, people are confused about what exactly wireheading is and whether it is rational to pursue or not, so before I discuss some potentially new thoughts on wireheading in the next post, I’ll elaborate on that definition a bit and give a few examples.

Let’s say your only goal is to be adored by as many people as possible for being a superhero. Then thinking that you are such a superhero would probably be the thing that makes you happy. So, you would probably be happy while playing a superhero video game that is so immersive that while playing you actually believe that you are a superhero and forget dim reality for a while. So, if you just wanted to be happy or feel like a superhero you would play this video game a lot given that it is so difficult to become a superhero in real life. But this isn’t what you want! You don’t want to believe that you’re a superhero. You want to be a superhero. Playing the video game does not help you to attain that goal, instead push-ups and spinach (or, perhaps, learning about philosophy, game theory and theoretical computer science) help you to be a superhero.

So, if you want to become a superhero, fooling yourself into believing that you are a superhero obviously does not help you. It even distracts you. In this example, playing the video game was an example of wireheading (in the general sense) that didn’t even require you to open your skull. You just had to stimulate your sensors with the video game and not resist the immersive video game experience. The goal of being a superhero is an example of a goal that refers to the poutside world. It is a goal that cannot be achieved by changing your state of mind or your beliefs or the amount of dopamine in your brain.

So, the first thing you need to know about wireheading is that if your goals are about the outside world, you need to be irrational or extremely confused or in a very weird position (where you are paid to wirehead, for example) to do it. Let me repeat (leaving out the caveats): If your utility function assigns values to states of the world, you don’t wirehead!

What may be confusing about wireheading is that for some subset of goals (or utility functions), wireheading actually is a rational strategy. Let’s say your goal is to feel (and not necessarily be) important like a superhero. Or to not feel bad about the suffering of others (like the millions of fish which seem to die a painful death from suffocation right now). Or maybe your goal is actually to maximize the amount of dopamine in your brain. For such agents, manipulating their brain directly and instilling false beliefs in themselves can be a rational strategy! It may look crazy from the outside, but according to their (potentially weird) utility functions, they are winning.

There is a special case of agents whose goals refer to their own internals, which is often studied in AI: reinforcement learners. These agents basically have some reward signal which they aim to maximize as their one and only goal. The reward signal may come from a module in their code which has access to the sensors. Of course, AI programmers usually don’t care about the size of the AI’s internal reward numbers but instead use the reward module of the AI as a proxy for some goals the designer wants to be achieved (world peace, the increased happiness of the AI’s users, increased revenue for HighDepthIntellect Inc. …). However, the reinforcement learning AI does not care about these external goals – it does not even necessarily know about them, although that wouldn’t make a difference. Given that the reinforcement learner’s goal is about its internal state, it would try to manipulate its internal state towards higher rewards if it gets the chance no matter whether this correlates with what the designers originally wanted. One way to do this would be to reprogram its reward module, but assuming that the reward module is not infallible, a reward-based agent could also feed its sensors with information that leads to high rewards even without achieving the goals that the AI was built for. Again, this is completely rational behavior. It achieves the goal of increasing rewards.

So, one reason for confusion about wireheading is that there actually are goal systems under which wireheading is a rational strategy. Whether wireheading is rational depends mainly on your goals and given that goals are different from facts the question of whether wireheading is good or bad is not purely a question of facts.

What makes this extra-confusing is that the goals of humans are a mix between preferences regarding their own mental states and preferences about the external world. For example, I have both a preference for not being in pain but also a preference against most things that are causing the pain. People enjoy fun activities (sex, taking drugs, listening to music etc.) for how it feels to be involved in them, but they also have a preference for a more just world with less suffering. The question “Do you really want to know?” is asked frequently and it’s often unclear what the answer is. If all of your goals were about the outside world and not your state of mind, you would (usually) answer such questions affirmatively – knowledge can’t hurt you, especially because on average any piece of evidence can’t “make things worse” than you expected things to be before receiving that piece of evidence. Sometimes, people are even confused about why exactly they engage in certain activities and specifically about whether it is about fulfilling some preference in the outside world or changing one’s state of mind. For example, most who donate to charity think that they do it to help kids in Africa, but many also want the warm feelings from having made such a donation. And often, both are relevant. For example, I want to prevent suffering, but I also have a preference for not thinking about specific instances of suffering in a non-abstract way. (This is partly instrumental, though: learning about a particularly horrific example of suffering often makes me a lot less productive for hours. Gosh, so many preferences…)

There is another thing which can make this even more confusing. Depending on my ethical system I may value people’s actual preference fulfillment or the quality of their subjective states (the former is called preference utilitarianism and the latter hedonistic utilitarianism). Of course, you can also value completely different things like the existence of art, but I think it’s fair to say that most (altruistic) humans value at least one of the two to a large extent. For a detailed discussion of the two, consider this essay by Brian Tomasik, but let’s take a look at an example to see how they differ and what the main arguments are. Let’s say, your friend Mary writes a diary, which contains information that is of value to you (be it for entertainment or something else). However, Mary, like many who write a diary, does not want others to read the content of her diary. She’s also embarrassed about the particular piece of information that you are interested in. Some day you get the chance to read in her diary without her knowing. (We assume that you know with certainty that Mary is not going to learn about your betrayal and that overall the action has no consequences other than fulfilling your own preferences.) Now, is it morally reprehensible for you to do so? A preference utilitarian would argue that it is, because you decrease Mary’s utility function. Her goal of not having anyone know the content of her diary is not achieved. A hedonistic utilitarian would argue that her mental state is not changed by your action and so she is not harmed. The quality of her life is not affected by your decision.

This divide in moral views directly applies to another question of wireheading: Should you assist others in wireheading or even actively wirehead other agents? If you are a hedonistic utilitarian you should, if you are a preference utilitarian you shouldn’t (unless the subject’s preferences are mainly about her own state of mind). So, again, whether wireheading is a good or a bad thing to do is determined by your values and not (only) by facts.

2 thoughts on “Wireheading”

Brian Tomasik

Nice post. 🙂

> There is a special case of agents whose goals refer to their own internals, which is often studied in AI: reinforcement learners.

It’s worth pointing out that this doesn’t mean all RL agents wirehead. (Humans are quasi-RL agents who often don’t, even when given the opportunity via drugs, etc.) For instance, if the reward-generating subroutine is based on ever-improving beliefs about the external world and is protected against tampering, then (perhaps) an RL agent wouldn’t wirehead.

> We assume that you know with certainty that Mary is not going to learn about your betrayal

This assumption is often true in the case of dead people, which makes the moral conundrum more than a theoretical exercise. For example, consider the title of this article: “The Letters That Warren G. Harding’s Family Didn’t Want You to See”, http://www.nytimes.com/2014/07/13/magazine/letters-warren-g-harding.html (Of course, one can complain that release of the documents also affects Harding’s great-… grandchildren, society at large, etc.)

LikeLiked by 1 person

July 19, 2016 at 1:58 Reply
Caspar

In a private conversation Kaj Sotala ( http://kajsotala.fi/ ) drew my attention to Douglas Lenat’s EURISKO as an example of a wireheading AI. From http://web.archive.org/web/20090308022238/http://www.aliciapatterson.org/APF0704/Johnson/Johnson.html :

>Sometimes a “mutant” heuristic appears that does little more than continually cause itself to be triggered, creating within the program an infinite loop. During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

EURISKO wasn’t really an RL system, but still fell victim to a kind of wireheading, because it still intrinsically valued properties of its internals.

He also made another really good point, which is that all AIs view the outside world through their internal model such that if they accidentally wirehead into having a more optimistic model of the world, this action will seem to have been a really good idea.

LikeLike

August 12, 2016 at 0:34 Reply

	Eliezer Yudkowsky on The lack of performance metric…
	Lukas Finnveden on “Betting on the Past” by Arif…
	Jesse Clifton on Decision Theory and the Irrele…
	Lukas Finnveden on Cooperative AI competitions wi…
	Caspar on Cooperative AI competitions wi…

The Universe from an Intentional Stance

Wireheading

2 thoughts on “Wireheading”

Leave a comment Cancel reply

Teilen mit:

2 thoughts on “Wireheading”

Leave a comment Cancel reply