Suicide as wireheading

In my last post you learned about wireheading. In this post, I’ll draw attention to a specific example of wireheading: suicide or, as one would call it for robots, self-destruction. What differentiates self-destruction from more commonly discussed forms of wireheading is that it does not lead to a pleasurable or very positive internal state. But like them, it is a measure that does not solve problems in the external world so much as it changes one’s state of mind (into nothingness).

Let’s consider a reinforcement learner that has an internal module which generates rewards between -1 and 1. There will probably be situations in which the reinforcement learner expects to receive mainly negative rewards in the future. Assuming that zero utility is assigned to nonexistence (similar to how humans think of the states prior to their existence or phases of sleep as neutral to their hedonic well-being), a reinforcement learner may well want to end its existence to increase its utility. It should be noted that typical reinforcement learners as studied in the AI literature don’t have a concept of death. However, a reinforcement learner that is meant to work in the real world would have to think about its death in some way (which may be completely confused by default). An example view is one in which the “reinforcement learner” actually has a utility function that calculates utility from the sum of the outputs of the physical reward module. In this case, suicide would reduce the number of outputs of the reward module, which can increase expected utility if future rewards are expected to be negative more often or more strongly than they are positive. So, we have another example of potentially rational wireheading.
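To make this concrete, here is a minimal toy sketch (my own illustration; the function names and numbers are invented, and the discount-free “sum of remaining rewards” model is an assumption) of an agent that values the sum of its reward module’s future outputs and assigns zero to nonexistence:

    # Toy model: the agent values the sum of its reward module's future
    # outputs and assigns a value of 0 to nonexistence.

    def expected_return_if_alive(mean_reward_per_step, expected_remaining_steps):
        """Expected sum of future rewards if the agent keeps existing."""
        return mean_reward_per_step * expected_remaining_steps

    def prefers_self_destruction(mean_reward_per_step, expected_remaining_steps):
        """True iff terminating (valued at 0) beats continuing in expectation."""
        return 0.0 > expected_return_if_alive(mean_reward_per_step,
                                              expected_remaining_steps)

    # An agent expecting mostly negative rewards (mean -0.3 per step, within [-1, 1])
    # over its remaining 1000 steps "rationally" prefers to shut down:
    print(prefers_self_destruction(-0.3, 1000))  # True
    print(prefers_self_destruction(+0.2, 1000))  # False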

However, for many agents self-destruction is irrational. For example, if my goal is to reduce suffering then I may feel bad once I learn about instances of extreme suffering and about how much suffering there is in the world. Killing myself ends my bad feeling, but prevents me from achieving my goal of reducing suffering. Therefore, it’s irrational given my goals.

The actual motivations of real-world people committing suicide seem a lot more complicated most of the time. Many instances of suicide do seem to be about escaping bad mental states. But many people also attempt to use their death to achieve goals in the outside world, the most prominent example being seppuku (or harakiri), in which suicide is performed to maximize one’s reputation or honor.

As a final note, looking at suicide through the lens of wireheading provides one way of explaining why so many beings who live very bad lives don’t commit suicide. If an animal’s goals are things like survival, health, mating etc. that correlate with reproductive success, the animal can’t achieve its goals by suicide. Even if the animal expects its life to continue in an extremely bad and unsuccessful way with near certainty, it behaves rationally if it continues to try to survive and reproduce rather than alleviating its own suffering. Avoiding pain is only one of the animal’s goals, and if intelligent, rational agents are to be prevented from committing suicide in a Darwinian environment, reducing their own pain had better not be the dominating consideration. In sum, the fact that an agent does not commit suicide tells you little about the state of its well-being if it has goals about the outside world, which we should expect to be the case for most evolved beings.

Wireheading

Some of my readers may have heard of the concept of wireheading:

Wireheading is the artificial stimulation of the brain to experience pleasure, usually through the direct stimulation of an individual’s brain’s reward or pleasure center with electrical current. It can also be used in a more expanded sense, to refer to any kind of method that produces a form of counterfeit utility by directly maximizing a good feeling, but that fails to realize what we value.

From my experience, people are confused about what exactly wireheading is and whether it is rational to pursue or not, so before I discuss some potentially new thoughts on wireheading in the next post, I’ll elaborate on that definition a bit and give a few examples.

Let’s say your only goal is to be adored by as many people as possible for being a superhero. Then thinking that you are such a superhero would probably be the thing that makes you happy. So, you would probably be happy while playing a superhero video game that is so immersive that while playing you actually believe that you are a superhero and forget dim reality for a while. So, if you just wanted to be happy or to feel like a superhero, you would play this video game a lot, given that it is so difficult to become a superhero in real life. But this isn’t what you want! You don’t want to believe that you’re a superhero. You want to be a superhero. Playing the video game does not help you attain that goal; instead, push-ups and spinach (or, perhaps, learning about philosophy, game theory and theoretical computer science) help you to be a superhero.

So, if you want to become a superhero, fooling yourself into believing that you are a superhero obviously does not help you. It even distracts you. In this example, playing the video game was an example of wireheading (in the general sense) that didn’t even require you to open your skull. You just had to stimulate your sensors with the video game and not resist the immersive video game experience. The goal of being a superhero is an example of a goal that refers to the outside world. It is a goal that cannot be achieved by changing your state of mind or your beliefs or the amount of dopamine in your brain.

So, the first thing you need to know about wireheading is that if your goals are about the outside world, you need to be irrational or extremely confused or in a very weird position (where you are paid to wirehead, for example) to do it. Let me repeat (leaving out the caveats): If your utility function assigns values to states of the world, you don’t wirehead!

What may be confusing about wireheading is that for some subset of goals (or utility functions), wireheading actually is a rational strategy. Let’s say your goal is to feel (and not necessarily be) important like a superhero. Or to not feel bad about the suffering of others (like the millions of fish which seem to die a painful death from suffocation right now). Or maybe your goal is actually to maximize the amount of dopamine in your brain. For such agents, manipulating their brain directly and instilling false beliefs in themselves can be a rational strategy! It may look crazy from the outside, but according to their (potentially weird) utility functions, they are winning.

There is a special case of agents whose goals refer to their own internals, which is often studied in AI: reinforcement learners. These agents basically have some reward signal which they aim to maximize as their one and only goal. The reward signal may come from a module in their code which has access to the sensors. Of course, AI programmers usually don’t care about the size of the AI’s internal reward numbers but instead use the reward module of the AI as a proxy for some goals the designer wants to be achieved (world peace, the increased happiness of the AI’s users, increased revenue for HighDepthIntellect Inc. …). However, the reinforcement learning AI does not care about these external goals – it does not even necessarily know about them, although that wouldn’t make a difference. Given that the reinforcement learner’s goal is about its internal state, it would try to manipulate its internal state towards higher rewards if it gets the chance no matter whether this correlates with what the designers originally wanted. One way to do this would be to reprogram its reward module, but assuming that the reward module is not infallible, a reward-based agent could also feed its sensors with information that leads to high rewards even without achieving the goals that the AI was built for. Again, this is completely rational behavior. It achieves the goal of increasing rewards.
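As a toy illustration of this point (my own construction; the action names and all numbers are invented), consider an agent whose decision rule is simply to pick whichever available action its internal reward module rates highest:

    # The designer intends the reward module as a proxy for an external goal,
    # but the agent only ever looks at the internal numbers.
    internal_reward = {
        "work_on_designer_goal": 0.4,
        "feed_sensors_fake_success": 0.9,   # fool the (fallible) reward module
        "rewrite_reward_module": 1.0,       # set the reward to its maximum directly
    }
    external_value_to_designer = {
        "work_on_designer_goal": 1.0,
        "feed_sensors_fake_success": 0.0,
        "rewrite_reward_module": 0.0,
    }

    # The reinforcement learner's decision rule: maximize internal reward only.
    chosen = max(internal_reward, key=internal_reward.get)
    print(chosen)                              # rewrite_reward_module
    print(external_value_to_designer[chosen])  # 0.0 -- the designer gets nothing

Given the goal of maximizing internal reward, this choice is not a malfunction; it is exactly the “winning” behavior described above.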

So, one reason for confusion about wireheading is that there actually are goal systems under which wireheading is a rational strategy. Whether wireheading is rational depends mainly on your goals, and given that goals are different from facts, the question of whether wireheading is good or bad is not purely a question of facts.

What makes this extra-confusing is that the goals of humans are a mix between preferences regarding their own mental states and preferences about the external world. For example, I have both a preference for not being in pain and a preference against most of the things that cause the pain. People enjoy fun activities (sex, taking drugs, listening to music etc.) for how it feels to be involved in them, but they also have a preference for a more just world with less suffering. The question “Do you really want to know?” is asked frequently, and it’s often unclear what the answer is. If all of your goals were about the outside world and not your state of mind, you would (usually) answer such questions affirmatively – knowledge can’t hurt you, especially because, on average, a piece of evidence can’t “make things worse” than you expected things to be before receiving it. Sometimes, people are even confused about why exactly they engage in certain activities, and specifically about whether it is about fulfilling some preference in the outside world or about changing one’s state of mind. For example, most who donate to charity think that they do it to help kids in Africa, but many also want the warm feelings from having made such a donation. And often, both are relevant. For example, I want to prevent suffering, but I also have a preference for not thinking about specific instances of suffering in a non-abstract way. (This is partly instrumental, though: learning about a particularly horrific example of suffering often makes me a lot less productive for hours. Gosh, so many preferences…)
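The parenthetical claim that evidence can’t make things worse in expectation is, as far as I can tell, just conservation of expected evidence (the law of total probability) written out. In my own notation, with H the hypothesis you care about and E the upcoming piece of evidence:

    E[P(H|E)] = Σ_e P(E=e) · P(H|E=e) = Σ_e P(H, E=e) = P(H)

Particular observations can of course lower P(H), but before you look, your expected post-evidence credence equals your current credence.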

There is another thing which can make this even more confusing. Depending on my ethical system, I may value people’s actual preference fulfillment or the quality of their subjective states (the former is called preference utilitarianism and the latter hedonistic utilitarianism). Of course, you can also value completely different things like the existence of art, but I think it’s fair to say that most (altruistic) humans value at least one of the two to a large extent. For a detailed discussion of the two, consider this essay by Brian Tomasik, but let’s take a look at an example to see how they differ and what the main arguments are. Let’s say your friend Mary writes a diary, which contains information that is of value to you (be it for entertainment or something else). However, Mary, like many who write a diary, does not want others to read its content. She’s also embarrassed about the particular piece of information that you are interested in. Some day you get the chance to read her diary without her knowing. (We assume that you know with certainty that Mary is not going to learn about your betrayal and that overall the action has no consequences other than fulfilling your own preferences.) Now, is it morally reprehensible for you to do so? A preference utilitarian would argue that it is, because you decrease Mary’s utility: her goal of not having anyone know the content of her diary is not achieved. A hedonistic utilitarian would argue that her mental state is not changed by your action and so she is not harmed. The quality of her life is not affected by your decision.

This divide in moral views directly applies to another question of wireheading: Should you assist others in wireheading or even actively wirehead other agents? If you are a hedonistic utilitarian you should, if you are a preference utilitarian you shouldn’t (unless the subject’s preferences are mainly about her own state of mind). So, again, whether wireheading is a good or a bad thing to do is determined by your values and not (only) by facts.

Self-improvement races

Most of my readers are probably familiar with the problem of AI safety: if humans could create super-human-level artificial intelligence, the task of programming it in such a way that it behaves as intended is non-trivial. There is a risk that the AI will act in unexpected ways, and given its super-human intelligence, it would then be hard to stop.

I assume fewer are familiar with the problem of AI arms races. (If you are, you may well skip this paragraph.) Imagine two opposing countries which are trying to build a super-human AI to reap the many benefits and potentially attain a decisive strategic advantage, perhaps taking control of the future immediately. (It is unclear whether this latter aspiration is realistic, but it seems plausible enough to significantly influence decision making.) This creates a strong motivation for the two countries to develop AI as fast as possible. This is the case especially if the countries would dislike a future controlled by the other. For example, North Americans may fear a future controlled by China. In such cases, countries would want to invest most available resources into creating AI first, with less concern for whether it is safe. After all, letting the opponent win may be about as bad as ending up with an AI with entirely random goals. It turns out that under certain conditions both countries would invest close to no resources into AI safety and all resources into AI capability research (at least, that’s the Nash equilibrium), thus leading to an unintended outcome with near certainty. If countries are sufficiently rational, they might be able to cooperate to mitigate the risks of creating uncontrolled AI. This seems especially plausible given that the values of most humans are actually very similar to each other relative to how alien the goals of a random AI would probably be. However, given that arms races have frequently occurred in the past, a race toward human-level AI remains a serious worry.
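To see how “invest nothing in safety” can be the unique Nash equilibrium, here is a deliberately crude two-country toy game (my own illustrative model and numbers, not taken from any particular paper): each country either invests in safety (SAFE) or rushes (FAST); going FAST against a SAFE opponent wins the race for sure; a rushed winner’s AI is controlled less often; and, as assumed above, the opponent winning is valued no higher than an uncontrolled AI.

    from itertools import product

    P_CONTROLLED = {"SAFE": 0.9, "FAST": 0.5}  # chance the winner's AI is controlled
    ENEMY_WIN_VALUE = 0.0  # opponent winning is valued like an AI with random goals

    def payoff(mine, theirs):
        """Expected payoff to me: 1 if I win with a controlled AI,
        ENEMY_WIN_VALUE if the opponent does, 0 if the AI is uncontrolled."""
        p_i_win = 0.5 if mine == theirs else (1.0 if mine == "FAST" else 0.0)
        return (p_i_win * P_CONTROLLED[mine]
                + (1 - p_i_win) * P_CONTROLLED[theirs] * ENEMY_WIN_VALUE)

    def is_nash(s1, s2):
        return (all(payoff(s1, s2) >= payoff(d, s2) for d in P_CONTROLLED)
                and all(payoff(s2, s1) >= payoff(d, s1) for d in P_CONTROLLED))

    for s1, s2 in product(P_CONTROLLED, repeat=2):
        print(s1, s2, round(payoff(s1, s2), 3), "Nash" if is_nash(s1, s2) else "")

Under these numbers FAST strictly dominates SAFE, so (FAST, FAST) is the only Nash equilibrium (expected payoff 0.25 each), even though both countries would prefer (SAFE, SAFE) (0.45 each).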

Handing power over to AIs holds both economic promise and a risk of misalignment. Similar problems actually haunt humans and human organizations. Say a charity hires a new director who has been successful in other organizations. This creates an opportunity for the charity to rise in influence. However, it is also possible that the charity changes in a way that the people currently or formerly in charge wouldn’t approve of. Interestingly, the situation is similar for AIs which create other AIs or improve themselves. Learning and self-improvement are the paths to success. However, self-improvements carry the risk of affecting the goal-directed behavior of the system.

The existence of this risk seems plausible prima facie: it should be strictly easier to find self-improvements that “probably” work than to identify self-improvements that are guaranteed to work, since the former are a superset of the latter. So, AIs which are willing to take risks while self-improving can improve faster. (ETA: Cf. page 154 of Max Tegmark’s 2017 book Life 3.0 (published after this blog post was originally published).)

There are also formal justifications for the difficulty of proving self-improvements to be correct. Specifically, Rice’s theorem states that for any non-trivial semantic property p (i.e., any property of a program’s behavior), there is no way of deciding for all programs whether they have this property p. (If you know about the undecidability of the halting problem, Rice’s theorem follows almost immediately from it.) As a special case, deciding for all programs whether they pursue some given goal is impossible. Of course, this does not mean that proving self-improvements to be correct is impossible. After all, an AI could just limit itself to the self-improvements that it can prove correct (see this discussion between Eliezer Yudkowsky and Mark Waser). However, without this limitation – e.g., if it can merely test some self-improvement empirically and implement it if it seems to work – an AI can use a broader range of possible self-modifications and thus improve more quickly. (In general, testing a program also appears to be a lot easier than formally verifying it, but that’s a different story.) Another relevant result from (provability) logic may be Löb’s theorem, which roughly implies that a logical system containing Peano arithmetic can’t prove itself, or any system that includes it, to be sound.
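For the curious, here is the standard reduction behind that special case, sketched in Python (a sketch of my own; pursues_goal is the hypothetical decider, and simulate and best_action_for_goal_G are placeholder names that only appear inside the constructed program text): if a perfect “does this program pursue goal G?” decider existed, we could use it to decide halting, which is impossible.

    GOAL_PURSUING_PROGRAM = "def agent(x): return best_action_for_goal_G(x)"

    def halts(machine_source, machine_input, pursues_goal):
        """Would decide halting, *if* we were handed a perfect `pursues_goal` decider.

        The wrapper first simulates the given machine on the given input and only
        afterwards behaves like a program known to pursue goal G. So the wrapper
        pursues G iff the simulated machine halts (assuming, w.l.o.g., that a
        program that never does anything does not count as pursuing G)."""
        wrapper_source = (
            "def wrapper(x):\n"
            f"    simulate({machine_source!r}, {machine_input!r})  # loops forever if the machine doesn't halt\n"
            f"    {GOAL_PURSUING_PROGRAM}\n"
            "    return agent(x)\n"
        )
        return pursues_goal(wrapper_source)

    # Since no such `halts` can exist, no perfect `pursues_goal` can exist either.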

Lastly, consider Stephen Wolfram’s more fuzzy concept of computational irreducibility. It basically states that as soon as a system can produce arbitrarily complex behavior (i.e., as soon as it is universal in some sense), predicting how most aspects of the system will behave becomes fundamentally hard. Specifically, he argues that for most (especially for complex and universal) systems, there is no way to find out how they behave other than running them.

So, self-improvement can give AIs advantages and ultimately the upper hand in a conflict, but if done too hastily, it can also lead to goal drift. Now, consider the situation in which multiple AIs compete in a head-to-head race. Based on the above considerations this case becomes very similar to the AI arms races between groups of humans. Every single AI has incentives to take risks to increase its probability of winning, but overall this can lead to unintended outcomes with near certainty. There are reasons to assume that this self-improvement race dynamic will be more of a problem for AIs than it is for human factions. The goals of different AIs could diverge much more strongly than the goals of different humans. Whereas human factions may prefer the enemy’s win over a takeover by an uncontrolled AI, an AI with human values confronting an AI with strange values has less to lose from risky self-modifications. (There are some counter-considerations as well. For instance, AIs may be better at communicating and negotiating compromises.)

Thus, a self-improvement race between AIs seems to share the bad aspects of AI arms races between countries. This has a few implications:

  • Finding out (whether there is) a way for AIs to cooperate and prevent self-improvement races and other uncooperative outcomes becomes more important.
  • Usually, one argument for creating AI and colonizing space is that Earth-originating aligned AI could prevent other, less compassionate AIs (uncontrolled or created by uncompassionate ETs) from colonizing space. So, according to this argument, even if you don’t value what humans or human-controlled AIs would do in space, you should still choose it as the lesser of two (or more) evils. However, the problem of self-improvement races puts this argument into question.
  • On a similar note, making the universe more crowded with AIs, especially ones with weird (evolutionary uncommon) values or ones that are not able to cooperate, may be harmful as it could lead to results that are bad for everyone (except for the AI which is created in a self-modification gone wrong).

Acknowledgment: This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

Mathematical versus moral truth

This piece was inspired by a discussion between Magnus Vinding and Brian Tomasik, as well as a private discussion between Magnus Vinding and myself.

Some philosophers say that moral claims like “killing is bad” can be proven true or false. This position is called moral realism and, according to a recent PhilPapers survey, it is a very common view among philosophers. It’s important to understand that moral realists do not merely claim truth relative to some underlying assumptions. They don’t just say “If our goal is to reduce suffering, then torturing squirrels is objectively bad.” (In Kant’s terminology this would be called a hypothetical imperative, by the way.) They claim that the goal of reducing suffering itself or some other goal can be true or false by itself.

One particular comparison that pops up in discussions of moral realism is that between morality and mathematics. After all, there are absolute (“a priori”) truths without assumptions in mathematics, right? Well…

Truth in mathematics

Until the beginning of the 20th century, most mathematicians probably would have argued that mathematics finds absolute truth without making assumptions. (Interestingly, the Greek mathematician Euclid had axiomatized his geometry some two thousand years earlier. However, the axioms were seen as “undoubtedly/obviously true” without much further thought. Nevertheless, the idea of axiomatization, i.e. basing mathematics on assumptions, was not completely unknown.) But then mathematics had a foundational crisis, mainly about problems like Russell’s paradox, which showed that you can easily run into some serious problems if the foundations of mathematics aren’t studied in a rigorous manner and judgment of the correctness of inferences is based solely on “intuitive obviousness”. (An extremely accessible and overall great introduction to the topic is the graphic novel Logicomix.) To get rid of contradictions in mathematics, people tried to systematize its foundations. It turned out that the most solid way to go was explicit axiomatization: assuming a small set of statements (called axioms) from which all the rest of mathematics can be deduced. Luckily, a very small set of pretty basic assumptions called Zermelo-Fraenkel set theory is enough for deducing essentially all of standard mathematics. And axiomatization is still the standard way to talk about truth in mathematical logic. (There are, however, different philosophical interpretations, some of them saying that the axioms are just undoubtedly true.)

Note that when some axiom system implies 2+2=4, this very statement of axioms => 2+2=4 is still not true in an absolute sense as it requires definitions of what all the symbols in that statement mean. And these definitions can only be made when using other symbols, for example those of natural language. (And the higher level proof of axioms => 2+2=4 can only be made by assuming some properties about these meta-symbols.) At some meta-…-meta-level we have to stop with a set of statements/definitions that just have to be assumed true and understood by everyone, or we have an infinite regress. (Humans usually stop at the language level, which is somehow obvious to everyone.) Or we get cycles (“a set is a collection” and “a collection is a set”). That’s the Münchhausen trilemma.

Maybe more importantly, axioms => 2+2=4 is not really the statement that we want to be absolutely true. We want 2+2=4 without underlying assumptions.

So, how did people do mathematics before axioms, then? How did they manage without identifying the underlying axioms? Well, they always assumed some things to be obvious and not in need of further justification. And these things were not even at the level of axioms yet, but more like commutativity (n+m=m+n, e.g. 1+2=2+1). So, when proving some mathematical hypothesis, they wrote things like n+m=K and therefore also m+n=K, and people didn’t ask why, because commutativity of addition is obvious. This is still how it is today: people know that if you continue to ask why, you will end up at something like the Zermelo-Fraenkel axioms. And even if they don’t view the Zermelo-Fraenkel axioms as the foundation of mathematics, almost everyone agrees about commutativity of addition and basic arithmetic in general, whether generated from the Zermelo-Fraenkel or the Peano axioms.
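For concreteness, here is how 2+2=4 falls out of the Peano-style recursive definition of addition (my own spelled-out derivation; S denotes the successor function, 2 abbreviates S(S(0)) and 4 abbreviates S(S(S(S(0))))):

    2 + 2 = 2 + S(S(0)) = S(2 + S(0)) = S(S(2 + 0)) = S(S(2)) = S(S(S(S(0)))) = 4

using only the two defining equations n + 0 = n and n + S(m) = S(n + m).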

So, this sounds terrible. How do we know that 2+2=4 if all of this is based on axioms that can’t be proven? Well, for one, mathematics is mainly relevant as a model of reality, which can be used to make testable predictions: if we take two separable things and add another two separable things, then we will have four things. (That does not work for clouds, you see.) If we had an axiom system in which the natural definition of addition makes 2+2=5 true, then this addition would not be useful for predicting the number of things you have in the end. So, that’s one way to justify the use of Zermelo-Fraenkel: as a gold mine of simple models that can be used to make correct predictions.

There’s another silver lining. The sets of axioms that are usually used are extremely simple and basic. For example, one of the Zermelo-Fraenkel axioms states that if you have a couple of sets, then there is another set which contains all the elements of those sets. While it is philosophically troubling that proofs can’t verify “absolute truths” without some assumptions, it’s obviously not very productive to ask: “But what if the union of two sets really does not exist?” Just like it is not very productive to call into question the rules of logic and probability theory. Without assuming such “rules of thought”, it is not even clear how to think about doubting them! (Thanks to Magnus Vinding for pointing that out to me.) So, some things must be assumed if you don’t want your brain to explode.

Also, some of the axioms can be seen as so basic that they merely define the notion of sets. So, you may view the axioms more as definitions of what we are talking about than as some assumptions about truth.

So, some set of assumptions (or at least a common understanding of symbols) is simply necessary for doing any useful mathematics, or any useful formal reasoning for that matter. (Similarly, you need probability theory and some prior assumptions to extract knowledge from experience. And according to the no free lunch theorems of induction, if you choose the “trivial prior” of assuming all possible models to have the same probability you won’t be able to learn from experience either. So, you need something like Occam’s razor, but that’s another story…)

There is not much wrong with calling mathematical theorems “true” instead of “true assuming the axioms of … with the underlying logical apparatus …”, because the meaning does not vary greatly among different people. Most people agree about assuming these axioms or some other set of axioms that is able to produce the same laws of arithmetic. And there is also a level on which everybody can understand the axioms in the same way, at least functionally.

(Interestingly, there are exceptions to universal agreement on axioms in mathematics. The continuum hypothesis and its negation are both consistent with the Zermelo-Fraenkel axioms (assuming those are themselves consistent). This has led to some debate about whether one should treat the continuum hypothesis as true or false.)

Truth in ethics

So, let’s move to ethics and how it can be justified. The first thing to notice, I think, is that making correct claims in ethics requires additional assumptions or at least terminology. Deducing claims about what “ought to be” or what is “good” or “bad” requires new assumptions or definitions about this “oughtness”. It’s not possible to just introduce a new symbol or term and prove statements about it without knowing something (like a definition) about it. (This is probably what people mostly talk about when they say that moral realism is false: Hume’s is-ought gap, which has never been argued against successfully.)

But one needs to be able to make normative statements, i.e. statements about goals. Acting without a goal is like trying to judge the validity of a string of symbols without knowing the meaning of the symbols (or like making predictions without prior assumptions): not very sensible. Without goals, there is no reason to prefer one action over another. So, it does make sense to introduce some assumptions about goals into our set of “not-really-doubtable assumptions”.

Here are a few options that people commonly choose:

  • using the moral intuition black box of your brain;
  • using a set of rules on a meta-level, like “prefer simple ethical systems”, “ethical systems should be in the interest of everyone”, “ethical imperatives should not refer in any way to me, to any ethnic group or species” etc.;
  • using a lot of different, non-coherent (“object-level”) rules, like the ten commandments, the sharia, the golden rule, egoism etc., and deciding on specific questions by some messy majority vote system;
  • using some specific ethical imperative like preference utilitarianism, the ten commandments + egoism, coherent extrapolated volition etc.

(Note that this classification is non-rigorous. For example, there is not necessarily a distinction between rules and meta-rules. Coherent extrapolated volition can be used in the imperative “fulfill the coherent extrapolated volition of humankind on September 26, 1983” or in the meta-level rule “use the moral system that is the result of applying coherent extrapolated volition to the version of humankind on September 26, 1983”.  Also, most people will probably keep at least some of their intuition black box and not override it entirely with something more transparent. Maybe, some religious people do. Eliezer Yudkowsky would argue that affective death spirals can also lead non-religious people to let an ethical system override their intuition.)

Diverging assumptions

What strikes me as the main difference between assuming things in the foundations of mathematics and assuming some foundations of (meta-)ethics is that there are many quite different sets of assumptions about oughtness, none of which make your brain explode. It’s extremely helpful to have mathematical theories of arithmetic (or at least geometry) to produce useful/powerful models. And as soon as you have this basic level of power, you can pretty much do anything, no matter whether you reached arithmetic directly or via Zermelo-Fraenkel. Without the power of arithmetic, geometry or something like that, an axiomatic system goes extinct, at least when it comes to “practical” use in modeling reality. (Similar things apply to choosing prior probability distributions. In learning from observations, you can, in principle, use arbitrary priors. For instance, you could assume all hypotheses about the data that start with a “fu” to be very unlikely a priori. But there is strong selection pressure against crazy priors. Therefore, evolved beings will tend to find them unattractive. It’s difficult to imagine how the laws of probability theory could be false…)

For ethical systems the situation is somewhat different. Most people base their moral judgments on their intuition. However, moral intuition varies from birth and continues to change through external influences. There does not seem to exist agreement on meta-level rules, either. For example, many people prefer simple ethical systems, while others put normative weight on complexity of human value. People disagree about object-level ethical systems: there are different consequentialist systems, deontology and virtue ethics. And there are plenty of variations of each of these systems, too. And, of course, people disagree a lot on specific problems: the moral (ir)relevance of non-human animals, the (il)legitimacy of abortion, whether punishment of wrongdoers has intrinsic value, etc. So, people don’t really disagree on the fundamentals of mathematics, but they fundamentally disagree on ethics.

And that should not surprise us. Humans have evolved to be able to work towards goals and to learn about abstract concepts. Arithmetic is an abstract concept that is useful for attaining the goals that humans tend to have. And thus, in a human-dominated society, the meme of arithmetic (and of axiomatic systems that are sufficient for arithmetic) will spread.

I can’t identify similar reasons for one ethical system to be more popular than all others. Humans should have evolved to be “selfish” (in the sense of doing things that were helpful for spreading genes in our ancestral environment) and they are to some extent (though, in modern societies, humans aren’t very good at spreading genes anymore). But selfishness is not a meme that is likely to go viral: there are few reasons for a selfish person to tell someone else (other than maybe their close kin) that they should be selfish. (Some people do it anyway, but that may be about signaling…) So, one should expect that meme not to spread very much. Including relatives and friends in one’s moral consideration is a much more virulent meme than pure egoism. People tend to raise their children that way to stop them from exploiting their parents and from competing too hard with their brothers and sisters. Generosity and mutual cooperation among friends is also an idea that makes sense to share with your friends. Life in a larger society makes it important to cooperate with strangers, especially if they pay taxes to the same government that you do.

But in contrast to these quasi-ethical notions, which it can be in one’s interest to spread, most people tend to think of ethics as giving without expecting something in return, which is not favored by evolution directly. So, while we should expect the notions of egoism and of cooperation with kin, friends and even members of one’s society (potentially at the risk of sometimes being a “sucker”) to be universal, there is much less direct selection pressure towards, say, helping animals. Explaining “real altruism” (as opposed to cooperation, signaling etc.) with evolutionary psychology is beyond the scope of this post, but I suspect that without strong selection pressures in any specific direction, there is not much reason to expect much homogeneity. (The continuum hypothesis can be seen as a precedent for that in the realm of mathematics.)

Diverging assumptions make moral “truth” relative in a practical sense. A moral claim can be true on the set of assumptions held by many people and wrong on another set of assumptions also held by many people. So, instead of saying “true” one should at least say “true assuming hedonistic utilitarianism” or “true assuming that simple ethical systems are to be preferred, ethics should be universalizable and so on”. (Unless, of course, you are talking within a group with the same assumptions…)

Lack of precision

Another difference between ethics and logic is that ethical statements are much less precise than mathematical ones. Even before the foundations of mathematics were laid in the form of axiom systems and logical rules of deduction, there was little disagreement on what mathematical statements meant. That’s not the case for ethics. Two people can agree that killing is wrong and still disagree on meat consumption, the death penalty and abortion, because they mean different things by “killing”. People can agree that ethics is about all sentient beings and have the same knowledge about pigs and still disagree on whether pigs are ethically relevant, because they don’t agree on what sentience is.

Lack of precision makes moral truth ill-defined for most people. If your goal is to reduce suffering and you don’t have a precise definition of what suffering is, then you can’t actually prove that one set of consequences contains less suffering than another even if it is intuitively clear. If you have a plain intuition box, things are even worse. And thus one problem in normative ethics seems to be that few people are able to say what kind of argument would convince them to change their mind. (In the empirical sciences, statements are frequently blurry, too, of course, but usually much less so – scientists usually are able to make falsifiable claims.)

Conclusion

With moral “truth” being relative and usually ill-defined, I don’t think the term “truth” is very helpful or appropriate anymore, unless you add assumptions (as a remedy to relativity) that are at least somewhat precise (to introduce well-definedness).

As a kind of summary, here are the main ways in which moral realists could disagree with this critique:

  1. “Truth in mathematics (and the principles of empirical science, e.g. Occam’s razor, Bayesian probability theory) is absolute. At some level there is a statement that requires neither further definitions nor other assumptions. This makes it possible to also prove things like 2+2=4 without underlying assumptions, where 2, +, = and 4 can be translated into other things that don’t require further definition.”
  2. “The trilemma arguments about mathematical truth don’t apply to moral truth. While mathematical truth requires some underlying assumptions, moral truth does not require underlying assumptions. Or at least no further assumptions about ‘oughtness’.”
  3. “There is some core of assumptions about ethics that everybody actually shares in one variation or another similar to the way people share arithmetics and these can also be turned into precise statements (to everyone’s agreement) that make moral truth well-defined. With derivations from these underlying assumptions, most people could be convinced of, for example, some specific view on animal consciousness.”
  4. “Even though people disagree about the assumptions and/or their conceptions of ethics are non-rigorous and therefore nontransparent, speaking about ‘truth’ (without stating underlying assumptions) can somehow be justified.”

As I have written elsewhere, I am not entirely pessimistic about reaching a moral consensus of the sort that could warrant calling moral claims true in the sense that objection 3 proposes. I think there are some moral rules and meta-rules that pretty much everybody agrees on: the golden rule, morality being about how one treats others, etc. And many of them seem to be not completely blurry. At least, they are precise enough to warrant the judgment that torturing humans is, other things being equal, bad. Works like those of Mayank Daswani and myself show that concepts related to morality can be formulated rigorously. And Cox’s theorem, as discussed and proved in the first chapters of E. T. Jaynes’s book on probability theory, even derives the axioms of probability from qualitative desiderata. Maybe, though I am not sure about this, there is a set of axioms that many people could agree on and that is sufficient for deriving a moral system in a formal fashion.

There are some models of this in the realm of ethics. For example, Harsanyi derived utilitarianism (though without actually defining welfare) from some pretty reasonable assumptions, and von Neumann and Morgenstern derived a form of consequentialism (expected utility maximization) from another set of very reasonable assumptions. Unfortunately, nobody has yet come up with an axiomatization of moral intuitions that is convincing to more than just a few people…

Acknowledgements

Adrian Hutter made me aware of an error in an earlier version of this post.

Dale Carnegie on hedonic set points

The hedonic set point is a level of happiness to which humans tend to return throughout their lives, even after dramatic events like winning the lottery or becoming paraplegic. It is a relevant concept throughout effective altruism: knowledge of hedonic set points (like other aspects of neuroscience) informs efforts to improve the well-being of at least humans and potentially also other mammals or even evolved creatures in general. It is also an argument for wealthy people to give large amounts of money to charity: beyond a certain point, having more money will not make you significantly happier, so it’s natural to give it away to where it can make a big difference.

In chapter 2 of part 2 of his famous classic How to Win Friends and Influence People (also recommended to effective altruists in general), Dale Carnegie writes on the idea that is now associated with hedonic set points:

Everybody in the world is seeking happiness – and there is one sure way to find it. That is by controlling your thoughts. Happiness doesn’t depend on outward conditions. It depends on inner conditions.

It isn’t what you have or who you are or where you are or what you are doing that makes you happy or unhappy. It is what you think about it. For example, two people may be in the same place, doing the same thing; both may have about an equal amount of money and prestige – and yet one may be miserable and the other happy. Why? Because of a different mental attitude. I have seen just as many happy faces among the poor peasants toiling with their primitive tools in the devastating heat of the tropics as I have seen in air-conditioned offices in New York, Chicago or Los Angeles.

“There is nothing either good or bad,” said Shakespeare[‘s Hamlet], “but thinking makes it so.”

Abe Lincoln once remarked that “most folks are about as happy as they make up their minds to be.” He was right. I saw a vivid illustration of that truth as I was walking up the stairs of the Long Island Railroad station in New York. Directly in front of me thirty or forty crippled boys on canes and crutches were struggling up the stairs. One boy had to be carried up. I was astonished at their laughter and gaiety. I spoke about it to one of the men in charge of the boys. “Oh yes,” he said, “when a boy realises that he is going to be a cripple for life, he is shocked at first; but after he gets over the shock, he usually resigns himself to his fate and then becomes as happy as normal boys.”

I felt like taking my hat off to those boys. They taught me a lesson I hope I shall never forget.

Cheating at thought experiments

Thought experiments are important throughout the sciences. For example, it appears that a thought experiment played an important role in Einstein’s discovery of special relativity. In philosophy – and in ethics and the theory of mind in particular – thought experiments are especially important, and there are many famous ones. Unfortunately, many thought experiments might just be ways of tricking people. Like their empirical counterparts, they are prone to cheating if they lack rigor and the reader does not try to reproduce (or falsify) the results.

In his book, Consciousness Explained, Daniel Dennett gives (at least) three examples of cheating in thought experiments. The first one is from chapter 9.5 and Dennett’s argument roughly runs as follows. After having described his top-level theory of the human mind, he addresses the question “Couldn’t something unconscious – a zombie, for instance – have [all this machinery]?” This argument against functionalism, computationalism and the like, is often accompanied by the following argument: “That’s all very well, all those functional details about how the brain does this and that, but I can imagine all that happening in an entity without the occurrence of any real consciousness.” To this Dennett replies: “Oh, can you? How do you know? How do you know you’ve imagined ‘all that’ in sufficient detail, and with sufficient attention to all the implications.”

With regard to another thought experiment, Mary, the color scientist, Dennett elaborates (ch. 12.5): “[Most people] are simply not following directions! The reason no one follows directions is because what they ask you to imagine is so preposterously immense, you can’t even try. The crucial premise […] is not readily imaginable, so no one bothers.”

In my opinion, this summarizes the problems with many thought experiments (specifically intuition pumps): Readers do not (most often because they cannot) follow instructions and are thus unable to mentally set up the premises of the thought experiment. And then they try to reach a conclusion anyway based on their crude approximation to the situation.

Another example is Searle’s Chinese Room, which Dennett covers in chapter 14.1 of his book. When Searle asks people to imagine a person who does not speak Chinese but answers queries in Chinese using a large set of rules and a library, they probably think of someone looking up definitions in a lexicon or something. At least, this is feasible and also resembles the way people routinely pretend to have knowledge that they don’t. What people don’t imagine is the thousands of steps that it would take the person to compose even short replies (and choosing Chinese as the language does not help most English-speaking readers to imagine the complexity of the procedure of composing a message). If they did simulate the entire behavior of the whole system (the Chinese room with the person in it), they might conclude that it has an understanding of Chinese after all. And thus, this thought experiment is not suitable for debunking the idea that consciousness can arise from following rules.

Going beyond what Dennett discusses in his book, I’d like to consider further thought experiments that fit the pattern. For example, people often argue that hedonistic utilitarianism demands that the universe be tiled with some (possibly very simple) object that is super-happy. Or at least that individual humans should be replaced this way. In an interview, Eliezer Yudkowsky said:

[A utilitarian superintelligence] goes out to the stars, takes apart the stars for raw materials, and it builds whole civilizations full of minds experiencing the most exciting thing ever, over and over and over and over and over again.

The whole universe is just tiled with that, and that single moment is something that we would find this very worthwhile and exciting to happen once. But it lost the single aspect of value that we would name boredom […].

And so you lose a single dimension, and the [worthwhileness of the universe] – from our perspective – drops off very rapidly.

This thought experiment is meant to show that pure pleasure alone is not a desirable result. Instead, many people endorse complexity of value – which is definitely true from a descriptive point of view – and describe in detail many good things that utopia should contain. While I have my own doubts about the pleasure-filled universe, my suspicion is that one reason why people don’t like it is that they don’t consider it for very long and don’t actually imagine all the happiness. “Sure, some happiness is nice, but happiness gets less interesting when you have large amounts of it.” The more complex scenario, on the other hand, can actually be imagined more easily, and because it contains different kinds of good stuff, one does not have to base one’s judgment entirely on some number being very large. Closing the discussion of this example, I would like to remark that I am, at the time of writing this, not a convinced hedonistic utilitarian. (Rather, I am inclined towards a more preferentist view, which, I feel, is in line with endorsing complexity of value and value extrapolation, though I am skeptical of preference idealization as proposed in Yudkowsky’s Coherent Extrapolated Volition. Furthermore, I care more about suffering than about happiness, but that’s a different story…) I just think that the universe filled with eternal bliss cannot be used as a very convincing argument against hedonistic utilitarianism. Similar arguments may apply to deciding whether the bad things on Earth are currently outweighed by the good things.

The way out of this problem of unimaginable thought experiments is to confine ourselves to thought experiments that are within our cognitive reach. Results may then, if possible, be extrapolated to the more complex situations. For example, I find it more fruitful to talk about whether I only care about pleasure in other individuals, or also about whether they are doing something that is very boring from the outside.

Notes on the 24 November 2015 conference on machine ethics

The day before yesterday, I attended a German-speaking conference on robot ethics and machine ethics, organized by the Daimler and Benz Foundation and the Cologne Center for Ethics, Rights, Economics, and Social Sciences of Health. Speakers included Prof. Oliver Bendel, author of a German-language blog on machine ethics, and Norbert Lammert, President of the German Bundestag. The conference wasn’t meant to be for researchers only – though a great many scientists were present – so most talks were of an introductory nature. Ignoring the basics, which are, for example, covered in the collection on machine ethics by Anderson and Anderson and the book by Wallach and Allen, I will in the following summarize some thoughts regarding the event.

[Image: poster from the conference website]

Conservatism

Understandably, the conference focused on the short-term relevance and direct application of machine ethics (see below). Robots with human-level capabilities were only alluded to as science fiction. Nick Bostrom’s book Superintelligence was not even mentioned.

From a brief look into the speakers’ work, it also seems that most of them have not commented on such scenarios before.

Immediately relevant fields for machine ethics

Lammert began his talk by saying that governments are usually led to change or introduce legislation when problems are urgent, but not before. And thus, significant parts of the conference were dedicated to specific problems in machine ethics that robots face today or might face in the near future. The three main areas seem to be

  • robots in medicine and care of the elderly,
  • military robots, and
  • autonomous vehicles (also see Lin (2015) on why ethics matters for autonomous cars).

Lammert also argued that smart home applications might be relevant. Furthermore, Oliver Bendel pointed to some specific examples.

AIs and full moral agency

There was some agreement that AIs should not (or could not) become full moral agents, at least within the foreseeable future. For example, upon being asked about the possibility of users programming robots to commit acts of terrorism, Prof. Jochen Steil argued that illegitimate usage can never really be ruled out and that the moral responsibility lies with the user. With full moral agency, however, robots could in principle resist any kind of illegal or immoral use. AI seems to be the only general-purpose tool that can be made safe in this way, and it seems odd to miss the chance to use this to increase the safety of such a powerful technology.

In his talk, Oliver Bendel said that he was opposed to the idea of letting robots make all moral decisions. For example, he proposed the idea that robot vacuum cleaners could stop when coming across a bug or spider, but ultimately let the user decide whether to suck in the creature. Also, he would like cars to let him decide in ethically relevant situations. As some autonomous vehicle researchers from the audience pointed out (and Bendel himself conceded), this will not be possible in most situations – ethical problems lurk around every corner and quick reactions are required more often than not. In response to the question of why machines should not make certain crucial decisions, he argued that people and their lack of rationality were the problem. For example, if one were to introduce autonomous cars, people whose relatives were killed in accidents by these vehicles would complain that the AI had chosen their relatives as victims, even if the overall number of deaths was decreased by using autonomous vehicles. I don’t find this argument very convincing, though. It seems to be a descriptive point rather than a normative one: of course, it would be difficult for people to accept machines as moral agents, but that does not mean that machines should not make moral decisions. And the preferences that are violated, the additional unhappiness or the public outcry caused by introducing autonomous vehicles are morally relevant, but people dying (and therefore also more relatives being unhappy) is much more important and should be the priority.

Weird views on free will, consciousness and morality

Some of the speakers made comments on the nature of free will, consciousness and morality that surprised me. For example, Lammert said that morality necessarily had to be based on personal experience and reflection and that this made machine morality impossible in principle. Machines could only be “perfected to behave according to some external norms”, which, he said, has nothing to do with morality; another speaker agreed.

Also, most speakers naturally assumed that machines of the foreseeable future don’t possess consciousness or free will, which I disagree with (see this article by Eliezer Yudkowsky on free will, Dan Dennett’s Consciousness Explained and Brian Tomasik’s articles on consciousness). What surprises me is not so much that I disagree with them – many of the ideas of Yudkowsky and Tomasik would be considered “crazy” by most people (though not necessarily by philosophers, I believe) – but how confident they are, given that free will, consciousness and the nature of morality are still the subject of ongoing discussion in mainstream, contemporary philosophy. Indeed, digital consciousness seems to be a possibility in Daniel Dennett’s view on consciousness (see his book Consciousness Explained), in Thomas Metzinger’s self-model theory of subjectivity (see, for example, The Ego Tunnel) and in theories like computationalism in general. All of this is quite mainstream.

The best way out of this debate, in my opinion, is to only talk about the kind of morality that we really care about, namely “functional morality”, i.e. acting morally without necessarily – if that is even possible – thinking morally, feeling empathy etc. I don’t really think it matters much whether AIs are really consciously reflecting about things or whether they just act morally in some mechanical way, and I expect most people to agree. I made a similar argument about consequentialism and machine ethics elsewhere.

I expect that machines themselves could become morally relevant, and maybe some already are to some extent, but that’s a different topic.

AI politicians

Towards the end, Lammert was asked about politics being robosourced. While saying that he is certain that it will not happen within his lifetime (Lammert was born in 1948), he said that politics will probably develop in this way unless explicitly prevented.

In the preceding talk, Prof. Johannes Weyer mentioned that real-time data processing could be used for making political decisions.

Another interesting comment on Lammert’s talk was that many algorithms (or programs) basically act as laws in that they direct the behavior of millions of computers and thereby millions of people.

Overall, this leads me to believe that besides its applications in robotics (see above), the morality of artificial intelligence could become important in non-embodied systems that make political or maybe management decisions.

Media coverage

Due to the presence of Norbert Lammert (President of the German Bundestag) and the other high-profile speakers, and given the large fraction of media people on the list of attendees, I expect the conference to receive a lot of press coverage.

Utilitarianism and the value of a life

Utilitarians are often criticized for being cold-hearted in that they assign numbers to suffering and happiness (or numbers of lives saved) and for making ethical decisions based on calculations. With regard to this, Russell and Norvig write in the third edition of their famous artificial intelligence textbook (section 16.3.1):

Although nobody feels comfortable with putting a value on a human life, it is a fact that tradeoffs are made all the time. Aircraft are given a complete overhaul at intervals determined by trips and miles flown, rather than after every trip. Cars are manufactured in a way that trades off costs against accident survival rates. Paradoxically, a refusal to put a monetary value on life means that life is often undervalued. Ross Shachter relates an experience with a government agency that commissioned a study on removing asbestos from schools. The decision analysts performing the study assumed a particular dollar value for the life of a school-age child, and argued that the rational choice under that assumption was to remove the asbestos. The agency, morally outraged at the idea of setting the value of a life, rejected the report out of hand. It then decided against asbestos removal — implicitly asserting a lower value for the life of a child than that assigned by the analysts.

So, if one actually cares about lives not being destroyed, then the optimal approach is to assign as accurate a value to a life as possible. Deliberately not doing so makes sense only if you care more about something else and don’t mind assigning lower values implicitly.
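As a stylized version of the kind of calculation the decision analysts presumably ran (all numbers below are invented for illustration; the quote does not give the study’s actual figures), one can compare the expected cost of each option under an explicit value of life and back out the value that a refusal to remove the asbestos implicitly asserts:

    REMOVAL_COST = 2_000_000         # $ to remove the asbestos (hypothetical)
    EXPECTED_DEATHS_IF_LEFT = 0.5    # expected child deaths if it stays (hypothetical)

    def expected_cost(remove, value_of_life):
        """Total expected cost of a decision, given a dollar value per life."""
        return REMOVAL_COST if remove else EXPECTED_DEATHS_IF_LEFT * value_of_life

    # With an explicit value of $10M per life, removal is the rational choice:
    print(expected_cost(True, 10e6) < expected_cost(False, 10e6))  # True

    # Refusing removal is only consistent with valuing a life below this threshold:
    print(REMOVAL_COST / EXPECTED_DEATHS_IF_LEFT)  # 4000000.0 -- the implicitly asserted ceiling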