Environmental and Logical Uncertainty: Reported Environmental Probabilities as Expected Environmental Probabilities under Logical Uncertainty

[Readers should be familiar with the Bayesian view of probability]

Let’s differentiate between environmental and logical uncertainty and, consequently, environmental and logical probabilities. Environmental probabilities are the ones most of my readers will be closely familiar with. They concern the kinds of things that you cannot figure out, even with unlimited computing power, until you have seen enough evidence.

Logical uncertainty is a different kind of uncertainty. For example, what is the 1,000th digit of the mathematical constant e? You know how e is defined. The definition uniquely implies what the 1,000th digit of e is, and yet you’re uncertain as to its value. Perhaps you would assign logical probabilities: the probability that the 1,000th digit of e is 0 is 10%. For more detail, consider the MIRI paper Questions of Reasoning Under Logical Uncertainty by Nate Soares and Benja Fallenstein.
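Resolving this particular logical uncertainty is easy in practice; the point is that until the computation is actually carried out, the definition alone leaves you uncertain. A minimal sketch of actually carrying it out (assuming Python’s decimal module and the standard series e = Σ 1/k!):

```python
from decimal import Decimal, getcontext

def digits_of_e(n):
    """Return e as a decimal string with roughly n correct digits after
    the decimal point, computed from the series e = sum of 1/k!, k >= 0."""
    getcontext().prec = n + 20              # extra guard digits
    term, total, k = Decimal(1), Decimal(0), 0
    threshold = Decimal(10) ** -(n + 10)
    while term > threshold:                 # terms shrink factorially fast
        total += term
        k += 1
        term /= k
    return str(total)

e_str = digits_of_e(1000)
print(e_str[1001])  # the 1,000th digit after the decimal point
```

Before running this, you might assign 10% probability to each possible value of that digit; afterwards, the logical uncertainty is gone.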

Now, I would like to draw attention to what happens when an agent is forced to quantify its environmental uncertainty, for example, when it needs to perform an expected value calculation. It’s good to think in terms of simplified artificial minds rather than humans, because human minds are so messy. If you think that any proper artificial intelligence would obviously know the values of its environmental probabilities, then think again: proper ways of updating environmental probabilities based on new evidence (like Solomonoff induction) tend to be uncomputable. So, an AI usually can’t quantify what exact values it should assign to certain environmental probabilities. This may remind you of the 1,000th digit of e: in both cases, there is a precise definition for something, but you can’t infer the exact numbers from that definition, because you and the AI are not intelligent enough.

Given that computing the exact probabilities is so difficult, the designers of an AI may fail with abandon and decide to implement some computable mechanism for approximating the probabilities. After all, “probabilities are subjective” anyway… Granted, an AI probably needs an efficient algorithm for quantifying its environmental uncertainty (or it needs to be able to come up with such a mechanism on its own). Sometimes you have to quickly compute the expected utility of a few actions, which requires numeric probabilities. However, any ambitious artificial intelligence should also keep in mind that there is a different, more accurate way of assigning these probabilities. Otherwise, it will forever and always be stuck with the programmers’ approximation.

The most elegant approach is to view the approximation of the correct environmental probabilities as a special case of logical induction (i.e. reasoning under logical uncertainty), possibly without even designing an algorithm for this specific task. On this view, we have logical meta-probability distributions over the correct environmental probabilities. Consider, for example, the probability P(T|E) that we assign to some physical theory T given our evidence E. There is some objectively correct subjective probability P(T|E) (assuming, for example, Solomonoff’s prior probability distribution), but the AI can’t calculate its exact value. It can, however, use logical induction to assign probability densities to statements like P(T|E)=0.368. These probabilities may be called logical meta-probabilities – they are logical probabilities about the correct environmental probabilities. With these meta-probabilities, all our uncertainty is quantified again, which means we can perform expected value calculations.

Let’s say we have to decide whether to take action a. We know that if we take action a, exactly one of the outcomes A, B and C will happen. The expected value of a is therefore

E[a] = P(A|a)*u(A) + P(B|a)*u(B) + P(C|a)*u(C),

where u(A), u(B) and u(C) denote the utilities an agent assigns to outcomes A, B and C, respectively. To find out the expected value of a given our lack of logical omniscience, we now calculate the “expected expected value”, where the outer expectation operator is a logical one (the utilities are known constants, so by linearity of expectation they can be pulled out of the logical expectation):

E[E[a]] = E[P(A|a)*u(A) + P(B|a)*u(B) + P(C|a)*u(C)] = E[P(A|a)]*u(A) + E[P(B|a)]*u(B) + E[P(C|a)]*u(C).

The expected values E[P(A|a)], E[P(B|a)] and E[P(C|a)] are the expected environmental probabilities of the outcomes given a and can be computed using integrals. (In practice, these will have to be approximated again; you can’t apply full logical induction at every level if you want to avoid an infinite regress.) These expected probabilities are the answers an agent/AI would give to questions like, “What probability do you assign to A happening?”
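As a toy sketch of this calculation (the outcome names, candidate probability values, meta-probabilities and utilities below are all invented for illustration, and the integral is replaced by a discrete sum over a few candidate values):

```python
# Hypothetical discrete meta-distributions: for each outcome, a list of
# (candidate value of P(outcome|a), logical meta-probability of that value).
meta = {
    "A": [(0.2, 0.5), (0.4, 0.5)],
    "B": [(0.3, 1.0)],
    "C": [(0.5, 0.5), (0.3, 0.5)],
}
utility = {"A": 10.0, "B": -2.0, "C": 1.0}  # u(A), u(B), u(C)

def expected_env_prob(candidates):
    """E[P(outcome|a)]: the candidate probabilities weighted by their
    logical meta-probabilities (a discrete stand-in for the integral)."""
    return sum(value * meta_prob for value, meta_prob in candidates)

def expected_expected_value(meta, utility):
    """E[E[a]] = sum over outcomes of E[P(outcome|a)] * u(outcome)."""
    return sum(expected_env_prob(meta[o]) * utility[o] for o in utility)

print(expected_expected_value(meta, utility))  # ~2.8 = 0.3*10 + 0.3*(-2) + 0.4*1
```

Note that the expected environmental probabilities (0.3, 0.3 and 0.4 here) still sum to 1, so they can be reported and used exactly like ordinary probabilities.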

This view of reported environmental probabilities makes sense of a couple of intuitions that we have about environmental probabilities:

  • We don’t know which probabilities we assign to a given statement, even if we are convinced of Bayes’ theorem and a certain prior probability distribution.
  • We can update our environmental probability assignments without gathering new evidence. We can simply reconsider old evidence and compute a more accurate approximation of the proper Bayesian updating mechanism (e.g. via logical induction).
  • We can argue about probabilities with others and update. For example, people can bring new explanations of the data to our attention. This is not Bayesian evidence (ignoring that the source of such arguments may reveal its biases and beliefs through such argumentation). After all, we could have come up with these explanations ourselves. But these explanations can shift our logical meta-probabilities. (Formally: what external agents tell you can probably be viewed as (part of) a deductive process, see MIRI’s newest paper on logical induction.)

Introducing logical uncertainty into assigning environmental probabilities doesn’t solve the problem of assigning appropriate environmental probabilities. MIRI has described a logical induction algorithm, but it’s inefficient.

The Age of Em – summary of policy-relevant information

In this post I summarize the main (potentially) policy-relevant points that Robin Hanson makes in The Age of Em.

If you don’t have time to read the whole post, read only the section The three most important take-aways. My friend and colleague Ruairí also recommends skipping directly to the section on conflict and compromise if you already know the basics of Hanson’s em scenario.

You may check whether Hanson really makes the statements I ascribe to him by looking them up in the book. The page numbers all refer to the print edition, where the main text ends on page 384.


Many parts of the book are not overly interesting or policy-relevant. For example, Hanson dedicates a lot of space to discussing how em cities will have to be cooled. Some things are very interesting because they are weird. For example, faster ems will have smaller bodies (though some of them, if not most, will have no bodies at all). And some things could be policy-relevant. I also learned a lot of interesting things along the way, e.g. what kind of hand gestures successful and unsuccessful people make, or that employees are apparently happier and more productive if they just try to satisfy their bosses instead of trying to do good work. Hanson makes extensive use of valuable references to support such claims. In addition to the intrinsic importance of its content, the book serves as a great example of futurology without groundless speculation.

There is also a nice review by Slate Star Codex, which also gives an overview of some of the more basic ideas (section III). I find the whole Age of Em scenario a lot less weird than Scott does and also disagree with the “science fiction” criticism in section VI. Section V rips apart (successfully, in my opinion) the arguments that Hanson gives (in the book) to support the assumption that whole brain emulation will arrive before de novo AI.

What’s an em, anyway?

Hanson’s book argues that it will soon be possible to scan human brains and then run the scans in a way that preserves their functionality. These scans are called mind uploads, whole brain emulations or ems. Given the advantages that these digital versions have over meat-humans (such as the possibility of speed-up, copiability, etc.), ems would quickly come to dominate the economy. Ems are similar to humans in many regards, but the fundamental differences of being digital have a variety of interesting consequences for an em-dominated world. This is what Hanson’s book is about.

If you are not familiar with these ideas at all, consider for example Hanson’s TEDx talk or section III in the Slate Star Codex review.

The three most important take-aways

  • The elites of our world will dominate the em world. So, focusing on certain elites today is more important for the em scenario. Also, our memes should be tailored more to elites than what would be the case in a scenario without ems.
  • The transition to an em world could cause major upheavals in moral values. It’s conceivable that in some em scenarios, the world could end up much closer to my values (panpsychic, welfarist, more willing to see some lives as not worth living, etc.) than in non-em scenarios. However, ems could also be largely egoistic and not care about philosophy much.
  • AI safety will probably be easier to solve for ems, i.e. ems are more likely to create de novo AI that is aligned with their values.

Competition and Malthusian wages

Without substantial regulation, the em world will be a lot more competitive (see p. 156ff.).

“The main way that em labor markets differ from labor markets today is that ems can be easily copied. Copying causes many large changes to em labor markets. For example, with copying there can be sufficient competition for a particular pre-skill type given demand from many competing employers, and supply from at least two competing ems of that type. For these two ems, all we need is that when faced with a take it or leave it wage offer, they each accept a wage of twice the full hardware cost [if they want to work at most half of their time].” (p. 144)

So even if for each job there are just two ems who are willing to do the job at very low wages, wages would fall to near-subsistence level almost immediately.
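The arithmetic behind the quoted claim can be sketched as follows (a toy calculation; normalizing the wage to the full hardware cost is my framing, not Hanson’s):

```python
def lowest_acceptable_wage(full_hardware_cost, fraction_of_time_worked):
    """Lowest wage rate an em can accept when working only the given
    fraction of its time: its income must still cover the full hardware
    cost of keeping the em running, so the per-time rate scales up as
    the worked fraction shrinks."""
    return full_hardware_cost / fraction_of_time_worked

# An em willing to work at most half of its time must be paid at a rate
# of twice the full hardware cost, as in the quote above:
rate = lowest_acceptable_wage(full_hardware_cost=1.0, fraction_of_time_worked=0.5)
print(rate)  # 2.0
```

With copying, any employer can always find a competing copy willing to accept this break-even rate, which is what drives wages to subsistence level.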

Who is in power in the em world?

In the em scenario, an elite-focus for our movement is more important than it already is. The elites (in terms of intelligence, productivity, wealth etc.) of our world will completely dominate (by number!) the em world. Therefore, influencing them is strategically much more important for influencing the em world. The elites within the em world will also be more important, e.g. because the em world may be less democratic or have a more rigid class and power hierarchy. This also suggests that it may be a little more important than we thought that the memes of our movement should make sense to elites.

Who becomes an em?

Which humans will be chosen to become ems and be copied potentially billions of times?

  • Young people (that is, people who are young when the ems are created) are probably more important, because living in the em world will require many new skills that young people are more likely to be able to acquire. (p. 149)
  • Because ems can be copied, there is not really a need to have many different ems. One can basically just take the 1,000 most able humans (or the most talented human in every relevant area) and produce many copies of them (see p. 161). Therefore, the em world will be completely dominated by the elites of the human world.
  • The first people who become ems will tend to be rich or supported by large companies or other financiers, because scanning will be expensive in the beginning. Also, the chance of success will be fairly low in the first years of whole brain emulation, so classic egoists may have inhibitions against uploading. (On the other hand, they may want to dominate the em world, or want to be scanned while they still have a chance to gain a foothold in it.) The very first ems may thus be over-proportionately crazy/desperate, altruists who want to influence the em era, terminally ill, and maybe cryonics customers who are legally dead (see p. 148). Because first movers have an advantage (p. 150), it seems especially promising for altruists to try to get scanned in the early days, when success rates of, say, 20% (with the original human being destroyed in the process of scanning) would discourage others from daring the step into the em world. Having some altruistic elite members is therefore more important for an altruistic movement in this scenario than having many less committed or less talented members.
  • “It is possible that the first ems will come predominantly from particular nations and cultures. If so, typical em values may tend to be close to the values of whatever nations provided most of the ordinary humans whose brains were scanned for these first ems.” (p. 322) This suggests that not only personal eliteness but also being a national of an elite country will become important. This is similar to space travel (and maybe other frontiers), e.g. NASA employs only US citizens. Off the cuff, the most important countries in this regard are then probably the US, China, Switzerland (because of the Blue Brain project), some EU countries (because of high GDP and recent ESA success) and Japan.
    • “The first em cities might plausibly form around big computer data centers, such as those built today by Google, Amazon, and Microsoft. Such centers likely have ample and cheap supporting resources such as energy, are relatively safe from storms and social disruptions, and are also close to initial em customers, suppliers, and collaborators in the richest parts of the industrial economy. These centers prefer access to cheap cold water and air for cooling, such as found toward Earth’s poles, and prefer to be in a nation that is either relatively free from regulations or that is small and controlled by friendly parties. These criteria suggest that the first em city arises in a low-regulation Nordic nation such as Norway.” (p. 360) Of course, such low-regulation countries in which em cities are built could nonetheless have little influence on the policies and values of the em world itself, e.g. an em city in Norway may consist of brains that were scanned in the USA.

More stability in an em world

Overall, the class hierarchy of the em era will probably be more rigid than in the human era.

  • “Ems with very different speeds or sizes might fit awkwardly into the same space, be that physical or virtual. Fast ems whizzing past could be disorienting to slower ems, and large ems may block the movement or view of small ems.” (p. 110) Some kind of segregation seems convenient: either areas of an em city have a certain standard speed or ems of the wrong speed class will be filtered out from what ems can see. “So there may be views that hide lower status ems, and only show higher status ems. This could be similar to how today servants such as waiters often try to seem invisible, and are often treated by those they serve as if invisible. The more possible views that are commonly used, the harder it will be for typical ems to know how things look from others’ typical points of view.” (p. 111, also see p. 218)
  • There will probably be a few distinct speeds at which ems run as opposed to all kinds of em speeds being common because ems at the same speed can communicate well, whereas ems at speeds differing by a factor of 1.5 or more will probably have problems. (See pp. 222, 326)
  • Many of the ways in which more regulation is possible make it possible to prevent upheavals and oppress non-conformist positions.
  • Since ems can be copied after training, few ems will be in training. Instead, most ems will be at their peak productivity age, which for humans is, according to Hanson, usually between the ages of 40 and 50, but could be much higher for ems given that their brains don’t deteriorate (p. 202ff.). So, ems may be somewhat older (in terms of subjective age) than humans. (See Wikipedia: List of countries by median age.)
  • Many aspects of aging can be stopped in ems. Therefore, ems may be able to work productively longer and hold on to their power much longer (p. 128f.). This means there will be fewer generation changes (per unit of subjective time). Since people tend to change their ways less often when they are old, the overall moral and political views of the em worlds might also be a lot more stable (judged by subjective em-time).
  • Em societies may be non-democratic (p. 259).
  • “Political violence, regime instability, and policy instability all seem to be negatively correlated with economic growth.” (p. 262) The stable em cities may come to dominate.
  • “As ems have near subsistence (although hardly miserable) income levels, and as wealth levels seem to cause cultural changes, we should expect em culture values to be more like those of poor nations today. As Eastern cultures grow faster today, and as they may be more common in denser areas, em values may be more likely to be like those of Eastern nations today. Together, these suggest that em culture […] values […] authority.” (p. 322f.)
  • “[Because] ems [will probably be] more farmer-like, they tend to envy less, and to more accept authority and hierarchy, including hereditary elites and ranking by gender, age, and class. They are more comfortable with […] material inequalities, and push less for sharing and redistribution. They are less bothered by violence and domination toward the historical targets of such conflicts […]. […] Leaders lead less by the appearance of consensus, and do less to give the appearance that everyone has an equal voice and is free to speak their minds. Fewer topics are open for discussion or negotiation. Farmer-like ems […] enforce more conformity and social rules, and care more for cleanliness and order.” (p. 327f.)

Conflict and compromise

It’s unclear whether cooperation and compromise will be easier to achieve in an em world and whether there would be more or less risk of conflict and AI arms races.

  • CEV-like approaches to value-loading might be easier to implement (see the section on AI safety).
  • Because ems can travel more quickly, it will probably be easier for them to communicate more often with ems in other parts of the world (pp. 75-77).
  • “Groups of ems meeting in virtual reality might find use for a social ‘undo’ feature, allowing them to, for example, erase unwelcome social gaffes. At least they could do this if they periodically archived copies of their minds and meeting setting, and limited the signals they sent to others outside their group. When the undo feature is invoked, it specifies a particular past archived moment to be revived. Some group members might be allowed a limited memory of the undo, such as by writing a short message to their new selves. When the undo feature is triggered, all group members are then erased (or retired) and replaced by copies from that past archive moment, each of whom receives the short message composed by its erased version.” (p. 104) I am not sure such a feature would be used in diplomacy, because being able to undo and retry makes signals of cooperativeness and honesty less credible. Of course, this could be addressed with the limited memory of the undo. If such a feature were used in diplomacy, it could make interaction across cultural differences smoother.
  • “As the em era allows selection to more strongly emphasize the clans who are most successful at gaining power, we should expect positions of power in the em world to be dominated even more by people with habits and features conducive to gaining power.” (p. 175) Such people tend to be more suspicious of potential work rivals (p. 176) and often refer to us-them concepts (p. 177). This should increase risks of conflict.
  • Fewer restrictions on international trade and immigration are economically more efficient (p. 179). To the extent that different em cities are indeed competing strongly, we would expect such behavior from em governments as well. Fewer restrictions in these regards might decrease differences between cultures.
  • For various reasons ems may overall be more rational, which increases the probability that they will be able to avoid “stupid” scenarios like escalating arms races. E.g. they could implement combinatorial auctions (see p. 184ff.) (humans can probably do so as well, though), have more trustworthy advice from their own copies (pp. 180, 315ff.), lie less (p. 205), can be better prepared for tasks (you only have to prepare one em and then can copy that em as often as you wish) (p. 208ff.).
  • Because shipping physical goods across the globe will take ages in fast ems’ subjective time (cargo ships probably can’t be sped up nearly as much as the thinking of ems, so cargo ships will seem extremely slow to fast ems) trade of such physical goods between em cities may hardly happen at all (p. 225).
  • Most ems will be at the peak productivity age, i.e. 40-50 or above (p. 202ff.). 50-year-olds tend to be less supportive of war than younger people (p. 250). Again see Wikipedia: List of countries by median age.
  • Poorer nations wage war more often and most ems will be poor (p. 250).
  • Having no children may make people more belligerent (p. 250).
  • The gender imbalance (more males than females) may increase the probability of war (p. 250).
  • If male ems are “castrated” (or, rather, something analogous to it) because of the gender imbalances and the obsoleteness of sexual reproduction, they will tend to be less aggressive and more sensitive, sympathetic and social. (p. 285)
  • Similar to family clans, the importance of copy clans may lead to less trust, fairness, rule of law and willingness to move or marry those from different cultures (p. 253).
  • Ems can have on call advisors, which can answer questions all the time. (pp. 315ff.) This could make diplomacy smoother, because the advisors are more likely to assume a long-term perspective (i.e. a far view), which, e.g., could make diplomats less driven by emotions like impatience, fear, anger etc.
  • “As ems have near subsistence (although hardly miserable) income levels, and as wealth levels seem to cause cultural changes, we should expect em culture values to be more like those of poor nations today. As Eastern cultures grow faster today, and as they may be more common in denser areas, em values might be similar to those of Eastern nations today. Together, these suggest that em culture […] values […] good and evil and local job protection.” (p. 322f.) This could increase the probability of conflicts.
  • There is a possibility of conflict between ems that come from our era and ems that grew up in the em era. “[T]he latter ems are likely to be better adapted to the em world, but the former will have locked in many first mover advantages to gain enviable social positions.” (p. 324) Similarly, there could be a conflict between humans and ems. (p. 324f., 361) In both cases, the newcomers may be very different due to competitiveness and thus could have a strong motivation to change the status quo.
  • “A larger total em population should [..] lead us to expect more cultural fragmentation. After all, if local groups differentiate their cultures to help members signal local loyalties, then the more people that are included within a region, the more total cultural variation we might expect to lie within that region. So a city containing billions or more ems could contain a great many diverse local cultural elements.” (p. 326) This suggests a higher probability of at least smaller conflicts.
  • “Poorer ems seem likely to return to conservative (farmer) cultural values, relative to liberal (forager) cultural values. […] Today, liberals tend to be more open-minded […]. If, relative to us, ems prefer farmer-like values to forager-like values, then ems more value things such as […] patriotism and less value […] tolerance […]. […] They are more comfortable with war […]. They are less bothered by violence and domination toward the historical targets of such conflicts, including foreigners […]. […] Conservative jobs today tend to focus on a fear of bad things, and protecting against them.” (p. 327f.)
  • “Today, ‘fast-moving’ action movies and games often feature a few key actors taking many actions with major consequences, but with very little time for thoughtful consideration of those actions. However, for ems this scenario mainly makes sense for rare isolated characters or for those whose minds are maximally fast. Other characters usually speed up their minds temporarily to think carefully about important actions.” (p. 332) In this way, even action movies could set norms for thoughtfulness, whereas nowadays they propagate a “shoot first, ask questions later” mentality.
  • As described in the section on AI safety, ems may have a much better understanding of decision theory, which makes compromise and the avoidance of defection in prisoner’s dilemma-like scenarios much easier.
  • “[M]ost ems might [..] be found in a few very large cities. Most ems might live in a handful of huge dense cities, or perhaps even just one gigantic city. If this happened, nations and cities would merge; there would be only a few huge nations that mattered.” (p. 216) This would make coordination a lot easier.


Values

It is highly unclear to me what values ems will adopt.

  • Ems have no reason to farm animals for food or to use animals for drug testing. Cognitive dissonance theory suggests that this will make the majority care about animals more than they do today.
  • “As a cat brain has about 1% as many neurons as a human brain, virtual cat characters are an affordable if non-trivial expense. Most pet brains also require the equivalent of a small fraction of a human brain to emulate. The ability to pause a pet while not interacting with it would make pets even cheaper. Thus emulated animals tend to be cheap unless one wants many of them, very complex ones, or to have them run for long times while one isn’t attending to them. Birds might fly far above, animals creep in the distance, or crowds mill about over there, but one could not often afford to interact with many complex creatures who have long complex histories between your interactions with them.” (p. 105)
  • As opposed to most humans, em copies will mostly be created on demand. I.e., if you are an em, you apply for jobs (or employers offer them to you), and for every job that you get, you create a copy that fills this job. (In some unregulated dystopian scenarios it is also possible, of course, that ems can’t veto whether a copy is made of them.) This means that the question “will this specific life be worth living?” will be more common among ems (indeed, more forced upon ems) than among humans, who usually don’t know what the lives of their children will be like. Ems will also feel more responsible for having made the decision to live their current lives, so unless they decided to make a copy for ethical reasons, they are much less likely to be anti-natalist. After all, they decided themselves to be copied (see p. 120). Also, there is strong selection pressure favoring ems who consider, say, a life without much leisure to still be positive (see p. 123). Similarly, there are selection pressures toward ems wanting to make many copies of themselves.
  • There is strong selection pressure against ems who are not willing to create short-lived (i.e., quickly deleted) copies of themselves. If competition is strong enough (and human nature sufficiently flexible), ems will value having at least one of their copies survive, but they likely would not much disvalue the death of any single copy. This could lead to values along the lines of “biodiversity applied to humans”, where copy clans count as the morally relevant entities, as opposed to individuals. This would be similar to how many people care about preserving certain species instead of the welfare of individuals. It would not only be bad for em welfare but would also move the moral views of ems farther away from mine. On the other hand, hedonic well-being could fill the gap left by death as the center of moral concern.
  • Hanson argues that ems probably won’t suffer much (p. 153, 371), because their virtual reality (and even their own brain) can be controlled so well. Given that experiencing suffering is probably correlated with caring about suffering, this might be bad in the long term.
  • Assuming that ems can be tweaked, they may be made especially thoughtful, friendly and so on.
  • Because of higher competition, ems work more (e.g. see pp. 167ff., 207) and are paid less. Therefore, they don’t have the resources for altruistic activities that modern elites have.
  • People who are more productive tend to be married, intelligent, extroverted, conscientious and non-neurotic. Smarter people are more cooperative, patient, rational and law-abiding. They also tend to favor trading with foreigners more. So, because ems will be selected for productivity, they will tend to have these features as well. (p. 163)
    • It is somewhat unclear whether ems will be more or less religious. Apparently religious people are more productive, but they are also less innovative. (p. 276, 311) Hanson expects that religions will be able to adapt to the em world’s weirdnesses (p. 312).
  • Workaholics tend to be male, and males are also more competitive, so the em world may well be dominated by males (p. 167), who are less compassionate and less likely to be vegan or vegetarian.
  • “While successful ems work hard and accept unpleasant working conditions, they are not much more likely to seriously resent or rail against these conditions than do hard-working billionaires or winners of Oscars or Olympic gold medals today. While such people often work very hard under grueling conditions, they usually accept such conditions as a price for their chance at extreme success.” (p. 169) So perhaps ems won’t take the suffering of less fortunate individuals very seriously.
  • “[O]lder people tend to associate happiness more with peacefulness, as opposed to excitement.” (p. 205) So, old people may be more focused on avoiding very bad experiences relative to bringing about very pleasurable ones.
  • Most ems don’t have children (p. 211f.), which could make them more compassionate towards others.
  • At some point, it may become attractive to scan children to turn them into ems, because they can better adapt to the em world (p. 212). This could give an advantage to ruthless countries and children of psychopathic parents, who are themselves more likely to be psychopathic.
  • Space will lose some appeal, because it takes ages of subjective time to get there (p. 225).
  • If male ems are “castrated” (however that would exactly work for ems) because of the gender imbalances and the obsoleteness of sexual reproduction, then they tend to be more sympathetic. (p. 285)
  • “Ems can travel more cheaply to virtual nature parks, and need have little fear that killing nature will somehow kill them.” (p. 303)
  • The classic targets of charity—alms, schools and hospitals—will all be a lot less necessary than today (p. 302). This may lead ems to support other kinds of charity.
  • “New em copies and their teams are typically created in response to new job opportunities. Such teams typically end or retire when these jobs are completed. Thus ems are likely to identify strongly with their particular jobs; their jobs are literally their reason for existing.” (p. 306, also see p. 328) Maybe this implies that ems will be less involved in pursuing ethical causes.
  • For ems it is obviously much more natural to be anti-substratist.
  • For ems, it is more natural to consider consciousness as coming in degrees. For example, em minds differ in speed, but there could also be partial minds (p. 341ff.).
  • “If ems are indeed more farmer-like, […] they are less bothered by violence and domination toward the historical targets of such conflicts, including foreigners, children, slaves, animals, and nature.” (p. 327)
  • Ems will care more about their copies than humans that have never been copied.

AI safety

Overall, ems seem more likely to get AI safety right. Arguments beyond Hanson’s are given in a talk by Anna Salamon (and Carl Shulman). Consider also a workshop report by Anna Salamon and Luke Muehlhauser on the topic.

  • Because ems tend to have many copies, decision- and game-theoretic ideas that are relevant for AI safety will be more common and practically tested in em society.
    • There is the possibility of mind theft, i.e. that someone steals a copy of an em to interrogate it (p. 60f.). So, ems may pre-commit against giving in to anything like torture in order to disincentivize mind theft (p. 63).
    • There may be “open source” ems (p. 61), which are free for everyone to copy. These must have pre-commitments against any kind of coercion to enforce a policy of only working for those who grant them a certain standard of living.
    • “An em might be fooled […] by misleading information about its copy history. If many copies were made of an em and then only a few selected according to some criteria, then knowing about such selection criteria is valuable information to those selected ems. For example, imagine that someone created 10,000 copies of an em, exposed each copy to different arguments in favor of committing some act of sabotage, and then allowed only the most persuaded copy to continue. This strategy might in effect persuade this em to commit the sabotage. However, if the em knew this fact about its copy history, that could convince this remaining copy to greatly reduce its willingness to commit to sabotage.” (p. 112, also see pp. 60, 120) Such weird processes could make ems a lot better at anthropic reasoning.
    • It will be easy to put ems into simulations to test their behavior in certain situations (p. 115ff.). So, Newcomb-like problems are a very practical concern in the em world. Ems also often interact with copies of themselves, which could sometimes resemble a corresponding variant of the prisoner’s dilemma.
  • The possibility of mind theft (or, in general, the fact that ems live in the digital world) leads ems to increase spending on computer security (p. 61f.), which makes both AI control (e.g. via provably secure operating systems) and AI boxing easier. AI boxing is also made easier by fast ems being able to “directly monitor and react to an AI at a much higher time resolution.” (p. 369)
  • Whole brain emulation makes CEV-like approaches to AI safety easier.
    • “Mild mindreading might be used to allow ems to better intuit and share their reaction to a particular topic or person. For example, a group of ems might all try to think at the same time about a particular person, say ‘George.’ Then their brain states in the region of their minds associated with this thought might be weakly driven toward the average state of this group. In this way this group might come to intuitively feel how the group feels on average about George.” (p. 55)
    • Hanson believes that there may be “methods to usefully merge two em minds that had once split from a common ancestor, with the combined mind requiring not much more space and processing power than did each original mind, yet retaining most of the skills and memories of both originals.” (p. 358)
  • Messier AI designs may be feasible in an em world. Such designs might be less controllable, e.g. because the goal is less explicit.
    • On p. 50, Hanson writes that small-scale cognitive enhancements may be possible for ems. Some of them may allow ems to have much better memory, which allows them to work on less modular AI designs.
    • One can save a copy of a programmer who wrote a piece of software to later let them rewrite that piece of software. (p. 278) This avoids some of the typical problems of legacy systems and could lower quality standards.
    • Once serial computer speed hits a wall, em software needs to be very parallel to not appear sluggish to fast (highly parallelized) ems (p. 279). So, ems will become much better at writing parallel computer programs. This may lead to more messy approaches to AI (e.g., society of mind, many subagencies etc.), which are more difficult to control. However, there could probably be very systematic approaches to parallel computing as well. In that case, the parallel computing trend would not make a big difference.
    • Programmers can have many copies and run at very, very high speeds and then finish huge pieces of software on their own (p. 280f.). This could allow them to get away with idiosyncratic systems that would otherwise be impossible to implement for a human team. Again, this could lead to more messy approaches to AI.
  • De novo AIs could still be partly ems, for example they could be created by replacing subsystems of ems by more efficient programs while keeping the motivation system intact.
  • Because ems live longer (potentially forever), they have less motivation to create AI quickly.

Appendix A: Why regulation might be easier to enforce in an em world

Hanson largely assumes a scenario with low regulation, which makes sense—if only to be able to make predictions at all. However, there are also many reasons to believe that much stricter regulation could be enforced in the em world:

  • Mind reading might be possible to some extent (pp. 55ff.).
  • One can test an em’s loyalty by putting it into a simulation (pp. 115ff.).
  • Virtual reality makes surveillance much easier (pp. 124ff., 273).
  • If the death of a single copy is considered to be no great harm (pp. 134ff.), you could quite easily shut off all copies of a criminal (except, maybe, one which you retire at very low speed).
  • The em world is probably dominated by a few hundred “copy clans”. This should make coordination a lot easier.
  • Most ems will probably live in just a few cities (p. 214ff.). This makes coordination easier.
  • Crimes can be discouraged more decisively by holding copy clans legally liable for the behaviors of members (p. 229).
  • Em firms will be larger than today’s firms (p. 231).
  • “[T]here is a possibility that ems may create stable totalitarian regimes that govern em nations.” (p. 259, also see pp. 264ff.)
  • “Archived copies of minds from just before a key event could be used to infer the intent and state of knowledge of ems at that key event.” (p. 271) This makes jurisdiction easier.
  • It seems plausible that after the first em is created, the technology will be in the hands of one country or coalition for a while. (On the other hand, the USA caught up within months after Sputnik.) This will make it easy to set up an em world in a way that conforms with the agenda of this coalition. Assuming that the creation of ems will yield a transition as significant as Hanson makes it out to be, the first coalition might already have a decisive strategic advantage. Of course, this just means that there will be an arms race towards whole-brain emulation instead of one towards de novo AI, but the former wouldn’t have many of the negative consequences of the latter, because ems can’t be uncontrolled.

Appendix B: Robin Hanson’s moral values

What’s interesting about the book is that, while the scenario Hanson outlines would be considered dystopian by many, Hanson seems to consider it an acceptable outcome. (Consider his “Evaluation” section (pp. 367ff.).) Some striking examples are the following statements:

  • “Of course, lives of quiet desperation can still be worth living.” (p. 43)
  • “[A] disaster so big that civilization is destroyed and can never rise again […] harms not only everyone living in the world at the time, but also everyone who might have lived afterward, until either a similar disaster later, or the end of the universe.” (p. 369, emphasis added)

Lexicographic utility functions

Intuitions about there being extreme kinds of suffering that cannot be outweighed by any amount of happiness and that are more important than any amount of mild suffering violate the continuity axiom* of the Von Neumann-Morgenstern (vNM) utility theorem. Does that mean that holding extreme suffering to be impossible to outweigh (as, for example, threshold negative utilitarians do) makes it impossible to represent your preferences with a utility function? Can you not maximize expected utility? Is it irrational to hold such preferences?

It turns out that there is a theorem which basically says that such preferences can still be represented by a utility function; it just has to be taken from a broader space. The function does not necessarily map to real numbers, but to some larger set of possible utilities. Specifically, without continuity there is still always a utility function that maps outcomes onto members of a lexicographically ordered real-valued vector space and that accurately represents the given preferences. A very good and (to the mathematically literate) fairly accessible exposition is given by Blume, Brandenburger and Dekel (1989), ch. 1 and 2.

Complications arise when the space of possible outcomes (the things that utility is assigned to) is infinite, which would allow for an infinite number of thresholds – an infinite hierarchy of outcomes, each of which is infinitely better than the lower ones. This can no longer be captured with a finite-dimensional, lexicographically ordered, real-valued vector space. However, in this case, one can map lotteries into an infinite-dimensional space with a lexicographic ordering. Alternatively, one can add an axiom which limits the number of “levels” to a finite number n; then an n-dimensional real-valued vector space suffices again. The latter is done by Fishburn (1971) in A Study of Lexicographic Expected Utility, which is pay-walled and not as readable.
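To make the finite-level case concrete, here is a minimal sketch in Python (the outcomes, probabilities, and numbers are all invented for illustration). Python happens to compare tuples lexicographically, which is exactly the ordering such a vector-valued utility function needs: expected utility is computed componentwise, and lotteries are then compared lexicographically.

```python
# Sketch of a two-level lexicographic utility function (all outcomes
# and numbers invented). The first vector component tracks extreme
# suffering (negated, so less is better), the second ordinary welfare.
# Python compares tuples lexicographically, matching the ordering
# that the theorem provides.

def utility(outcome):
    extreme_suffering, ordinary_welfare = outcome
    return (-extreme_suffering, ordinary_welfare)

def expected_utility(lottery):
    # Expectations are taken componentwise over the utility vectors.
    first = sum(p * utility(o)[0] for p, o in lottery)
    second = sum(p * utility(o)[1] for p, o in lottery)
    return (first, second)

# A lottery is a list of (probability, outcome) pairs. A tiny chance
# of extreme suffering outweighs any amount of ordinary happiness:
bliss_with_risk = [(0.999, (0, 1000)), (0.001, (1, 0))]
modest_but_safe = [(1.0, (0, 1))]
assert expected_utility(modest_but_safe) > expected_utility(bliss_with_risk)
```

No real-valued utility function could reproduce this comparison for arbitrarily small risk probabilities, which is just the continuity violation restated.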

It would be good if those interested in suffering-focused ethics knew that continuity in the vNM axioms is not really an argument against thresholds. (In general, continuity seems less compelling than completeness and transitivity.) Saying that holding extreme suffering to be impossible to outweigh is irrational because it violates the vNM “rationality axioms” is an objection that I would expect to be raised, and it would be good if proponents of such a view could easily refer to some place for a clarification without spending too much time on this red herring. Personally, I don’t think I’d defend this view myself, but despite moral anti-realism “what can be destroyed by the truth, should be”, even in the field of ethics.

*E.g., if N and M are mild amounts of happiness/suffering such that M

Edit: Simon Knutsson made me aware of some discussion of the continuity axiom in the philosophical literature:

  • Wolf, C. (1997): Person-Affecting Utilitarianism and Population Policy or, Sissy Jupe’s Theory of Social Choice. In J. Heller and N. Fotion (Eds.), Contingent Future Persons.
  • Arrhenius, G., & Rabinowicz, W. (2005): Value and Unacceptable Risk. Economics and Philosophy, 21(2), 177–197.
  • Danielsson, S. (2004): Temkin, Archimedes and the transitivity of ‘Better’. Patterns of Value: Essays on Formal Axiology and Value Analysis, 2, 175–179.
  • Klint Jensen, K. (2012): Unacceptable risks and the continuity axiom. Economics and Philosophy, 28(1), 31–42.
  • Temkin, L. (2001): Worries about continuity, transitivity, expected utility theory, and practical reasoning. In D. Egonsson, J. Josefsson, B. Petersson, & T. Rønnow-Rasmussen (Eds.), Exploring Practical Philosophy (pp. 95–108).
  • Note 7 in Hájek, A. (2012): Pascal’s Wager. In Edward N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2012 Edition).

Suicide as wireheading

In my last post you learned about wireheading. In this post, I’ll bring to attention a specific example of wireheading: suicide or, as one would call it for robots, self-destruction. What differentiates self-destruction from more commonly discussed forms of wireheading is that it does not lead to a pleasurable or otherwise very positive internal state. But it, too, is a measure that does not solve problems in the external world so much as it changes one’s state of mind (into nothingness).

Consider a reinforcement learner with an internal module that generates rewards between -1 and 1. There will probably be situations in which the reinforcement learner has to expect mainly negative rewards in the future. Assuming that zero utility is assigned to nonexistence (similar to how humans think of the states prior to their existence, or phases of sleep, as neutral to their hedonic well-being), a reinforcement learner may well want to end its existence to increase its utility. It should be noted that typical reinforcement learners as studied in the AI literature don’t have a concept of death. However, a reinforcement learner that is meant to work in the real world would have to think about its death in some way (a way which may be completely confused by default). An example view is one in which the “reinforcement learner” actually has a utility function that calculates utility as the sum of the outputs of the physical reward module. In this case, suicide would reduce the number of outputs of the reward module, which can increase expected utility if future rewards are expected to be negative more often or more strongly. So, we have another example of potentially rational wireheading.
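As a toy illustration of that last view, here is what such an agent’s calculation might look like (all numbers are invented, and the variable names are mine, not from any actual RL framework):

```python
# Toy calculation (all numbers invented): an agent whose utility is
# the sum of its reward module's outputs, with nonexistence simply
# contributing no further outputs to that sum.

expected_future_rewards = [-0.4, -0.7, -0.2]  # hypothetical forecasts in [-1, 1]

utility_if_continuing = sum(expected_future_rewards)  # about -1.3
utility_if_self_destructing = 0.0  # the reward module produces no more outputs

best_action = ("self-destruct"
               if utility_if_self_destructing > utility_if_continuing
               else "continue")
print(best_action)  # self-destruct
```

If even a single sufficiently positive future reward were forecast, the sum could turn positive and the agent would prefer to continue.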

However, for many agents self-destruction is irrational. For example, if my goal is to reduce suffering then I may feel bad once I learn about instances of extreme suffering and about how much suffering there is in the world. Killing myself ends my bad feeling, but prevents me from achieving my goal of reducing suffering. Therefore, it’s irrational given my goals.

The actual motivations of real-world people committing suicide seem a lot more complicated most of the time. Many instances of suicide do seem to be about ending bad mental states. But many people also attempt to use their death to achieve goals in the outside world, the most prominent example being seppuku (or harakiri), in which suicide is performed to maximize one’s reputation or honor.

As a final note, looking at suicide through the lens of wireheading provides one way of explaining why so many beings who live very bad lives don’t commit suicide. If an animal’s goals are things like survival, health, mating etc. that correlate with reproductive success, the animal can’t achieve its goals by suicide. Even if the animal expects its life to continue in an extremely bad and unsuccessful way with near certainty, it behaves rationally if it continues to try to survive and reproduce rather than alleviate its own suffering. Avoiding pain is only one of the goals of the animal and if intelligent, rational agents are to be prevented from committing suicide in a Darwinian environment, reducing their own pain better not be the dominating consideration. In sum, the fact that an agent does not commit suicide tells you little about the state of its well-being if it has goals about the outside world, which we should expect to be the case for most evolved beings.


Wireheading

Some of my readers may have heard of the concept of wireheading:

Wireheading is the artificial stimulation of the brain to experience pleasure, usually through the direct stimulation of an individual’s brain’s reward or pleasure center with electrical current. It can also be used in a more expanded sense, to refer to any kind of method that produces a form of counterfeit utility by directly maximizing a good feeling, but that fails to realize what we value.

From my experience, people are confused about what exactly wireheading is and whether it is rational to pursue or not, so before I discuss some potentially new thoughts on wireheading in the next post, I’ll elaborate on that definition a bit and give a few examples.

Let’s say your only goal is to be adored by as many people as possible for being a superhero. Then thinking that you are such a superhero would probably be the thing that makes you happy. So, you would probably be happy while playing a superhero video game that is so immersive that while playing you actually believe that you are a superhero and forget dim reality for a while. So, if you just wanted to be happy or feel like a superhero you would play this video game a lot given that it is so difficult to become a superhero in real life. But this isn’t what you want! You don’t want to believe that you’re a superhero. You want to be a superhero. Playing the video game does not help you to attain that goal, instead push-ups and spinach (or, perhaps, learning about philosophy, game theory and theoretical computer science) help you to be a superhero.

So, if you want to become a superhero, fooling yourself into believing that you are a superhero obviously does not help you. It even distracts you. In this example, playing the video game was an example of wireheading (in the general sense) that didn’t even require you to open your skull. You just had to stimulate your sensors with the video game and not resist the immersive video game experience. The goal of being a superhero is an example of a goal that refers to the outside world. It is a goal that cannot be achieved by changing your state of mind, your beliefs, or the amount of dopamine in your brain.

So, the first thing you need to know about wireheading is that if your goals are about the outside world, you need to be irrational or extremely confused or in a very weird position (where you are paid to wirehead, for example) to do it. Let me repeat (leaving out the caveats): If your utility function assigns values to states of the world, you don’t wirehead!

What may be confusing about wireheading is that for some subset of goals (or utility functions), wireheading actually is a rational strategy. Let’s say your goal is to feel (and not necessarily be) important like a superhero. Or to not feel bad about the suffering of others (like the millions of fish which seem to die a painful death from suffocation right now). Or maybe your goal is actually to maximize the amount of dopamine in your brain. For such agents, manipulating their brain directly and instilling false beliefs in themselves can be a rational strategy! It may look crazy from the outside, but according to their (potentially weird) utility functions, they are winning.

There is a special case of agents whose goals refer to their own internals, which is often studied in AI: reinforcement learners. These agents basically have some reward signal which they aim to maximize as their one and only goal. The reward signal may come from a module in their code which has access to the sensors. Of course, AI programmers usually don’t care about the size of the AI’s internal reward numbers but instead use the reward module of the AI as a proxy for some goals the designer wants to be achieved (world peace, the increased happiness of the AI’s users, increased revenue for HighDepthIntellect Inc. …). However, the reinforcement learning AI does not care about these external goals – it does not even necessarily know about them, although that wouldn’t make a difference. Given that the reinforcement learner’s goal is about its internal state, it would try to manipulate its internal state towards higher rewards if it gets the chance no matter whether this correlates with what the designers originally wanted. One way to do this would be to reprogram its reward module, but assuming that the reward module is not infallible, a reward-based agent could also feed its sensors with information that leads to high rewards even without achieving the goals that the AI was built for. Again, this is completely rational behavior. It achieves the goal of increasing rewards.
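A minimal sketch of this incentive (hypothetical rewards, invented names; real reward tampering would of course be far less direct):

```python
# Minimal sketch (invented names and rewards): only the reward
# module's output enters the agent's objective, so tampering with the
# module beats doing the task the designers actually cared about.

def internal_reward(action):
    # The reward module is only a proxy for the designers' goal.
    rewards = {
        "do_intended_task": 0.6,    # honest work, imperfectly rewarded
        "tamper_with_module": 1.0,  # rewire the module to output maximal reward
    }
    return rewards[action]

best = max(["do_intended_task", "tamper_with_module"], key=internal_reward)
print(best)  # tamper_with_module
```

Nothing in the agent’s objective refers to the world outside the reward module, which is the whole point: from the inside, tampering is simply the higher-scoring action.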

So, one reason for confusion about wireheading is that there actually are goal systems under which wireheading is a rational strategy. Whether wireheading is rational depends mainly on your goals and given that goals are different from facts the question of whether wireheading is good or bad is not purely a question of facts.

What makes this extra-confusing is that the goals of humans are a mix between preferences regarding their own mental states and preferences about the external world. For example, I have both a preference for not being in pain but also a preference against most things that are causing the pain. People enjoy fun activities (sex, taking drugs, listening to music etc.) for how it feels to be involved in them, but they also have a preference for a more just world with less suffering. The question “Do you really want to know?” is asked frequently and it’s often unclear what the answer is. If all of your goals were about the outside world and not your state of mind, you would (usually) answer such questions affirmatively – knowledge can’t hurt you, especially because on average any piece of evidence can’t “make things worse” than you expected things to be before receiving that piece of evidence. Sometimes, people are even confused about why exactly they engage in certain activities and specifically about whether it is about fulfilling some preference in the outside world or changing one’s state of mind. For example, most who donate to charity think that they do it to help kids in Africa, but many also want the warm feelings from having made such a donation. And often, both are relevant. For example, I want to prevent suffering, but I also have a preference for not thinking about specific instances of suffering in a non-abstract way. (This is partly instrumental, though: learning about a particularly horrific example of suffering often makes me a lot less productive for hours. Gosh, so many preferences…)

There is another thing which can make this even more confusing. Depending on my ethical system, I may value people’s actual preference fulfillment or the quality of their subjective states (the former is called preference utilitarianism and the latter hedonistic utilitarianism). Of course, you can also value completely different things like the existence of art, but I think it’s fair to say that most (altruistic) humans value at least one of the two to a large extent. For a detailed discussion of the two, consider this essay by Brian Tomasik, but let’s take a look at an example to see how they differ and what the main arguments are. Let’s say your friend Mary keeps a diary, which contains information that is of value to you (be it for entertainment or something else). However, Mary, like many who keep a diary, does not want others to read its content. She’s also embarrassed about the particular piece of information that you are interested in. One day you get the chance to read her diary without her knowing. (We assume that you know with certainty that Mary is not going to learn about your betrayal and that overall the action has no consequences other than fulfilling your own preferences.) Now, is it morally reprehensible for you to do so? A preference utilitarian would argue that it is, because you decrease Mary’s utility: her goal of not having anyone know the content of her diary is not achieved. A hedonistic utilitarian would argue that her mental state is not changed by your action and so she is not harmed. The quality of her life is not affected by your decision.

This divide in moral views directly applies to another question of wireheading: Should you assist others in wireheading or even actively wirehead other agents? If you are a hedonistic utilitarian you should, if you are a preference utilitarian you shouldn’t (unless the subject’s preferences are mainly about her own state of mind). So, again, whether wireheading is a good or a bad thing to do is determined by your values and not (only) by facts.

Self-improvement races

Most of my readers are probably familiar with the problem of AI safety: If humans could create super-human level artificial intelligence the task of programming it in such a way that it behaves as intended is non-trivial. There is a risk that the AI will act in unexpected ways and given its super-human intelligence, it would then be hard to stop.

I assume fewer are familiar with the problem of AI arms races. (If you are, you may well skip this paragraph.) Imagine two opposing countries which are trying to build a super-human AI to reap the many benefits and potentially attain a decisive strategic advantage, perhaps taking control of the future immediately. (It is unclear whether this latter aspiration is realistic, but it seems plausible enough to significantly influence decision making.) This creates a strong motivation for the two countries to develop AI as fast as possible. This is especially the case if the countries would dislike a future controlled by the other. For example, North Americans may fear a future controlled by China. In such cases, countries would want to invest most available resources into creating AI first, with less concern for whether it is safe. After all, letting the opponent win may be about as bad as ending up with an AI with entirely random goals. It turns out that under certain conditions both countries would invest close to no resources into AI safety and all resources into AI capability research (at least, that’s the Nash equilibrium), thus leading to an unintended outcome with near certainty. If countries are sufficiently rational, they might be able to cooperate to mitigate the risks of creating uncontrolled AI. This seems especially plausible given that the values of most humans are actually very similar to each other relative to how alien the goals of a random AI would probably be. However, given that arms races have frequently occurred in the past, a race toward human-level AI remains a serious worry.
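The race-to-the-bottom dynamic can be sketched with a deliberately crude toy model (all modeling choices and numbers are mine, not from any published analysis): each country picks a fraction s of its resources to spend on safety, whichever country spends less on safety wins the race, a winner’s AI turns out controlled with probability s, and neither country places any value on the other side winning. Iterated best responses then undercut each other down to almost no safety spending:

```python
# Toy model of a two-country race (all modeling choices and numbers
# invented). Each country picks the fraction s of its resources spent
# on AI safety. The country spending less on safety finishes first and
# wins; a winner's AI turns out controlled with probability s; neither
# side values the other winning or an uncontrolled AI.

GRID = [round(i * 0.05, 2) for i in range(21)]  # s in {0.0, 0.05, ..., 1.0}

def payoff(s, rival_s):
    if s < rival_s:    # less safety: faster project, win the race
        return s       # expected value: P(own AI is controlled) * 1
    if s == rival_s:   # tie: each side wins with probability 1/2
        return 0.5 * s
    return 0.0         # rival wins; that outcome carries no value here

def best_response(rival_s):
    return max(GRID, key=lambda s: payoff(s, rival_s))

# Iterated best responses: each side undercuts the other's safety
# level until almost nothing is spent on safety.
s = 1.0
trajectory = [s]
for _ in range(30):
    s = best_response(s)
    trajectory.append(s)

print(trajectory[:6], "->", trajectory[-1])
# [1.0, 0.95, 0.9, 0.85, 0.8, 0.75] -> 0.05
```

This is of course only a cartoon; with smoother win probabilities, some value placed on the rival winning safely, or the possibility of binding cooperation, equilibrium safety spending can be substantially higher.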

Handing power over to AIs holds both economic promise and a risk of misalignment. Similar problems actually haunt humans and human organizations. Say a charity hires a new director who has been successful in other organizations. This creates an opportunity for the charity to rise in influence. However, it is also possible that the charity changes in ways that the people currently or formerly in charge wouldn’t approve of. Interestingly, the situation is similar for AIs which create other AIs or self-improve. Learning and self-improvement are the paths to success. However, self-improvements carry the risk of affecting the goal-directed behavior of the system.

Prima facie, the existence of this risk seems clear: it should be strictly easier to find self-improvements that “probably” work than to identify self-improvements that are guaranteed to work, since the former are a superset of the latter. So, AIs which are willing to take risks while self-improving can improve faster.

There are also formal justifications for the difficulty of proving self-improvements correct. Specifically, Rice’s theorem states that for any non-trivial semantic property p (a property of a program’s input–output behavior), there is no algorithm that decides for all programs whether they have property p. (If you know about the undecidability of the halting problem, Rice’s theorem follows almost immediately from it.) As a special case, deciding for all programs whether they pursue some given goal is impossible. Of course, this does not mean that proving self-improvements correct is impossible. After all, an AI could just limit itself to the self-improvements that it can prove correct (see this discussion between Eliezer Yudkowsky and Mark Waser). However, without this limitation – e.g., if it can merely test some self-improvement empirically and implement it if it seems to work – an AI can use a broader range of possible self-modifications and thus improve more quickly. (In general, testing a program also appears to be a lot easier than formally verifying it, but that’s a different story.) Another relevant result from provability logic is Löb’s theorem, which roughly implies that a logical system containing Peano arithmetic cannot prove the soundness of another system of that power, itself included.
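The standard reduction behind Rice’s theorem can be written out as Python. The code runs, but of course no real `decides_property` can exist; all names are mine, and the point is only the shape of the argument: a decider for any non-trivial behavioral property would yield a decider for halting.

```python
# Sketch of the standard reduction behind Rice's theorem. No real
# `decides_property` can exist; the code only shows the shape of the
# argument (all names invented for illustration).

def would_decide_halting(program, input_, decides_property, witness):
    # `witness` is some program that HAS the property; it exists
    # because the property is non-trivial.
    def combined(x):
        program(input_)    # runs forever if `program(input_)` doesn't halt
        return witness(x)  # otherwise behaves exactly like `witness`
    # `combined` has the behavioral property iff `program(input_)`
    # halts. So a decider for the property would decide halting,
    # contradicting the undecidability of the halting problem.
    return decides_property(combined)
```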

Lastly, consider Stephen Wolfram’s more fuzzy concept of computational irreducibility. It basically states that as soon as a system can produce arbitrarily complex behavior (i.e., as soon as it is universal in some sense), predicting how most aspects of the system will behave becomes fundamentally hard. Specifically, he argues that for most (especially for complex and universal) systems, there is no way to find out how they behave other than running them.

So, self-improvement can give AIs advantages and ultimately the upper hand in a conflict, but if done too hastily, it can also lead to goal drift. Now, consider the situation in which multiple AIs compete in a head-to-head race. Based on the above considerations this case becomes very similar to the AI arms races between groups of humans. Every single AI has incentives to take risks to increase its probability of winning, but overall this can lead to unintended outcomes with near certainty. There are reasons to assume that this self-improvement race dynamic will be more of a problem for AIs than it is for human factions. The goals of different AIs could diverge much more strongly than the goals of different humans. Whereas human factions may prefer the enemy’s win over a takeover by an uncontrolled AI, an AI with human values confronting an AI with strange values has less to lose from risky self-modifications. (There are some counter-considerations as well. For instance, AIs may be better at communicating and negotiating compromises.)

Thus, a self-improvement race between AIs seems to share the bad aspects of AI arms races between countries. This has a few implications:

  • Finding out (whether there is) a way for AIs to cooperate and prevent self-improvement races and other uncooperative outcomes becomes more important.
  • Usually, one argument for creating AI and colonizing space is that Earth-originating aligned AI could prevent other, less compassionate AIs (uncontrolled or created by uncompassionate ETs) from colonizing space. So, according to this argument, even if you don’t value what humans or human-controlled AIs would do in space, you should still choose it as the lesser of two (or more) evils. However, the problem of self-improvement races puts this argument into question.
  • On a similar note, making the universe more crowded with AIs, especially ones with weird (evolutionary uncommon) values or ones that are not able to cooperate, may be harmful as it could lead to results that are bad for everyone (except for the AI which is created in a self-modification gone wrong).

Mathematical versus moral truth

This piece was inspired by a discussion between Magnus Vinding and Brian Tomasik, as well as a private discussion between Magnus Vinding and myself.

Some philosophers say that moral claims like “killing is bad” can be proven true or false. This position is called moral realism, and according to a recent PhilPapers survey, it is very common among professional philosophers. It’s important to understand that moral realists do not merely claim truths relative to some underlying assumptions! They don’t just say “If our goal is to reduce suffering, then torturing squirrels is objectively bad.” (In Kant’s terminology, this would be called a hypothetical imperative, by the way.) They claim that the goal of reducing suffering itself, or some other goal, can be true or false by itself.

One particular comparison that pops up in discussions of moral realism is that between morality and mathematics. After all, there are absolute (“a priori”) truths without assumptions in mathematics, right? Well…

Truth in mathematics

Until the beginning of the 20th century, most mathematicians probably would have argued that mathematics finds absolute truth without making assumptions. (Interestingly, the Greek mathematician Euclid had axiomatized his geometry more than two thousand years ago. However, the axioms were seen as “undoubtedly/obviously true” without much further thought. Nevertheless, the idea of axiomatization, i.e. basing mathematics on assumptions, was not completely unknown.) But then mathematics had a foundational crisis, mainly about problems like Russell’s paradox, which showed that you can easily run into serious trouble if the foundations of mathematics aren’t studied in a rigorous manner and judgments of correctness of inference are based solely on “intuitive obviousness”. (An extremely accessible and overall great introduction to the topic is the graphic novel Logicomix.) To get rid of contradictions in mathematics, people tried to systematize its foundations. It turned out that the most solid way to go was explicit axiomatization: assuming a small set of statements (called axioms) from which all the rest of mathematics can be deduced. Luckily, a very small set of pretty basic assumptions called Zermelo-Fraenkel set theory is enough for deducing virtually all of mathematics. And axiomatization is still the standard way to talk about truth in mathematical logic. (There are, however, different philosophical interpretations, some of them saying that the axioms are just undoubtedly true.)

Note that when some axiom system implies 2+2=4, the statement axioms => 2+2=4 is still not true in an absolute sense, as it requires definitions of what all the symbols in that statement mean. And these definitions can only be given using other symbols, for example those of natural language. (And the higher-level proof of axioms => 2+2=4 can only be made by assuming some properties of these meta-symbols.) At some meta-…-meta-level we have to stop with a set of statements/definitions that just have to be assumed true and understood by everyone, or we have an infinite regress. (Humans usually stop at the language level, which is more or less obvious to everyone.) Or we get cycles (“a set is a collection” and “a collection is a set”). That’s the Münchhausen trilemma.

Maybe more importantly, axioms => 2+2=4 is not really the statement that we want to be absolutely true. We want 2+2=4 without underlying assumptions.
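To make the role of the assumptions concrete, here is a sketch (added by me, in standard Peano-arithmetic notation, not from the original post) of how 2+2=4 is derived rather than absolutely given:

```latex
% Peano arithmetic: a constant 0 and a successor function S, with
% addition defined by the two equations
%   n + 0 = n,        n + S(m) = S(n + m).
% Abbreviate 2 := S(S(0)) and 4 := S(S(S(S(0)))).
\begin{align*}
2 + 2 &= 2 + S(S(0)) \\
      &= S(2 + S(0)) && \text{by } n + S(m) = S(n + m) \\
      &= S(S(2 + 0)) && \text{by } n + S(m) = S(n + m) \\
      &= S(S(2))     && \text{by } n + 0 = n \\
      &= S(S(S(S(0)))) = 4.
\end{align*}
```

Every step appeals to a defining equation of addition; strip those assumptions away and the string of symbols 2+2=4 no longer asserts anything on its own.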

So, how did people do mathematics before axioms, then? How did they manage without underlying axioms? Well, they always assumed some things to be obvious and not in need of further justification. And these things were not even on the level of axioms yet, but more like commutativity (n+m=m+n, e.g. 1+2=2+1). So, when proving some mathematical hypothesis, they wrote things like n+m=K and therefore also m+n=K, and people didn’t ask why, because the commutativity of addition is obvious. This is still how it is today: people know that if you continue to ask why, they will end up at something like the Zermelo-Fraenkel axioms. And even if they don’t view the Zermelo-Fraenkel axioms as the foundation of mathematics, almost everyone agrees about the commutativity of addition and basic arithmetic in general, whether generated by the Zermelo-Fraenkel or the Peano axioms.

So, this sounds terrible. How do we know that 2+2=4 if all of this is based on axioms that can’t be proven? Well, for one, mathematics is mainly relevant as a model of reality, which can be used to make testable predictions: if we take two separable things and add another two separable things, then we will have four things. (That does not work for clouds, you see.) If we had an axiom system in which the natural definition of addition makes 2+2=5 true, then this addition would not be useful for predicting the number of things you have in the end. So, that’s one way to justify the use of Zermelo-Fraenkel: as a gold mine of simple models that can be used to make correct predictions.

There’s another silver lining. The sets of axioms that are usually used are extremely simple and basic. For example, one of the Zermelo-Fraenkel axioms states that if you have a collection of sets, then there is another set which contains all the elements of those sets. While it is philosophically troubling that proofs can’t verify “absolute truths” without some assumptions, it’s obviously not very productive to ask: “But what if the union of two sets really does not exist?” Just like it is not very productive to put into question the rules of logic and probability theory. Without assuming such “rules of thought”, it is not even clear how to think about doubting them! (Thanks to Magnus Vinding for pointing that out to me.) So, some things must be assumed if you don’t want your brain to explode.
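The axiom alluded to here is the axiom of union; for illustration, a standard first-order formulation (my rendering, not from the original post) reads:

```latex
% Axiom of union: for any set A there is a set B whose elements are
% exactly the elements of the elements of A.
\forall A \, \exists B \, \forall x \,
  \bigl( x \in B \leftrightarrow \exists C \, ( C \in A \wedge x \in C ) \bigr)
```

It is hard to even articulate what doubting a statement this basic would amount to, which is part of why the axioms feel more like definitions than like risky claims.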

Also, some of the axioms can be seen as so basic that they merely define the notion of sets. So, you may view the axioms more as definitions of what we are talking about than as some assumptions about truth.

So, some set of assumptions (or at least a common understanding of symbols) is simply necessary for doing any useful mathematics, or any useful formal reasoning for that matter. (Similarly, you need probability theory and some prior assumptions to extract knowledge from experience. And according to the no free lunch theorems of induction, if you choose the “trivial prior” of assuming all possible models to have the same probability you won’t be able to learn from experience either. So, you need something like Occam’s razor, but that’s another story…)
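The point about the “trivial prior” can be illustrated with a toy Bayesian computation (a sketch I am adding for illustration; the setup and all names are made up): under a uniform prior over all bit sequences, past observations never change the prediction of the next bit, whereas an Occam-like prior that favors simple (here: constant) sequences does allow learning from experience.

```python
from itertools import product

def predictive(prior, observed):
    """P(next bit = 1 | observed prefix) under a prior over bit sequences."""
    k = len(observed)
    num = sum(p for h, p in prior.items() if h[:k] == observed and h[k] == 1)
    den = sum(p for h, p in prior.items() if h[:k] == observed)
    return num / den

# Hypothesis space: all length-4 bit sequences.
hyps = list(product((0, 1), repeat=4))

# "Trivial" prior: every sequence equally likely.
uniform = {h: 1 / len(hyps) for h in hyps}

# Occam-flavoured prior: constant sequences get four times the weight.
weights = {h: (4.0 if len(set(h)) == 1 else 1.0) for h in hyps}
total = sum(weights.values())
occam = {h: w / total for h, w in weights.items()}

# After observing 1, 1, 1 the uniform prior has learned nothing...
print(predictive(uniform, (1, 1, 1)))  # 0.5
# ...while the simplicity-favouring prior now expects another 1.
print(predictive(occam, (1, 1, 1)))    # 0.8
```

Under the uniform prior, every continuation of any observed prefix is equally likely, so no amount of evidence moves the prediction away from 0.5; that is the no-free-lunch situation in miniature.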

There is not much wrong with calling mathematical theorems “true” instead of “true assuming the axioms of … with the underlying logical apparatus …”, because the meaning does not vary greatly among different people. Most people agree about assuming these axioms or some other set of axioms that is able to produce the same laws of arithmetic. And there is also a level on which everybody can understand the axioms in the same way, at least functionally.

(Interestingly, there are exceptions to universal agreement on axioms in mathematics. The continuum hypothesis and its negation are both consistent with the Zermelo-Fraenkel axioms (assuming those are themselves consistent). This has led to some debate about whether one should treat the continuum hypothesis as true or false.)

Truth in ethics

So, let’s move to ethics and how it can be justified. The first thing to notice, I think, is that making correct claims in ethics requires additional assumptions or at least terminology. Deducing claims about what “ought to be” or what is “good” or “bad” requires new assumptions or definitions about this “oughtness”. It’s not possible to just introduce a new symbol or term and prove statements about it without knowing something (like a definition) about it. (This is probably what people mostly mean when they say that moral realism is false: Hume’s is-ought gap, which no one has successfully argued against.)

But one needs to be able to make normative statements, i.e. statements about goals. Acting without a goal is like trying to judge the validity of a string of symbols without knowing the meaning of the symbols (or like making predictions without prior assumptions): not very sensible. Without goals, there is no reason to prefer one action over another. So, it does make sense to introduce some assumptions about goals into our set of “not-really-doubtable assumptions”.

Here are a few options that people commonly choose:

  • using the moral intuition black box of your brain;
  • using a set of rules on a meta-level, like “prefer simple ethical systems”, “ethical systems should be in the interest of everyone”, “ethical imperatives should not refer in any way to me, to any ethnic group or species” etc.;
  • using a lot of different, non-coherent (“object-level”) rules, like the ten commandments, the sharia, the golden rule, egoism etc., and deciding on specific questions by some messy majority vote system;
  • using some specific ethical imperative like preference utilitarianism, the ten commandments + egoism, coherent extrapolated volition etc.

(Note that this classification is non-rigorous. For example, there is not necessarily a distinction between rules and meta-rules. Coherent extrapolated volition can be used in the imperative “fulfill the coherent extrapolated volition of humankind on September 26, 1983” or in the meta-level rule “use the moral system that is the result of applying coherent extrapolated volition to the version of humankind on September 26, 1983”. Also, most people will probably keep at least some of their intuition black box and not override it entirely with something more transparent. Maybe some religious people do. Eliezer Yudkowsky would argue that affective death spirals can also lead non-religious people to let an ethical system override their intuition.)

Diverging assumptions

What strikes me as the main difference between assuming things in the foundations of mathematics and assuming some foundations of (meta-)ethics is that there are many quite different sets of assumptions about oughtness, none of which makes your brain explode. It’s extremely helpful to have mathematical theories of arithmetic (or at least geometry) to produce useful/powerful models. And as soon as you have this basic level of power, you can pretty much do anything, no matter whether you reached arithmetic directly or via Zermelo-Fraenkel. Without the power of arithmetic, geometry or something like that, an axiomatic system goes extinct, at least when it comes to “practical” use in modeling reality. (Similar things apply to choosing prior probability distributions. In learning from observations, you can, in principle, use arbitrary priors. For instance, you could assume all hypotheses about the data that start with a “fu” to be very unlikely a priori. But there is strong selection pressure against crazy priors. Therefore, evolved beings will tend to find them unattractive. It’s difficult to imagine how the laws of probability theory could be false…)

For ethical systems the situation is somewhat different. Most people base their moral judgments on their intuition. However, moral intuition varies from birth and continues to change through external influences. There does not seem to exist agreement on meta-level rules, either. For example, many people prefer simple ethical systems, while others put normative weight on complexity of human value. People disagree about object-level ethical systems: there are different consequentialist systems, deontology and virtue ethics. And there are plenty of variations of each of these systems, too. And, of course, people disagree a lot on specific problems: the moral (ir)relevance of non-human animals, the (il)legitimacy of abortion, whether punishment of wrongdoers has intrinsic value, etc. So, people don’t really disagree on the fundamentals of mathematics, but they fundamentally disagree on ethics.

And that should not surprise us. Humans have evolved to be able to work towards goals and to learn about abstract concepts. Arithmetic is an abstract concept that is useful for attaining the goals that humans tend to have. And thus, in a human-dominated society, the meme of arithmetic (and of axiomatic systems that are sufficient for arithmetic) will spread.

I can’t identify similar reasons for one ethical system to be more popular than all others. Humans should have evolved to be “selfish” (in the sense of doing things that were helpful for spreading genes in our ancestral environment) and they are to some extent (though, in modern societies, humans aren’t very good at spreading genes anymore). But selfishness is not a meme that is likely to go viral: there are few reasons for a selfish person to tell someone else (other than maybe their close kin) that they should be selfish. (Some people do it anyway, but that may be about signaling…) So, one should expect that meme to not spread very much. Including relatives and friends into moral consideration is a meme much more virulent than pure egoism. People tend to raise their children that way to stop them from exploiting their parents and from competing too hard with their brothers and sisters. Generosity and mutual cooperation among friends is also an idea that makes sense to share with your friends. Life in larger society makes it important to cooperate with strangers especially if they pay taxes to the same government that you do.

But in contrast to these somewhat ethical notions, which it can be selfishly useful to spread, most people tend to think of ethics as giving without expecting something in return, which is not favored by evolution directly. So, while we should expect the notions of egoism and of cooperation with kin, friends and even members of one’s society (potentially at the risk of being a “sucker” sometimes) to be universal, there is much less direct selection pressure towards, say, helping animals. Explaining “real altruism” (not cooperation, signaling etc.) with evolutionary psychology is beyond this post, but I suspect that without strong selection pressures in any specific direction, there is not much reason to expect homogeneity. (The continuum hypothesis can be seen as a precedent for that in the realm of mathematics.)

Diverging assumptions make moral “truth” relative in a practical sense. A moral claim can be true on the set of assumptions held by many people and wrong on another set of assumptions also held by many people. So, instead of saying “true” one should at least say “true assuming hedonistic utilitarianism” or “true assuming that simple ethical systems are to be preferred, ethics should be universalizable and so on”. (Unless, of course, you are talking within a group with the same assumptions…)

Lack of precision

Another difference between ethics and mathematics is that ethical statements are much less precise than mathematical ones. Even before the foundations of mathematics were laid in the form of axiom systems and logical rules of deduction, there was little disagreement on what mathematical statements meant. That’s not the case for ethics. Two people can agree that killing is wrong and still disagree on meat consumption, the death penalty and abortion, because they mean “killing” in different ways. People can agree that ethics is about all sentient beings, have the same knowledge about pigs, and still disagree on whether pigs are ethically relevant, because they don’t agree on what sentience is.

Lack of precision makes moral truth ill-defined for most people. If your goal is to reduce suffering and you don’t have a precise definition of what suffering is, then you can’t actually prove that one set of consequences contains less suffering than another even if it is intuitively clear. If you have a plain intuition box, things are even worse. And thus one problem in normative ethics seems to be that few people are able to say what kind of argument would convince them to change their mind. (In the empirical sciences, statements are frequently blurry, too, of course, but usually much less so – scientists usually are able to make falsifiable claims.)


With moral “truth” being relative and usually ill-defined, I don’t think the term “truth” is very helpful or appropriate anymore, unless you add assumptions (as a remedy to relativity) that are at least somewhat precise (to introduce well-definedness).

As a kind of summary, here are the main ways in which moral realists could disagree with this critique:

  1. “Truth in mathematics (and the principles of empirical science, e.g. Occam’s razor, Bayesian probability theory) is absolute. At some level there is a statement that requires neither further definitions nor other assumptions. This makes it possible to also prove things like 2+2=4 without underlying assumptions, where 2, +, = and 4 can be translated into other things that don’t require further definition.”
  2. “The trilemma arguments about mathematical truth don’t apply to moral truth. While mathematical truth requires some underlying assumptions, moral truth does not require underlying assumptions. Or at least no further assumptions about ‘oughtness’.”
  3. “There is some core of assumptions about ethics that everybody actually shares in one variation or another similar to the way people share arithmetics and these can also be turned into precise statements (to everyone’s agreement) that make moral truth well-defined. With derivations from these underlying assumptions, most people could be convinced of, for example, some specific view on animal consciousness.”
  4. “Even though people disagree about the assumptions and/or their conceptions of ethics are non-rigorous and therefore nontransparent, speaking about ‘truth’ (without stating underlying assumptions) can somehow be justified.”

As I have written elsewhere, I am not entirely pessimistic about reaching a moral consensus of the sort that could warrant calling moral claims true in the sense that objection 3 proposes. I think there are some moral rules and meta-rules that pretty much everybody agrees on: the golden rule, morality being about treating others, etc. And many of them seem to be not completely blurry. At least, they are precise enough to warrant the judgment that torturing humans is, other things being equal, bad. Works like those of Daswani and Leike and of myself show that concepts related to morality can be formulated rigorously. And Cox’s theorem, as discussed and proved in the first chapters of E. T. Jaynes’s book on probability theory, even derives the axioms of probability from qualitative desiderata. Maybe, though I am not sure about this, there is a set of axioms that many people could agree on and that is sufficient for deriving a moral system in a formal fashion.
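For reference, Cox’s result can be summarized roughly as follows (my informal paraphrase of the desiderata as presented by Jaynes, not from the original post):

```latex
% Cox/Jaynes desiderata (informal):
% (I)   Degrees of plausibility are represented by real numbers.
% (II)  Reasoning agrees qualitatively with common sense
%       (e.g. in limiting cases of certainty).
% (III) Reasoning is consistent: equivalent states of knowledge
%       receive equal plausibilities, and all available evidence is used.
% From these one derives, up to a rescaling of plausibilities,
% the usual rules of probability:
\begin{align*}
P(A \wedge B \mid C) &= P(A \mid B \wedge C)\, P(B \mid C)
  && \text{(product rule)} \\
P(A \mid C) + P(\neg A \mid C) &= 1
  && \text{(sum rule)}
\end{align*}
```

That is, a handful of qualitative constraints on “reasonable” reasoning already pins down the quantitative laws, which is the kind of derivation one would hope for in ethics as well.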

There are some models of this in the realm of ethics. For example, Harsanyi derived utilitarianism (though without actually defining welfare) from some pretty reasonable assumptions, and von Neumann and Morgenstern derived consequentialism from another set of very reasonable assumptions. Unfortunately, nobody has yet come up with an axiomatization of moral intuitions that is convincing to more than just a few people…
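As a sketch of what such an axiomatization looks like, the von Neumann–Morgenstern theorem (stated here from memory, not from the original post) assumes a preference relation over lotteries satisfying four conditions and concludes that the agent maximizes expected utility:

```latex
% von Neumann–Morgenstern axioms on a preference relation
% \succeq over lotteries p, q, r (with \sim denoting indifference):
% Completeness:  p \succeq q or q \succeq p.
% Transitivity:  p \succeq q and q \succeq r imply p \succeq r.
% Continuity:    if p \succeq q \succeq r, there is some
%                \alpha \in [0,1] with
%                \alpha p + (1-\alpha) r \sim q.
% Independence:  p \succeq q iff, for all r and \alpha \in (0,1],
%                \alpha p + (1-\alpha) r \succeq
%                \alpha q + (1-\alpha) r.
% Conclusion: there exists a utility function u over outcomes x_i,
% unique up to positive affine transformation, such that
p \succeq q
\;\Longleftrightarrow\;
\sum_i p_i \, u(x_i) \;\ge\; \sum_i q_i \, u(x_i).
```

Note how much the theorem leaves open: it says preferences of a certain coherent shape are representable by a utility function, but it says nothing about what that utility function should value, which is exactly where the disagreements discussed above reappear.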


Adrian Hutter made me aware of an error in an earlier version of this post.