A Non-Comprehensive List of Human Values

Human values are said to be complex (cf. Stewart-Williams 2015, section “Morality Is a Mess”; Muehlhauser and Helm 2012, ch. 3, 4, 5.3). As evidence, the following is a non-comprehensive list of things that many people care about:

Abundance, art, asceticism, autarky, authority, autonomy, beauty, benevolence, challenge, community, competence, competitiveness, complexity, cooperation, creativity, crime, critical thinking, curiosity, democracy, dignity, diligence, diversity, duties, emotion, equality, excellence, excitement, experience, fairness, faithfulness, family, free will, freedom, friendship, frugality, fulfillment, fun, gender differences, gender equality, happiness, health, honesty, humbleness, idealism, improvement, intelligence, justice, knowledge, law abidance, life, love, loyalty, modesty, nature, novelty, openness, optimism, organization, pain, parsimony, peace, privacy, progress, promises, prosperity, purity, rationality, religion, respect, rights, sadness, safety, sanctity, self-determination, simplicity, sincerity, society, spirituality, stability, striving, suffering, surprise, technology, tolerance, tradition, truth, variety, veracity, welfare.

Note that from the inside, most of these values feel distinct from each other. Also, note that most of these do not feel instrumental to each other. For instance, people often want to find out the truth even when that truth is not useful for, e.g., reducing suffering or preserving tradition.

Some (articles with) lists that helped me compile this list are Keith-Spiegel’s moral characteristics list, moral foundations theory, Your Dictionary’s Examples of Morals, Eliezer Yudkowsky’s 31 Laws of Fun, table A1 in Bain et al.’s Collective Futures, and Peter Levine’s an alternative to Moral Foundations Theory.

“Betting on the Past” by Arif Ahmed

[This post assumes knowledge of decision theory, as discussed in Eliezer Yudkowsky’s Timeless Decision Theory and in Arbital’s Introduction to Logical Decision Theory.]

I recently discovered an interesting thought experiment, “Betting on the Past” by Cambridge philosopher Arif Ahmed. It can be found in his book Evidence, Decision and Causality, which is an elaborate defense of Evidential Decision Theory (EDT). I believe that Betting on the Past may be used to money-pump non-EDT agents, refuting not only Causal Decision Theory (CDT) but potentially also theories that use logical conditioning, such as Timeless Decision Theory (TDT) or Updateless Decision Theory (UDT). At the very least, non-EDT decision theories are unlikely to win this bet. Moreover, no conspicuous perfect predictors, genetic influences, or manipulations of decision algorithms are required to make Betting on the Past work, and anyone can replicate the game at home. For these reasons, it might make a more compelling case in favor of EDT than the Coin Flip Creation, a problem I recently proposed in an attempt to defend EDT’s answers in medical Newcomb problems. In Ahmed’s thought experiment, Alice faces the following decision problem:

Betting on the Past: In my pocket (says Bob) I have a slip of paper on which is written a proposition P. You must choose between two bets. Bet 1 is a bet on P at 10:1 for a stake of one dollar. Bet 2 is a bet on P at 1:10 for a stake of ten dollars. So your pay-offs are as in [Figure 1]. Before you choose whether to take Bet 1 or Bet 2 I should tell you what P is. It is the proposition that the past state of the world was such as to cause you now to take Bet 2. [Ahmed 2014, p. 120]

Ahmed goes on to specify that Alice could indicate which bet she’ll take by either raising or lowering her hand. One can find a detailed discussion of the thought experiment’s implications, as well as a formal analysis of CDT’s and EDT’s decisions in Ahmed’s book. In the following, I want to outline a few key points.

Would CDT win in this problem? Alice is betting on a past state of the world. She can’t causally influence the past, and she’s uncertain whether the proposition is true or not. In either case, Bet 1 strictly dominates Bet 2: no matter which state the past is in, Bet 1 always yields a higher utility. For these reasons, causal decision theories would take Bet 1. Nevertheless, as soon as Alice comes to a definite decision, she updates on whether the proposition is true or false. If she’s a causal agent, she then finds out that she has lost: the past state of the world was such as to cause her to take Bet 1, so the proposition is false. If she had taken Bet 2, she would have found out that the proposition was correct, and she would have won, albeit a smaller amount than if she had won with Bet 1.

Betting on the Past seems to qualify as a kind of Newcomb’s paradox; it seems to have an equivalent payoff matrix (Figure 1).

Figure 1: Betting on the past has a similar payoff matrix to Newcomb’s paradox

              P is true    P is false
Take Bet 1       10           -1
Take Bet 2        1          -10
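
To make the two lines of reasoning concrete, here is a minimal sketch in Python (my own illustration, not from Ahmed’s book) that evaluates both bets using the payoffs from Figure 1 and the assumption that P is true if and only if Alice takes Bet 2.

```python
# A minimal sketch (my own illustration, not from Ahmed's book) of how
# evidential and causal reasoning evaluate Betting on the Past.
# Payoffs are taken from Figure 1; P is true iff Alice takes Bet 2.

PAYOFF = {("Bet 1", True): 10, ("Bet 1", False): -1,
          ("Bet 2", True): 1, ("Bet 2", False): -10}

def edt_value(bet):
    # Conditioning on the action: taking Bet 2 is decisive evidence that the
    # past was such as to cause taking Bet 2, i.e. that P is true.
    p_is_true = (bet == "Bet 2")
    return PAYOFF[(bet, p_is_true)]

def cdt_value(bet, credence_p_true):
    # CDT holds the past fixed with some credence in P; Bet 1 dominates
    # Bet 2 for every value of that credence.
    return (credence_p_true * PAYOFF[(bet, True)]
            + (1 - credence_p_true) * PAYOFF[(bet, False)])

print(max(["Bet 1", "Bet 2"], key=edt_value))                    # Bet 2
print(max(["Bet 1", "Bet 2"], key=lambda b: cdt_value(b, 0.5)))  # Bet 1

# Whoever takes Bet 1 thereby learns that P is false and walks away with -1;
# whoever takes Bet 2 learns that P is true and walks away with +1.
```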

Furthermore, its causal structure seems to resemble that of, e.g., the Smoking Lesion or Solomon’s problem, suggesting that it is a kind of medical Newcomb problem. In medical Newcomb problems, a “Nature” node determines both the present state of the world (whether the agent is sick or will win the bet) and the agent’s decision (see Figure 2). In this regard, they differ from Newcomb’s original problem, where said node refers to the agent’s decision algorithm.

Figure 2: Betting on the past (left) has a similar causal structure to medical Newcomb problems (right).

One could object that Betting on the Past is not a medical Newcomb problem, since the outcomes conditional on our actions are certain here, while in, e.g., the Smoking Lesion, observing our actions only shifts our probabilities by degrees. I believe this shouldn’t make a crucial difference. On the one hand, we can conceive of absolutely certain medical Newcomb cases like the Coin Flip Creation. On the other hand, Newcomb’s original problem is often formalized with absolute certainties as well. I’d be surprised if the difference between probabilistic and certain reasoning mattered to decision theories. First, we can always approximate certainty to an arbitrarily high degree, and we might then ask ourselves why a negligible further increase in certainty would at some point suddenly change the recommended action completely. Second, we’re never really certain in the real world anyway, so if the two cases were treated differently, all thought experiments that use absolute certainties would be rendered useless.

If Betting on the Past is indeed a kind of medical Newcomb problem, this would be an interesting conclusion. It would follow that if one prefers Bet 2, one should also one-box in medical Newcomb problems. And taking Bet 2 seems so obviously correct! I point this out because one-boxing in medical Newcomb problems is what EDT would do, and it is often put forward both as a counterexample to EDT and as the decision problem that separates EDT from Logical Decision Theories (LDT), such as TDT or UDT (see, e.g., Yudkowsky 2010, p. 67).

Before we examine the case for EDT further, let’s take a closer look at what LDTs would do in Betting on the Past. As far as I understand, LDTs would take correlations with other decision algorithms into account, but they would ignore “retrocausality” (i.e., they would smoke in the Smoking Lesion, chew gum in the chewing gum problem, etc.). If there is a purely physical cause, then this causal node isn’t altered in the logical counterfactuals that an LDT agent reasons over. Perhaps if the bet were about the state of the world yesterday, LDT would still take Bet 2. Clearly, LDT’s algorithm already existed yesterday, and it can influence this algorithm’s output; so if it chooses Bet 2, it can change yesterday’s world and make the proposition true. But at some point, this reasoning has to break down. If we choose a more distant point in the past as a reference for Alice’s bet – maybe as far back as the birth of our universe – she’ll eventually be unable to exert any influence via logical counterfactuals. At some point, the correlation becomes a purely physical one. All she can do at that point is what opponents of evidential reasoning would call “managing the news” (Lewis 1981) – she can merely try to take the action that gives her the best Bayesian update.

So, do Logical Decision Theories get it wrong? I’m not sure about that; they come in different versions, and some haven’t yet been properly formalized, so it’s hard for me to judge. I can very well imagine that e.g. Proof-Based Decision Theory would take Bet 2, since it could prove P to be either true or false, contingent on the action it would take. I would argue, though, that if a decision theory takes Bet 2 – and if I’m right about Betting on the Past being a medical Newcomb problem – then it appears it would also have to “one-box”, i.e. take the option recommended by EDT, in other medical Newcomb problems.

If all of this is true, it might imply that we don’t really need LDT’s logical conditioning and that EDT’s simple Bayesian conditioning on actions could suffice. The only remaining difference between LDT and EDT would then be EDT’s lack of updatelessness. What would an updateless version of EDT look like? Some progress on this front has already been made by Everitt, Leike, and Hutter 2015. Caspar Oesterheld and I hope to be able to say more about it soon ourselves.

A Better Framing of Newcomb’s Problem

While I disagree with James M. Joyce on the correct solution to Newcomb’s problem, I agree with him that the standard framing of Newcomb’s problem (from Nozick 1969) can be improved upon. Indeed, I very much prefer the framing he gives in chapter 5.1 of The Foundations of Causal Decision Theory, which (according to Joyce) is originally due to J. H. Sobel:

Suppose there is a brilliant (and very rich) psychologist who knows you so well that he can predict your choices with a high degree of accuracy. One Monday as you are on the way to the bank he stops you, holds out a thousand dollar bill, and says: “You may take this if you like, but I must warn you that there is a catch. This past Friday I made a prediction about what your decision would be. I deposited $1,000,000 into your bank account on that day if I thought you would refuse my offer, but I deposited nothing if I thought you would accept. The money is already either in the bank or not, and nothing you now do can change the fact. Do you want the extra $1,000?” You have seen the psychologist carry out this experiment on two hundred people, one hundred of whom took the cash and one hundred of whom did not, and he correctly forecast all but one choice. There is no magic in this. He does not, for instance, have a crystal ball that allows him to “foresee” what you choose. All his predictions were made solely on the basis of knowledge of facts about the history of the world up to Friday. He may know that you have a gene that predetermines your choice, or he may base his conclusions on a detailed study of your childhood, your responses to Rorschach tests, or whatever. The main point is that you now have no causal influence over what he did on Friday; his prediction is a fixed part of the fabric of the past. Do you want the money?

I prefer this over the standard framing because people can remember the offer and the balance of their bank account better than box 1 and box 2. For some reason, I also find it easier to explain this thought experiment without referring to the thought experiment itself in the middle of the explanation. So whenever I describe Newcomb’s problem, I now start with Sobel’s rather than Nozick’s version.

Of course, someone who wants to explore decision theory more deeply also needs to learn about the standard version, if only because people sometimes use “one-boxing” and “two-boxing” (the options in Newcomb’s original problem) to denote the analogous choices in other thought experiments. (Even if there are no boxes in those other thought experiments!) But luckily, it takes no more than a few sentences to describe the original Newcomb problem based on Sobel’s version. You only need to explain that Newcomb’s problem replaces your bank account with an opaque box whose content you always keep, and puts the offer into a second, transparent box. The question is then whether you stick with one box or go home with both.

Peter Thiel on Startup Culture

I recently read Peter Thiel’s Zero to One. All in all, it is an informative read. I found parts of ch. 10 on startup culture particularly interesting. Here’s the section “What’s under Silicon Valley’s Hoodies”:

Unlike people on the East Coast, who all wear the same skinny jeans or pinstripe suits depending on their industry, young people in Mountain View and Palo Alto go to work wearing T-shirts. It’s a cliché that tech workers don’t care about what they wear, but if you look closely at those T-shirts, you’ll see the logos of the wearers’ companies—and tech workers care about those very much. What makes a startup employee instantly distinguishable to outsiders is the branded T-shirt or hoodie that makes him look the same as his co-workers. The startup uniform encapsulates a simple but essential principle: everyone at your company should be different in the same way—a tribe of like-minded people fiercely devoted to the company’s mission.

Max Levchin, my co-founder at PayPal, says that startups should make their early staff as personally similar as possible. Startups have limited resources and small teams. They must work quickly and efficiently in order to survive, and that’s easier to do when everyone shares an understanding of the world. The early PayPal team worked well together because we were all the same kind of nerd. We all loved science fiction: Cryptonomicon was required reading, and we preferred the capitalist Star Wars to the communist Star Trek. Most important, we were all obsessed with creating a digital currency that would be controlled by individuals instead of governments. For the company to work, it didn’t matter what people looked like or which country they came from, but we needed every new hire to be equally obsessed.

In the section “Of cults and consultants” of the same chapter, he goes on:

In the most intense kind of organization, members hang out only with other members. They ignore their families and abandon the outside world. In exchange, they experience strong feelings of belonging, and maybe get access to esoteric “truths” denied to ordinary people. We have a word for such organizations: cults. Cultures of total dedication look crazy from the outside, partly because the most notorious cults were homicidal: Jim Jones and Charles Manson did not make good exits.

But entrepreneurs should take cultures of extreme dedication seriously. Is a lukewarm attitude to one’s work a sign of mental health? Is a merely professional attitude the only sane approach? The extreme opposite of a cult is a consulting firm like Accenture: not only does it lack a distinctive mission of its own, but individual consultants are regularly dropping in and out of companies to which they have no long-term connection whatsoever.

[…]

The best startups might be considered slightly less extreme kinds of cults. The biggest difference is that cults tend to be fanatically wrong about something important. People at a successful startup are fanatically right about something those outside it have missed. You’re not going to learn those kinds of secrets from consultants, and you don’t need to worry if your company doesn’t make sense to conventional professionals. Better to be called a cult—or even a mafia.

Is it a bias or just a preference? An interesting issue in preference idealization

When taking others’ preferences into account, we will often want to idealize them rather than taking them too literally. Consider the following example. You hold a glass of transparent liquid in your hand. A woman walks by, says that she is very thirsty, and asks to drink from your glass. What she doesn’t know, however, is that the liquid in the glass is (for some reason not relevant to this example) poisoned. Should you allow her to drink? Most people would say you should not. While she does desire to drink from the glass, this desire would probably disappear upon learning what the glass contains. Therefore, one might say that her object-level preference is to drink from the glass, while her idealized preference is not to drink from it. There is not much literature on preference idealization, as far as I know, but if you’re not already familiar with it, consider looking into “Coherent Extrapolated Volition”.

Preference idealization is not always as easy as inferring that someone doesn’t want to drink poison, and in this post, I will discuss a particular sub-problem: accounting for cognitive biases, i.e. systematic mistakes in our thinking, as they pertain to our moral judgments. However, the line between biases and genuine moral judgments is sometimes not clear.

Specifically, we look at cognitive biases that people exhibit in non-moral decisions, where their status as biases to be corrected is much less controversial, but which can also explain certain ethical intuitions. Offering such an error theory of a moral intuition, i.e. an explanation of how people could erroneously come to such a judgment, calls the intuition into question. Defenders of the intuition can respond that even if the bias can explain the genesis of that moral judgment, they would nonetheless stick with the moral intuition. After all, the existence of all our moral positions can be explained by non-moral facts about the world – “explaining is not explaining away”. Consider the following examples.

Omission bias: People judge the consequences of inaction as less severe than those of action. Again, this is clearly a bias in some cases, especially non-moral ones. For example, losing $1,000 by not responding to your bank in time is just as bad as losing $1,000 by throwing it out of the window. A businessperson who judges the two equivalent losses as equally bad will, ceteris paribus, be more successful. Nonetheless, most people distinguish between act and omission in cases like the fat man trolley problem.

Scope neglect: The scope or size of something often has little or no effect on people’s thinking when it should have. For example, when three groups of people were asked what they would pay for interventions that would affect 2,000, 20,000, or 200,000 birds, people were willing to pay roughly the same amount of money irrespective of the number of birds. While scope neglect seems clearly wrong in this (moral) decision, it is less clearly so in other areas. For example, is a flourishing posthuman civilization with 2 trillion inhabitants really twice as good as one with 1 trillion? It is not clear to me whether answering “no” should be regarded as a judgment clouded by scope neglect (caused, e.g., by our inability to imagine the two civilizations in question) or a moral judgment that is to be accepted.

Contrast effect (see also the decoy effect, social comparison bias, Ariely on relativity, the mere subtraction paradox, and the less-is-better effect): Consider the following selection of computer hard drives, from which you are to choose one.

Hard drive model    Model 1    Model 2    Model 3 (decoy)
Price               $80        $120       $130
Capacity            250 GB     500 GB     360 GB

Generally, one wants to spend as little money as possible while maximizing capacity. In the absence of model 3, the decoy, people may be undecided between models 1 and 2. However, when model 3 is introduced into the market, it provides a new reference point: model 2 is better than model 3 in all regards, which increases its attractiveness, even relative to model 1. That is, models 1 and 2 are judged by how they compare with model 3 rather than by their own features. The effect clearly exposes an instance of irrationality: the existence of model 3 doesn’t affect how model 1 compares with model 2. When applied to ethical evaluation, however, the contrast effect calls into question a firmly held intrinsic moral preference for social equality and fairness. Proponents of fairness seem to assess a person’s situation by comparing it to that of Bill Gates rather than judging each person’s situation separately. Similar to how the overpriced decoy changes our evaluation of the other products, our judgments of a person’s well-being, wealth, status, etc. may be seen as irrationally depending on the well-being, wealth, status, etc. of others.
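
One crude way to model this (my own illustration, not from the post; the scoring rule is hypothetical) is a chooser who credits each option for every alternative it dominates. The sketch below reproduces the pattern described above: without the decoy, neither remaining model dominates the other; with the decoy, model 2 picks up an advantage.

```python
# A crude sketch (my own illustration; the comparison rule is hypothetical)
# of choice by pairwise dominance: an option gets credit for every
# alternative that costs at least as much and offers at most its capacity.

OPTIONS = {
    "Model 1": {"price": 80, "capacity_gb": 250},
    "Model 2": {"price": 120, "capacity_gb": 500},
    "Model 3": {"price": 130, "capacity_gb": 360},  # the decoy
}

def dominates(a, b):
    # a dominates b if it is no more expensive and has no less capacity
    # (and the two options are not identical).
    return (a["price"] <= b["price"]
            and a["capacity_gb"] >= b["capacity_gb"]
            and a != b)

def dominance_scores(options):
    return {name: sum(dominates(spec, other)
                      for other_name, other in options.items()
                      if other_name != name)
            for name, spec in options.items()}

without_decoy = {k: v for k, v in OPTIONS.items() if k != "Model 3"}
print(dominance_scores(without_decoy))  # {'Model 1': 0, 'Model 2': 0}
print(dominance_scores(OPTIONS))        # Model 2 now dominates the decoy
```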

Other examples include peak-end rule/extension neglect/evaluation by moments and average utilitarianism; negativity bias and caring more about suffering than about happiness; psychological distance and person-affecting views; status-quo bias and various population ethical views (person-affecting views, the belief that most sentient beings that already exist have lives worth living); moral credential effect; appeal to nature and social Darwinism/normative evolutionary ethics.

Decision Theory and the Irrelevance of Impossible Outcomes

(This post assumes some knowledge of the decision theory of Newcomb-like scenarios.)

One problem in the decision theory of Newcomb-like scenarios (i.e. the study of whether causal, evidential or some other decision theory is true) is that even the seemingly obvious basics are fiercely debated. Newcomb’s problem seems to be fundamental and the solution obvious (to both sides), and yet scholars disagree about its resolution. If we already fail at the basics, how can we ever settle this debate?

In this post, I propose a solution. Specifically, I will introduce a very plausible general principle that decision rules should abide by. One may argue that settling on powerful general rules (like the one I will propose) must be harder than settling single examples (like Newcomb’s problem). However, this is not universally the case. In decision theory in particular, we should expect general principles to be convincing, because a common defense of two-boxing in Newcomb’s scenario is that Newcomb’s problem is just a weird edge case in which rationality is punished. By introducing a general principle that CDT (or, perhaps, EDT) violates, we can prove the existence of a general flaw.

Without further ado, the principle is: the decisions we make should not depend on the utilities assigned to outcomes that cannot occur. To me, this principle seems obvious, and indeed it is consistent with expected value calculations in non-Newcomb-like scenarios. Imagine having to deterministically choose an action from some set A. (We will ignore mixed strategies.) The next state of the world is sampled from a set of states S via a distribution P and depends on the chosen action. We are also given a utility function U, which assigns values to pairs of a state and an action. Let a be an action and let s be a possible state. If P(s,a) = 0 (or P(s|a) = 0, or P(s given the causal implications of a) = 0 – we assume all of these to be equivalent in this non-Newcomb-like scenario), then it doesn’t matter what U(s,a) is, because in an expected value calculation, U(s,a) will always be multiplied by P(s,a) = 0. That is to say, any expected value decision rule gives the same recommendation regardless of U(s,a). So, expected value decision rules abide by this principle, at least in non-Newcomb-like scenarios.
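
As a sanity check, here is a minimal sketch in Python (my own illustration; all numbers are hypothetical) of the claim above: in a non-Newcomb-like scenario, changing the utility of a zero-probability state-action pair cannot change which action maximizes expected value.

```python
# A minimal sketch (my own illustration; numbers are hypothetical) showing
# that in a non-Newcomb-like scenario, the utility of a zero-probability
# state-action pair never affects which action maximizes expected value.

ACTIONS = ["a1", "a2"]
STATES = ["s1", "s2"]

# P[(s, a)]: probability of state s if action a is taken; U[(s, a)]: utility.
P = {("s1", "a1"): 1.0, ("s2", "a1"): 0.0,
     ("s1", "a2"): 0.5, ("s2", "a2"): 0.5}
U = {("s1", "a1"): 3.0, ("s2", "a1"): -1000.0,  # ("s2", "a1") has probability 0
     ("s1", "a2"): 1.0, ("s2", "a2"): 4.0}

def best_action(P, U):
    def expected_utility(a):
        return sum(P[(s, a)] * U[(s, a)] for s in STATES)
    return max(ACTIONS, key=expected_utility)

print(best_action(P, U))  # a1

# Changing the utility of the impossible pair ("s2", "a1") cannot change the
# recommendation, since that utility is always multiplied by probability 0.
U_changed = {**U, ("s2", "a1"): 10**6}
print(best_action(P, U_changed))  # still a1
```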

Let us now apply the principle to a Newcomb-like scenario, specifically to the prisoner’s dilemma played against an exact copy of yourself. Your actions are C (cooperation) and D (defection). Your opponent is the “environment” and can also choose between C and D. So, the outcomes are (C,C), (C,D), (D,C) and (D,D). The probabilities P(C,D) and P(D,C) are both 0. Applied to this Newcomb-like scenario, the principle of the irrelevance of impossible outcomes states that our decision should only depend on the utilities of (C,C) and (D,D). Evidential decision theory behaves in accordance with this principle. (I leave it as an exercise to the reader to verify this.) Indeed, I suspect that it can be shown that EDT generally abides by the principle of the irrelevance of impossible outcomes. The choice of causal decision theory, on the other hand, does depend on the utilities of the impossible outcomes U(D,C) and U(C,D). Remember that in the prisoner’s dilemma the payoffs are such that U(D,x)>U(C,x) for any action x of the opponent, i.e. no matter the opponent’s choice, it is always better to defect. This dominance is given as the justification for CDT’s decision to defect. But let us say we increase the utility of U(C,D) such that U(C,D)>U(D,D) and decrease the utility of U(D,C) such that U(D,C)<U(C,C). Of course, we must make these changes to the utility functions of both players so as to retain symmetry. After these changes, the dominance relationship is reversed: U(C,x)>U(D,x) for any action x. Of course, the new payoff matrix is not that of a prisoner’s dilemma anymore – the game is different in important ways. But when played against a copy, these differences do not seem significant, because we only changed the utilities of outcomes that were impossible to achieve anyway. Nevertheless, CDT would switch from D to C upon being presented with these changes, thus violating the principle of the irrelevance of impossible outcomes. This is a systematic flaw in CDT: its decisions depend on the utility of outcomes that it can already know to be impossible.
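
The following sketch (again my own illustration, with hypothetical payoff numbers) makes this concrete: re-valuing only the impossible outcomes (C,D) and (D,C) flips the choice of a CDT-style dominance reasoner, while an EDT-style reasoner, who conditions on playing against an exact copy, is unaffected.

```python
# A minimal sketch (my own illustration; payoff numbers are hypothetical)
# contrasting EDT- and CDT-style choices in a prisoner's dilemma against an
# exact copy when the utilities of the impossible outcomes are changed.

def edt_choice(U):
    # Against an exact copy, conditioning on my action implies the copy acts
    # the same way, so only the diagonal outcomes (C,C) and (D,D) matter.
    return max(["C", "D"], key=lambda a: U[(a, a)])

def cdt_choice(U, p_copy_cooperates=0.5):
    # CDT treats the copy's action as causally independent; under dominance,
    # the recommendation is the same for every credence in (0, 1).
    def expected_utility(a):
        return (p_copy_cooperates * U[(a, "C")]
                + (1 - p_copy_cooperates) * U[(a, "D")])
    return max(["C", "D"], key=expected_utility)

# Standard prisoner's dilemma payoffs for the row player: U[(mine, theirs)].
pd = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}

# Same game, but with the impossible outcomes (C,D) and (D,C) re-valued so
# that U(C,D) > U(D,D) and U(D,C) < U(C,C), reversing the dominance relation.
pd_changed = {**pd, ("C", "D"): 1, ("D", "C"): 1}

print(edt_choice(pd), edt_choice(pd_changed))  # C C  -- EDT is unaffected
print(cdt_choice(pd), cdt_choice(pd_changed))  # D C  -- CDT's choice flips
```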

The principle of the irrelevance of impossible outcomes can be used beyond arguing against CDT. As you may remember from my post on updatelessness, sensible decision theories will precommit to giving Omega the money in the counterfactual mugging thought experiment. (If you don’t remember or haven’t read that post in the first place, this is a good time to catch up, because the following thoughts build on the ideas from that post.) Even EDT, which ignores the utility of impossible outcomes, would self-modify in this way. However, the decision theory resulting from such self-modification violates the principle of the irrelevance of impossible outcomes. Remember that in counterfactual mugging, you give in because precommitting to do so was a good idea when you didn’t yet know how the coin would come up. However, once you know that the coin came up the unfavorable way, the positive outcome, which gave you the motivation to precommit, has become impossible. Of course, you only give in to counterfactual mugging if the reward in this now impossible branch is sufficiently high; for example, there is no reason to precommit to giving in if you lose money in both branches. This means that once you have become updateless, you violate the principle of the irrelevance of impossible outcomes: your decision in counterfactual mugging depends on the utility you assign to an outcome that cannot happen anymore.
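
Here is a minimal sketch of that point (my own illustration; the $100 cost and $10,000 reward are assumed numbers in the spirit of the usual counterfactual mugging setup): the precommitment is only worthwhile because of the utility assigned to a branch that, once the coin has landed unfavorably, can no longer occur.

```python
# A minimal sketch of counterfactual mugging from the ex ante perspective.
# The payoff numbers ($100 cost, $10,000 reward) are assumptions, not from
# the post; only their relative sizes matter.

COST_IF_ASKED = -100          # paying Omega after the unfavorable coin flip
REWARD_OTHER_BRANCH = 10_000  # what Omega pays in the favorable branch,
                              # but only if it predicts that you would pay

def precommitment_value(pay: bool) -> float:
    # Evaluated before the coin flip, when both branches are still possible.
    unfavorable = COST_IF_ASKED if pay else 0
    favorable = REWARD_OTHER_BRANCH if pay else 0
    return 0.5 * unfavorable + 0.5 * favorable

print(precommitment_value(True), precommitment_value(False))  # 4950.0 0.0

# After the coin has come up unfavorably, the favorable branch is impossible,
# yet the updateless agent still pays: its decision depends on the utility of
# an outcome that cannot happen anymore. Shrink that utility enough
# (e.g. REWARD_OTHER_BRANCH = 50) and precommitting to pay is no longer
# worthwhile.
```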

Omoto and Snyder (1995) on motivations to volunteer

Omoto and Snyder (1995) is only a single study on volunteerism with a sample of 116 AIDS volunteers, but the results are quite interesting nonetheless. In Snyder, Omoto, and Crain (1999), they summarize:

Motivations […] foreshadow the length of time that volunteers stay active (Omoto and Snyder, 1995). In one longitudinal study, volunteers who were more motivated […] when they began their work were more likely to still be active 2.5 years later. Interestingly, relatively self-focused motivations (i.e., personal development, understanding, esteem enhancement) were more predictive of volunteers’ duration of service than those that were more other-focused (i.e., values and beliefs, community concern). That is, volunteers remained active to the extent that they more strongly endorsed relatively self-focused motivations for their work. Other-focused motives, even though they may provide considerable impetus for people to become volunteers, may not sustain volunteers faced with the tough realities and personal costs of volunteering.

The study also has other interesting results. For instance, “90% of respondents expected to continue volunteering with the agency for at least another year. In actuality, 54% of the volunteers were still active 1 year later, whereas only 16% of them were still active 2.5 years later.” (Omoto and Snyder 1995, p. 677)

I haven’t looked into the literature much more, but this seems to be exactly the kind of research I should turn to if I wanted to design a successful social movement.