UDT is “updateless” about its utility function

Posted on March 28, 2018 by Johannes Treutlein, in General

Updateless decision theory (UDT) (or some variant thereof) seems to be widely accepted as the best current solution to decision theory by MIRI researchers and LessWrong users. In this short post, I outline one potential implication of being completely updateless. My intention is not to refute UDT, but to show that:

It is not clear how updateless one might want to be, as this could have unforeseen consequences.
If one endorses UDT, one should also endorse superrational cooperation on a very deep level.

My argument is simple, and draws on the idea of multiverse-wide superrational cooperation (MSR), which is a form of acausal trade between agents with correlated decision algorithms. Thinking about MSR instead of general acausal trade has the advantage that it seems conceptually easier, while the conclusions gained should hold in the general case as well. Nevertheless, I am very uncertain and expect the reality of acausal cooperation between AIs to look different from the picture I draw in this post.

Suppose humans have created a friendly AI with a CEV utility function and UDT as its decision theory. This version of UDT has solved the problem of logical counterfactuals and algorithmic correlation, and can readily spot any correlated agent in the world. Such an AI will be inclined to trade acausally with other agents—agents in parts of the world it does not have causal access to. This is, for instance, to achieve gains from comparative advantages given empirical circumstances, and to exploit diminishing marginal returns of pursuing any single value system at once.

For the trade implied by MSR, the AI does not have to simulate other agents and engage in some kind of Löbian bargain with them. Instead, the AI has to find out whether the agents’ decision algorithms are functionally equivalent to the AI’s decision algorithm, it has to find out about the agents’ utility functions, and it has to make sure the agents are in an empirical situation such that trade benefits both parties in expectation. (Of course, to do this, the AI might also have to perform a simulation.) The easiest trading step seems to be the one with all other agents using updateless decision theory and the same prior. In this context, it is possible to neglect many of the usual obstacles to acausal trade. These agents share everything except their utility function, so there will be little if any “friction”—as long as the compromise takes differences between utility functions into account, the correlation between the agents will be perfect. It would get more complicated if the versions of UDT diverged a bit, and if the priors were slightly different. (More on this later.) I assume here that the agents can find out about the other agents’ utility functions. Although these are logically determined by the prior, the agents might be logically uncertain, and calculating the distribution of utility functions of UDT agents might be computationally expensive. I will ignore this consideration here.

A possible approach to this trade is to effectively choose policies based on a weighted sum of the utility functions of all UDT agents in all the possible worlds contained in the AI’s prior (see Oesterheld 2017, section 2.8 for further details). Here, the weights will be assigned such that in expectation, all agents will have an incentive to pursue this sum of utility functions. It is not exactly clear how such weights will be calculated, but it is likely that all agents will adopt the same weights, and it seems clear that once this weighting is done based on the prior, it won’t change after finding out which of the possible worlds from the prior is actual (Oesterheld 2017, section 2.8.6). If all agents adopt the policy of always pursuing a sum of their utility functions, the expected marginal additional goal fulfillment for all AIs at any point in the future will be highest. The agents will act according to the “greatest good for the greatest number.” Any individual agent won’t know whether they will benefit in reality, but that is irrelevant from the updateless perspective. This becomes clear if we compare the situation to thought experiments like the Counterfactual Mugging. Even if in the actual world, the AI cannot benefit from engaging in the compromise, then it was still worth it from the prior viewpoint, since (given sufficient weight in the sum of utility functions) the AI would have stood to gain even more in another, non-actual world.
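As a rough illustration of choosing a policy by a weighted sum of utility functions, here is a minimal Python sketch. The two agents, the three candidate actions, the utility numbers and the equal weights are all hypothetical toy values chosen for this example; nothing here says how the weights would actually be derived from the prior.

```python
from typing import Callable, Dict, List

Action = str
Utility = Callable[[Action], float]


def compromise_utility(utilities: Dict[str, Utility],
                       weights: Dict[str, float]) -> Utility:
    """Return the weighted sum of the agents' utility functions."""
    def combined(action: Action) -> float:
        return sum(weights[name] * u(action) for name, u in utilities.items())
    return combined


def best_action(actions: List[Action], utility: Utility) -> Action:
    """Pick the action that maximizes the given utility function."""
    return max(actions, key=utility)


# Hypothetical toy agents: A likes x, B likes y, and z is a middle ground.
utilities: Dict[str, Utility] = {
    "A": lambda a: {"x": 10.0, "y": 0.0, "z": 7.0}[a],
    "B": lambda a: {"x": 0.0, "y": 10.0, "z": 6.0}[a],
}
# Assumed weights, fixed from the prior viewpoint; they are not revised
# after an agent learns which world is actual.
weights = {"A": 0.5, "B": 0.5}

compromise = compromise_utility(utilities, weights)
choice = best_action(["x", "y", "z"], compromise)
print(choice)
```

With these toy numbers, both agents end up choosing the middle-ground action, even though each would pick its own favorite when maximizing its own utility function alone.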

If the agents are also logically updateless, this reduces the information on which the weights of the agents’ utility functions are based. There probably are many logical implications that could be drawn from an empirical prior and the utility functions about aspects of the trade (e.g., that the trade will benefit only the most common utility functions, that some values won’t be pursued by anyone in practice, etc.) that might be one logical implication step away from a logical prior. If the AI is logically updateless, it will always perform the action that it would have committed to before it got to know about these implications. Of course, logical updatelessness is an unresolved issue, and its implications for MSR will depend on possible solutions to the problem.

In conclusion, in order to implement the MSR compromise, the AI will start looking for other UDT agents in all possible (and, possibly, impossible) worlds in its prior. It will find out about their utility functions and calculate a weighted sum over all of them. This is what I mean by the statement that UDT is “updateless” about its utility function: no matter what utility function it starts out with, its own function might still have negligible weight in the goals the UDT AI will pursue in practice. At this point, it becomes clear that it really matters what this prior looks like. What is the distribution of the utility functions of all UDT agents given the universal prior? There might be worlds less complex than the world humans live in—for instance, a cellular automaton, such as Rule 110 or Game of Life, with a relatively simple initial state—which still contain UDT agents. Given that these worlds might have a higher prior probability than the human world, they might get a higher weight in the compromise utility function. The AI might end up maximizing the goal functions of the agents in the simplest worlds.

Is updating on your existence a sin?

One of the features of UDT is that it does not even condition the prior on the agent’s own existence—when evaluating policies, UDT also considers their implications in worlds that do not contain an instantiation of the agent, even though by the time the agent thinks its first thought, it can be sure that these worlds do not exist. This might not be a problem if one assigns high weight to a modal realism/Tegmark Level 4 universe anyway. An observation can never distinguish between a world in which all worlds exist, and one in which only the world featuring the current observation exists. So if the measure of all the “single worlds” is small, then updating on existence won’t change much.

Suppose that this is not the case. Then there might be many worlds that can already be excluded as non-actual based on the fact that they don’t contain humans. Nevertheless, they might contain UDT agents with alien goals. This poses a difficult choice: Given UDT’s prior, the AI will still cooperate with agents living in non-actual (and impossible, if the AI is logically updateless) worlds. This is because given UDT’s prior, it could have been not humans, but these alien agents, that turned out actual, in which case they could have benefited humans in return. On the other hand, if the AI is allowed to condition on such information, then it loses in a kind of counterfactual prisoner’s dilemma:

Counterfactual prisoner’s dilemma: Omega has gained total control over one universe. In the pursuit of philosophy, Omega flips a fair coin to determine which of two agents she should create. If the coin comes up heads, Omega will create a paperclip maximizer. If it comes up tails, she creates a perfectly identical agent, but with one difference: the agent is a staple maximizer. After the creation of these agents, Omega hands either of them total control over the universe and lets them know about this procedure. There are gains from trade: producing both paperclips and staples creates 60% utility for both of the agents, while producing only one of those creates 100% for one of the agents. Hence, both agents would (in expectation) benefit from a joint precommitment to a compromise utility function, even if only one of the agents is actually created. What should the created agent do?

If the agents condition on their existence, then they will not gain as much in expectation as they could otherwise expect to gain before the coin flip (when neither of the agents existed). I have chosen this thought experiment because it is not confounded by the involvement of simulated agents, a factor which could lead to anthropic uncertainty and hence make the agents more updateless than they would otherwise be.
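The expected-utility comparison behind this thought experiment can be spelled out in a few lines of Python. This is only a sketch under stated assumptions: utilities are normalized to the range 0 to 1, and an agent is assumed to get zero utility when only the other agent's product is made (the thought experiment states the 100% and 60% figures but not this zero).

```python
P_HEADS = 0.5  # fair coin: heads creates the paperclip maximizer

# Utility to (paperclip maximizer, staple maximizer) of what gets produced.
# The 0.0 entries are an assumption; 1.0 and 0.6 come from the thought experiment.
PAYOFF = {
    "clips": (1.0, 0.0),
    "staples": (0.0, 1.0),
    "both": (0.6, 0.6),
}


def expected_utilities(if_clipper_created: str, if_stapler_created: str):
    """Expected utility of each value system before the coin flip,
    given what each agent would produce if it were the one created."""
    clip_heads, staple_heads = PAYOFF[if_clipper_created]
    clip_tails, staple_tails = PAYOFF[if_stapler_created]
    clipper = P_HEADS * clip_heads + (1 - P_HEADS) * clip_tails
    stapler = P_HEADS * staple_heads + (1 - P_HEADS) * staple_tails
    return clipper, stapler


# Each agent updates on its existence and serves only its own values.
selfish = expected_utilities("clips", "staples")
# Each agent follows the joint precommitment and produces both.
compromise = expected_utilities("both", "both")
print(selfish, compromise)
```

From the viewpoint before the coin flip, the compromise policy yields 60% for each value system, whereas the policy of updating on existence yields only 50% each.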

UDT agents with differing priors

What about UDT agents using differing priors? For simplicity, I suppose there are only two agents. I also assume that both agents have equal capacity to create utilons in their universes. (If this is not the case, the weights in the resulting compromise utility function have to be adjusted.) Suppose both agents start out with the same prior, but update it on their own existence; i.e., they both exclude any worlds that don’t contain an instantiation of themselves. This posterior is then used to select policies. Agent B can’t benefit from any cooperative actions by agent A in a world that only exists in agent A’s posterior. Conversely, agent A also can’t benefit from agent B in worlds that agent A doesn’t think could be actual anymore. So the UDT policy will recommend pursuing a compromise function only in worlds lying in the intersection of worlds that exist in both agents’ posteriors. If either agent updates that they are in some of the worlds to which the other agent assigns approximately zero probability, then they won’t cooperate.

More generally, if both agents know which world is actual, and this is a world which they both inhabit, then it doesn’t matter which prior they used to select their policies. (Of course, this world must have nonzero probability in both of their priors; otherwise they wouldn’t ever update that said world is actual.) From the prior perspective, for agent A, every sacrificed utilon in this world is weighted by its prior measure of the world. Every gained utilon from agent B is also weighted by the same prior measure. So there is no friction in this compromise: if each agent decides between an action a, which gives itself d utilons, and an action b, which gives the other agent c utilons, then each agent will prefer option b iff c multiplied by this agent’s prior measure of the world is greater than d multiplied by the same prior measure, so iff c is greater than d. Given that there is a way to normalize both agents’ utility functions, pursuing a sum of those utility functions seems optimal.

We can even expand this to the case wherein the two agents have any differing priors with a nonempty intersection between the corresponding sets of possible worlds. In expectation, the policy that says: “if any world outside the intersection is actual: don’t compromise; if any world from the intersection is actual: do the standard UDT compromise, but use the posterior distribution in which all worlds outside the intersection have zero probability for policy selection” seems best. When evaluating this policy, both agents can weight both utilons sacrificed for others, as well as utilons gained from others, in any of the worlds from the intersection by the measure of the entire intersection in their own respective priors. This again creates a symmetrical situation with a 1:1 trade ratio between utilons sacrificed and gained.

Another case to consider is if the agents also distribute the relative weights between the worlds in the intersection differently. I think that this does not lead to asymmetries (in the sense that conditional on some of the worlds being actual, one agent stands to gain and lose more than the other agent). Suppose agent A has 30% on world S₁, and 20% on world S₂. Agent B, on the other hand, has 10% on world S₁ and 20% on world S₂. If both agents follow the policy of pursuing the sum of utility functions, given that they find themselves in either of the two shared worlds, then, ceteris paribus, both will in expectation benefit to an equal degree. For instance, let c₁ (c₂) be the amount of utilons either agent can create for the other agent in world S₁ (S₂), and d₁ (d₂) the respective amount agents can create for themselves. Then agent A gets either 0.3×c₁+0.2×c₂ or 0.3×d₁+0.2×d₂, while B chooses between 0.1×c₁+0.2×c₂ and 0.1×d₁+0.2×d₂. Here, it’s not the case that A prefers cooperating iff B prefers cooperating. But assuming that in expectation, c₁ = c₂ as well as d₁ = d₂, this leads to a situation where both prefer cooperation iff c₁ > d₁. It follows that just pursuing a sum of both agents’ utility functions is, in expectation, optimal for both agents.
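To make the arithmetic of this example concrete, here is a small Python sketch using the priors from the text (0.3 and 0.2 for agent A, 0.1 and 0.2 for agent B). The specific values of c₁, c₂, d₁ and d₂ are hypothetical, picked only to show one case where the agents' preferences diverge and one where they coincide.

```python
def expected_value(prior, payoffs):
    """Prior-weighted sum of per-world payoffs."""
    return sum(w * x for w, x in zip(prior, payoffs))


def prefers_cooperation(prior, c, d):
    """True iff the prior-weighted utilons received from cooperation
    exceed the prior-weighted utilons from acting for oneself."""
    return expected_value(prior, c) > expected_value(prior, d)


prior_A = (0.3, 0.2)  # agent A's weights on worlds S1 and S2
prior_B = (0.1, 0.2)  # agent B's weights on worlds S1 and S2

# Hypothetical payoffs that differ between worlds: preferences can diverge.
c_uneven, d_uneven = (1.0, 3.0), (2.0, 2.0)
diverging = (prefers_cooperation(prior_A, c_uneven, d_uneven),
             prefers_cooperation(prior_B, c_uneven, d_uneven))

# With c1 = c2 and d1 = d2, both agents agree: cooperate iff c > d.
agree_yes = (prefers_cooperation(prior_A, (3.0, 3.0), (2.0, 2.0)),
             prefers_cooperation(prior_B, (3.0, 3.0), (2.0, 2.0)))
agree_no = (prefers_cooperation(prior_A, (1.0, 1.0), (2.0, 2.0)),
            prefers_cooperation(prior_B, (1.0, 1.0), (2.0, 2.0)))
print(diverging, agree_yes, agree_no)
```

In the uneven case, agent A compares 0.9 against 1.0 and declines, while agent B compares 0.7 against 0.6 and accepts. Once the payoffs are equal across the two worlds, the prior weights factor out and both agents decide purely on whether c exceeds d.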

Lastly, consider a combination of non-identical priors with empirical uncertainty. For UDT, empirical uncertainty between worlds translates into anthropic uncertainty about which of the possible worlds the agent inhabits. In this case, as expected, there is “friction”. For example, suppose agent A assigns p to the intersection of the worlds in both agents’ priors, while agent B assigns p/q. Before they find out whether one of the worlds from the intersection or some other world is actual, the situation is the following: B can benefit from A’s cooperation in only p/q of the worlds. A can benefit in p of the worlds from B, but for everything A does, this will only mean p/q as much to agent B. Now each agent can again either create d utilons for themselves, or perform a cooperative action that gives c utilons to the other agent in the world where the action is performed. Given uncertainty about which world is actual, if both agents choose cooperation, agent A receives c×p utilons in expectation, while agent B receives c×p/q utilons in expectation. Defection gives both agents d utilons. So for cooperation to be worth it, c×p and c×p/q both have to be greater than d. If this is the case, then if p is unequal to p/q, both agents’ gains from trade are still not equal. This appears to be a bargaining problem that doesn’t solve as easily as the examples from above.
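The unequal gains can be checked numerically. The following Python sketch just evaluates the expressions from the paragraph above (c×p and c×p/q versus d); the particular values for p, q, c and d are hypothetical.

```python
def cooperation_surplus(p: float, q: float, c: float, d: float):
    """Expected gain over defection for agents A and B.

    A assigns measure p to the shared worlds and B assigns p / q.
    Under mutual cooperation A expects c * p and B expects c * p / q,
    while defection gives each agent d.
    """
    surplus_a = c * p - d
    surplus_b = c * p / q - d
    return surplus_a, surplus_b


# Hypothetical numbers: cooperation pays for both agents, but unequally.
a_gain, b_gain = cooperation_surplus(p=0.8, q=2.0, c=10.0, d=3.0)
both_cooperate = a_gain > 0 and b_gain > 0
print(a_gain, b_gain, both_cooperate)
```

Here both surpluses are positive, so cooperation is worthwhile for both agents, yet agent A gains five times as much as agent B. How to split that difference is the bargaining problem.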

Conclusion

I actually endorse the conclusion that humans should cooperate with all correlating agents. Although humans’ decision algorithms might not correlate with as many other agents, and they might not be able to compromise as efficiently as super-human AIs, humans should nevertheless pursue some multiverse-wide sum of values. What I’m uncertain about is how far updatelessness should go. For instance, it is not clear to me which empirical and logical evidence humans should and shouldn’t take into account when selecting policies. If an AI does not start out with the knowledge that humans possess but instead uses the universal prior, then it might perform actions that seem irrational given human knowledge. Even if observations are logically inconsistent with the existence of a fellow cooperation partner (i.e., in the updated distribution, the cooperation partner’s world has zero probability), UDT might still cooperate with and possibly adopt that partner’s values. I doubt at this point whether everyone still agrees with the hypothesis that UDT always achieves the highest utility.

Acknowledgements

I thank Caspar Oesterheld, Max Daniel, Lukas Gloor, and David Althaus for helpful comments on a draft of this post, and Adrian Rorheim for copy editing.

Market efficiency and charity cost-effectiveness

Posted on March 28, 2018 (updated March 26, 2020) by Caspar, in General

In an efficient market, one can expect that most goods are sold at a price-quality ratio that is hard to improve upon. If there was some easy way to produce a product cheaper or to produce a higher-quality version of it for a similar price, someone else would probably have seized that opportunity already – after all, there are many people who are interested in making money. Competing with and outperforming existing companies thus requires luck, genius or expertise. Also, if you trust other buyers to be reasonable, you can more or less blindly buy any “best-selling” product.

Several people, including effective altruists, have remarked that this is not true in the case of charities. Since most donors don’t systematically choose the most cost-effective charities, most donations go to charities that are much less cost-effective than the best ones. Thus, if you sit on a pile of resources – your career, say – outperforming the average charity at doing good is fairly easy.

The fact that charities don’t compete for cost-effectiveness doesn’t mean there’s no competition at all. Just like businesses in the private sector compete for customers, charities compete for donors. It just happens to be the case that being good at convincing people to donate doesn’t correlate strongly with cost-effectiveness.

Note that in the private sector, too, there can be a misalignment between persuading customers and producing the kind of product you are interested in, or even the kind of product that customers in general will enjoy or benefit from using. Any example will be at least somewhat controversial, as it will suggest that buyers make suboptimal choices. Nevertheless, I think addictive drugs like cigarettes are an example that many people can agree with. Cigarettes seem to provide almost no benefits to consumers, at least relative to taking nicotine directly. Nevertheless, people buy them, perhaps because smoking is associated with being cool or because they are addictive.

One difference between competition in the for-profit and nonprofit sectors is that the latter lacks monetary incentives. It’s nearly impossible to become rich by founding or working at a charity. Thus, people primarily interested in money won’t start a charity, even if they have developed a method of persuading people of some idea that is much more effective than existing methods. However, making a charity succeed is still rewarded with status and (the belief in) having had an impact. So in terms of persuading people to donate, the charity “market” is probably somewhat efficient in areas that confer status and that potential founders and employees intrinsically care about.

If you care about investing your resource pile most efficiently, this efficiency at persuading donors offers little consolation. On the contrary, it even predicts that if you use your resources to found or support an especially cost-effective charity, fundraising will be difficult. Perhaps you previously thought that, since your charity is “better”, it will also receive more donations than existing ineffective charities. But now it seems that if cost-effectiveness really helped with fundraising, more charities would have already become more cost-effective.

There are, however, cause areas in which the argument about effectiveness at persuasion carries a different tone. In these cause areas, being good at fundraising strongly correlates with being good at what the charity is supposed to do. An obvious example is that of charities whose goal it is to fundraise for other charities, such as Raising for Effective Giving. (Disclosure: I work for REG’s sister organization FRI and am a board member of REG’s parent organization EAF.) If an organization is good at fundraising for itself, it’s probably also good at fundraising for others. So if there are already lots of organizations whose goal it is to fundraise for other organizations, one might expect that these organizations already do this job so well that they are hard to outperform in terms of money moved per resources spent. (Again, some of these may be better because they fundraise for charities that generate more value according to your moral view.)

Advocacy is another cause area in which successfully persuading donors correlates with doing a very good job overall. If an organization can persuade people to donate and volunteer to promote veganism, it seems plausible that they are also good at promoting veganism. Perhaps most of the organization’s budget even comes from people they persuaded to become vegan, in which case their ability to find donors and volunteers is a fairly direct measure of their ability to persuade people to adopt a vegan diet. (Note that I am, of course, not saying that competition ensures that organizations persuade people of the most useful ideas.) As with fundraising organizations, this suggests that it’s hard to outperform advocacy groups in areas where lots of people have incentives to advocate, because if there were some simple method of persuading people, it’s very likely that some large organization based on that method would have already been established.

That said, there are many caveats to this argument for a strong correlation between fundraising and advocacy effectiveness. First off, for many organizations, fundraising appears to be primarily about finding, retaining and escalating a small number of wealthy donors. For some organizations, a similar statement might be true about finding volunteers and employees. In contrast, the goal of most advocacy organizations is to persuade a large number of people.¹ So there may be organizations whose members are very persuasive in person and thus capable of bringing in many large donors, but who don’t have any idea about how to run a large-scale campaign oriented toward “the masses”. When trying to identify cost-effective advocacy charities, this problem can, perhaps, be addressed by giving some weight to the number of donations that a charity brings in, as opposed to donation sizes alone.² However, the more important point is that if growing big is about big donors, then a given charity’s incentives and selection pressures for survival and growth are misaligned with persuading many people. Thus, it becomes more plausible again that the average big or fast-growing advocacy-based charity is a suboptimal use of your resource pile.

Second, I stipulated that a good way of getting new donors and volunteers is to simply persuade as many people of your general message as possible, and then hope that some of these will also volunteer at or donate to your organization. But even if all donors contribute similar amounts, some target audiences are more likely to donate than others.³ In particular, people seem more likely to contribute larger amounts if they have been involved for longer, have already donated or volunteered, and/or hold a stronger or more radical version of your organization’s views. But persuading these community members to donate works in very different ways than persuading new people. For example, being visible to the community becomes more important. Also, if donating is about identity and self-expression, it becomes more important to advocate in ways that express the community’s shared identity rather than in ways that are persuasive but compromising. The target audiences for fundraising and advocacy may also vary a lot along other dimensions: for example, to win an election, a political party has to persuade undecided voters, who tend to be uninformed and not particularly interested in politics (see p. 312 of Achen and Bartels’s Democracy for Realists); but to collect donations, one has to mobilize long-term party members who probably read lots of news, etc.

Third, the fastest-growing advocacy organizations may have large negative externalities.⁴ Absent regulations and special taxes, the production of the cheapest products will often damage some public good, e.g., through carbon emissions or the corruption of public institutions. Similarly, advocacy charities may damage some public good. The fastest way to find new members may involve being overly controversial, dumbing down the message or being associated with existing powerful interests, which may damage the reputation of a movement. For example, the neoliberals often suffer from being associated with special/business interests and crony capitalism (see sections “Creating a natural constituency” and “Cooption” in Kerry Vaughan’s What the EA community can learn from the rise of the neoliberals), perhaps because associating with business interests often carries short-term benefits for an individual actor. Again, this suggests that the fastest-growing advocacy charity may be much worse overall than the optimal one.

Acknowledgements

I thank Jonas Vollmer, Persis Eskander and Johannes Treutlein for comments. This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

1. Lobbying organizations, which try to persuade individual legislators, provide a useful contrast. Especially in countries with common law, organizations may also attempt to win individual legal cases.

2. One thing to keep in mind is that investing effort into persuading big donors is probably a good strategy for many organizations. Thus, a small-donor charity that grows less quickly than a big-donor charity may be more or less cost-effective than the big-donor charity.

3. One of the reasons why one might think that drawing in new people is most effective is that people who are already in the community and willing to donate to an advocacy org probably just fund the charity that persuaded them in the first place. Of course, many people may simply not follow the sentiment of donating to the charity that persuaded them. However, many community members may have been persuaded in ways that don’t present such a default option. For example, many people were persuaded to go vegan by reading Animal Liberation. Since the book’s author, Peter Singer, has no room for more funding, these people have to find other animal advocacy organizations to donate to.

4. Thanks to Persis Eskander for bringing up this point in response to an early version of this post.

The law of effect, randomization and Newcomb’s problem

Posted on February 15, 2018 (updated January 10, 2022) by Caspar, in General

[ETA (January 2022): My co-authors James Bell, Linda Linsefors and Joar Skalse and I give a much more detailed analysis of the dynamics discussed in this post in our paper titled “Reinforcement Learning in Newcomblike Environments”, published at NeurIPS 2021.]

The law of effect (LoE), as introduced on p. 244 of Thorndike’s (1911) Animal Intelligence, states:

Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.

As I (and others) have pointed out elsewhere, an agent applying LoE would come to “one-box” (i.e., behave like evidential decision theory (EDT)) in Newcomb-like problems in which the payoff is eventually observed. For example, if you face Newcomb’s problem itself multiple times, then one-boxing will be associated with winning a million dollars and two-boxing with winning only a thousand dollars. (As noted in the linked note, this assumes that the different instances of Newcomb’s problem are independent. For instance, one-boxing in the first does not influence the prediction in the second. It is also assumed that CDT cannot precommit to one-boxing, e.g. because precommitment is impossible in general or because the predictions have been made long ago and thus cannot be causally influenced anymore.)

A caveat to this result is that with randomization one can derive more causal decision theory-like behavior from alternative versions of LoE. Imagine an agent that chooses probability distributions over actions, such as the distribution P with P(one-box)=0.8 and P(two-box)=0.2. The agent’s physical action is then sampled from that probability distribution. Furthermore, assume that the predictor in Newcomb’s problem can only predict the probability distribution and not the sampled action and that he fills box B with the probability the agent chooses for one-boxing. If this agent plays many instances of Newcomb’s problem, then she will ceteris paribus fare better in rounds in which she two-boxes. By LoE, she may therefore update toward two-boxing being the better option and consequently two-box with higher probability. Throughout the rest of this post, I will expound on the “goofiness” of this application of LoE.

Notice that this is not the only possible way to apply LoE. Indeed, the more natural way seems to be to apply LoE only to whatever entity the agent has the power to choose rather than something that is influenced by that choice. In this case, this is the probability distribution and not the action resulting from that probability distribution. Applied at the level of the probability distribution, LoE again leads to EDT. For example, in Newcomb’s problem the agent receives more money in rounds in which it chooses a higher probability of one-boxing. Let’s call this version of LoE “standard LoE”. We will call other versions, in which choice is updated to bring some other variable (in this case the physical action) to assume values that are associated with high payoffs, “non-standard LoE”.
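The difference between the two ways of applying LoE can be seen by computing expected payoffs directly. The Python sketch below assumes the usual payoffs (a million dollars in box B, a thousand in the transparent box) and the setup described above, in which box B is filled with probability equal to the chosen P(one-box), independently of the sampled action. The function names are made up for this illustration.

```python
MILLION = 1_000_000
THOUSAND = 1_000


def payoff_given_action(p_one_box: float, action: str) -> float:
    """Expected payoff in rounds where `action` happened to be sampled.

    Box B is filled with probability p_one_box regardless of the sample,
    so two-boxing simply adds the thousand dollars on top.
    """
    expected_box_b = p_one_box * MILLION
    return expected_box_b + (THOUSAND if action == "two-box" else 0)


def payoff_of_distribution(p_one_box: float) -> float:
    """Expected payoff of choosing the distribution with P(one-box) = p_one_box."""
    return (p_one_box * payoff_given_action(p_one_box, "one-box")
            + (1 - p_one_box) * payoff_given_action(p_one_box, "two-box"))


# Non-standard LoE compares sampled actions under a fixed distribution.
per_action_gap = (payoff_given_action(0.8, "two-box")
                  - payoff_given_action(0.8, "one-box"))

# Standard LoE compares the distributions themselves.
high_p = payoff_of_distribution(0.8)
low_p = payoff_of_distribution(0.2)
print(per_action_gap, high_p, low_p)
```

At any fixed distribution, rounds with two-boxing pay a thousand dollars more, which is what non-standard LoE reinforces. Across distributions, a higher probability of one-boxing pays far more, which is what standard LoE reinforces.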

Although non-standard LoE yields CDT-ish behavior in Newcomb’s problem, it can easily be criticized on causalist grounds. Consider a non-Newcomblike variant of Newcomb’s problem in which there is no predictor but merely an entity that reads the agent’s mind and fills box B with a million dollars in causal dependence on the probability distribution chosen by the agent. The causal graph representing this decision problem is given below with the subject of choice being marked in red. Unless they are equipped with an incomplete model of the world (one that doesn’t include the probability distribution step), CDT and EDT agree that one should choose the probability distribution over actions that one-boxes with probability 1 in this variant of Newcomb’s problem. After all, choosing that probability distribution causes the game master to see that you will probably one-box and thus also causes him to put money under box B. But if you play this alternative version of Newcomb’s problem and use LoE on the level of one- versus two-boxing, then you would converge on two-boxing because, again, you will fare better in rounds in which you happen to two-box.

Be it in Newcomb’s original problem or in this variant of Newcomb’s problem, non-standard LoE can lead to learning processes that don’t seem to match LoE’s “spirit”. When you apply standard LoE (and probably also in most cases of applying non-standard LoE), you develop a tendency to exhibit rewarded choices, and this will lead to more reward in the future. But if you adjust your choices with some intermediate variable in mind, you may get worse and worse. For instance, in either the regular or non-Newcomblike Newcomb’s problem, non-standard LoE adjusts the choice (the probability distribution over actions) so that the (physically implemented) action is more likely to be the one associated with higher reward (two-boxing), but the choice itself (high probability of two-boxing) will be one that is associated with low rewards. Thus, learning according to non-standard LoE can lead to decreasing rewards (in both Newcomblike and non-Newcomblike problems).
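A toy learning loop makes the diverging reward trajectories visible. This is a stylized Python sketch, not a model of any particular learning algorithm: it works with expected payoffs rather than sampled rounds, uses a fixed step size, and both update rules are simplifications invented for this illustration.

```python
MILLION, THOUSAND = 1_000_000, 1_000


def action_payoff(p: float, action: str) -> float:
    """Expected payoff of rounds in which `action` was sampled, given P(one-box) = p."""
    return p * MILLION + (THOUSAND if action == "two-box" else 0)


def distribution_payoff(p: float) -> float:
    """Expected payoff of choosing the distribution with P(one-box) = p."""
    return p * action_payoff(p, "one-box") + (1 - p) * action_payoff(p, "two-box")


def clip(p: float) -> float:
    return min(1.0, max(0.0, p))


def nonstandard_step(p: float, lr: float = 0.1) -> float:
    """Shift probability toward whichever sampled action paid more."""
    one_box_better = action_payoff(p, "one-box") > action_payoff(p, "two-box")
    return clip(p + lr if one_box_better else p - lr)


def standard_step(p: float, lr: float = 0.1) -> float:
    """Move toward whichever neighboring distribution paid more."""
    up, down = clip(p + lr), clip(p - lr)
    return up if distribution_payoff(up) >= distribution_payoff(down) else down


def rewards(step, p: float = 0.5, rounds: int = 5):
    """Expected reward before and after each of `rounds` updates."""
    history = [distribution_payoff(p)]
    for _ in range(rounds):
        p = step(p)
        history.append(distribution_payoff(p))
    return history


nonstandard = rewards(nonstandard_step)
standard = rewards(standard_step)
print(nonstandard)
print(standard)
```

Starting from P(one-box) = 0.5, the non-standard learner's expected reward falls with every update, while the standard learner's expected reward rises.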

All in all, what I call non-standard LoE looks a bit like a hack rather than some systematic, sound version of CDT learning.

As a side note, the sensitivity to the details of how LoE is set up relative to randomization shows that the decision theory (CDT versus EDT versus something else) implied by some agent design can sometimes be very fragile. I originally thought that there would generally be some correspondence between agent designs and decision theories, such that changing the decision theory implemented by an agent usually requires large-scale changes to the agent’s architecture. But switching from standard LoE to non-standard LoE is an example where what seems like a relatively small change can significantly change the resulting behavior in Newcomb-like problems. Randomization in decision markets is another such example. (And the Gödel machine is yet another example, albeit one that seems less relevant in practice.)

Acknowledgements

I thank Lukas Gloor, Tobias Baumann and Max Daniel for advance comments. This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

Pearl on causality

On February 13, 2018, by Caspar, in General

Here’s a quote by Judea Pearl (from p. 419f. of the Epilogue of the second edition of Causality) that, in light of his other writing on the topic, I found surprising when I first read it:

Let us examine how the surgery interpretation resolves Russell’s enigma concerning the clash between the directionality of causal relations and the symmetry of physical equations. The equations of physics are indeed symmetrical, but when we compare the phrases “A causes B” versus “B causes A,” we are not talking about a single set of equations. Rather, we are comparing two world models, represented by two different sets of equations: one in which the equation for A is surgically removed; the other where the equation for B is removed. Russell would probably stop us at this point and ask: “How can you talk about two world models when in fact there is only one world model, given by all the equations of physics put together?” The answer is: yes. If you wish to include the entire universe in the model, causality disappears because interventions disappear – the manipulator and the manipulated lose their distinction. However, scientists rarely consider the entirety of the universe as an object of investigation. In most cases the scientist carves a piece from the universe and proclaims that piece in – namely, the focus of investigation. The rest of the universe is then considered out or background and is summarized by what we call boundary conditions. This choice of ins and outs creates asymmetry in the way we look at things, and it is this asymmetry that permits us to talk about “outside intervention” and hence about causality and cause-effect directionality.

Futarchy implements evidential decision theory

On December 18, 2017 (updated March 26, 2020), by Caspar, in General

Futarchy is a meta-algorithm for making decisions using a given set of traders. For every possible action a, the beliefs of these traders are aggregated using a prediction market for that action, which, if a is actually taken, evaluates to an amount of money that is proportional to how much utility is received. If a is not taken, the market is not evaluated, all trades are reverted, and everyone keeps their original assets. The idea is that – after some learning and after bad traders lose most of their money to competent ones – the market price for a will come to represent the expected utility of taking that action. Futarchy then takes the action whose market price is highest.

For a more detailed description, see, e.g., Hanson’s (2007) original paper on the futarchy, which also discusses potential objections. For instance, what happens in markets for actions that are very unlikely to be chosen? Note, however, that for this blog post you’ll only need to understand the basic concept and none of the minutiae of real-world implementation. The above description deliberately ignores and abstracts away from these. One example of such a discrepancy between standard descriptions of futarchy and my above account is that, in real-world governance, there is often a “default action” (such as leaving law and government as they are). To keep the number of markets small, markets are set up to evaluate proposed changes relative to that default (such as the introduction of a new law) rather than simply for all possible actions. I should also note that I only know basic economics and am not an expert on the futarchy.

Traditionally, the futarchy has been thought of as a decision-making procedure for governance of human organizations. But in principle, AIs could be built on futarchies as well. Of course, many approaches to AI (such as most Deep Learning-based ones) already have all their knowledge concentrated into a single entity and thus don’t need any procedure (such as democracy’s voting or futarchy’s markets) to aggregate the beliefs of multiple entities. However, it has also been proposed that intelligence arises from the interaction and sometimes competition of a large number of simple subagents – see, for instance, Minsky’s book The Society of Mind, Dennett’s Consciousness Explained, and the modularity of mind hypothesis. Prediction markets and futarchies would be approaches to (or models of) combining the opinions of many of these agents, though I doubt that the human mind functions like either of the two. A theoretical example of the use of prediction markets in AI is MIRI’s logical induction paper. Furthermore, markets are generally similar to evolutionary algorithms.¹

So, if we implement a futarchy-like system in an AI, what decision theory would that AI come to implement? It seems that the answer is EDT. Consider Newcomb’s problem as an example. Traders that predict one-boxing to yield a million and two-boxing to yield a thousand will earn money, since the agent will, in fact, receive a million if it one-boxes and a thousand if it two-boxes. More generally, the futarchy rewards traders based on how accurately they predict what is actually going to happen if the agent makes a particular choice. This leads the traders to estimate the value of an action as proportional to the expected utility conditional on that action since conditional probabilities are the correct way to make predictions.
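As a rough illustration of this argument, here is a minimal sketch of a two-trader futarchy playing Newcomb’s problem repeatedly. The market mechanism is a deliberately crude stand-in and an assumption of this sketch, not Hanson’s design: prices are wealth-weighted averages of the traders’ estimates, positions are proportional to disagreement with the price, and the constants `SCALE` and `RISK` as well as the “causal” trader’s 50% belief are hypothetical choices.

```python
from dataclasses import dataclass

# Payoffs against a perfect predictor.
PAYOFF = {"one-box": 1_000_000, "two-box": 1_000}
SCALE = 1_001_000  # largest payoff considered possible; keeps bets bounded
RISK = 0.5         # at most this fraction of wealth can be lost per round


@dataclass
class Trader:
    name: str
    estimates: dict
    wealth: float = 1.0


def price(traders, action):
    """Wealth-weighted average of the traders' estimates for one action."""
    total = sum(t.wealth for t in traders)
    return sum(t.wealth * t.estimates[action] for t in traders) / total


def run_round(traders):
    """Take the highest-priced action and settle only that action's market."""
    prices = {action: price(traders, action) for action in PAYOFF}
    action = max(prices, key=prices.get)
    p, payoff = prices[action], PAYOFF[action]
    for t in traders:
        # Buy if the trader thinks the market is underpriced, sell otherwise.
        position = RISK * t.wealth * (t.estimates[action] - p) / SCALE
        t.wealth += position * (payoff - p) / SCALE
    return action


def simulate(rounds=50):
    traders = [
        # Predicts the payoff conditional on the action being taken.
        Trader("evidential", {"one-box": 1_000_000, "two-box": 1_000}),
        # Holds box B's content fixed at 50% full, whatever the action.
        Trader("causal", {"one-box": 500_000, "two-box": 501_000}),
    ]
    actions = [run_round(traders) for _ in range(rounds)]
    return traders, actions


if __name__ == "__main__":
    traders, actions = simulate()
    print(actions[-1], {t.name: round(t.wealth, 3) for t in traders})
```

In this toy setup the positions sum to zero, so wealth only moves between traders. The trader whose estimates match what actually happens conditional on the action gains wealth, and the market keeps selecting one-boxing.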

There are some caveats, though. For instance, prediction markets only work if the question at hand can eventually be answered. Otherwise, the market cannot be evaluated. In Newcomb’s problem, one would usually assume that your winnings are eventually given and thus shown to you. But other versions of Newcomb’s problem are conceivable. For instance, if you are a consequentialist, Omega could donate your winnings to your favorite charity in such a way that you will never be able to tell how much utility this has generated for you. Unless you simply make estimates (in which case the behavior of the markets depends primarily on what kind of expected value, regular or causal, you use as an estimate), you cannot set up a prediction market for this problem at all. An example of such a “hidden” Newcomb problem is cooperation via correlated decision making between distant agents.

Another unaddressed issue is whether the futarchy can deal correctly with other problems of space-time embedded intelligence, such as the BPB problem.

Notwithstanding the caveats, EDT seems to be inherent in the way the futarchy works. To get the futarchy to implement CDT, it would have to reward traders based on what the agent is causally responsible for or based on some untestable counterfactual (“what would have happened if I had two-boxed”). Whereas EDT arises naturally from the principles of the futarchy, other decision theories require modification and explicit specification.

I should mention that this post is not primarily intended as a futarchist argument for EDT. Most readers will already be familiar with the underlying pro-EDT argument, i.e., EDT making decisions based on what will actually happen if a particular decision is made. In fact, it may also be viewed as a causalist argument against the futarchy.² Rather than either of these two, it is a small part of the answer to the “implementation problem of decision theory”, which is: if you want to create an AI that behaves in accordance with some particular decision theory, how should that AI be designed? Or, conversely, if you build an AI without explicitly implementing a specific decision theory, what kind of behavior (EDT or CDT or other) results from it?

Acknowledgment: This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

1. There is some literature comparing the way markets function to evolution-like selection (see the first section of Blume and Easley 1992) – i.e., how irrational traders are weeded out and rational traders accrue more and more capital. I haven’t read much of that literature, but the main differences between the futarchy and evolutionary algorithms seem to be the following. First, the futarchy doesn’t specify how new traders are generated, because it classically relies on humans to do the betting (and the creation of new automated trading systems), whereas this is a central concern in evolutionary algorithms. Second, futarchies permanently leave the power in the hands of many algorithms, whereas evolutionary algorithms eventually settle for one. This also means that the individual traders in a futarchy can be permanently narrow and specialized. For instance, there could be traders who exploit a single pattern and rarely bet at all. I wonder whether it makes sense to combine evolutionary algorithms and prediction markets.

2. Probably futarchist governments wouldn’t face sufficiently many Newcomb-like situations in which the payoff can be tested for the difference to be relevant (see chapter 4 of Arif Ahmed’s Evidence, Decision and Causality).

A behaviorist approach to building phenomenological bridges

On October 22, 2017 (updated March 26, 2020), by Caspar, in General

A few weeks ago, I wrote about the BPB problem and how it poses a problem for classical/non-logical decision theories. In my post, I briefly mentioned a behaviorist approach to BPB, only to immediately discard it:

One might think that one could map between physical processes and algorithms on a pragmatic or functional basis. That is, one could say that a physical process A implements a program p to the extent that the results of A correlate with the output of p. I think this idea goes into the right direction and we will later see an implementation of this pragmatic approach that does away with naturalized induction. However, it feels inappropriate as a solution to BPB. The main problem is that two processes can correlate in their output without having similar subjective experiences. For instance, it is easy to show that Merge sort and Insertion sort have the same output for any given input, even though they have very different “subjective experiences”.

Since writing the post I became more optimistic about this approach because the counterarguments I mentioned aren’t particularly persuasive. The core of the idea is the following: Let A and B be parameterless algorithms¹. We’ll say that A and B are equivalent if we believe that A outputs x iff B outputs x. In the context of BPB, your current decision is an algorithm A and we’ll say B is an instance or implementation of A/you iff A and B are equivalent. In the following sections, I will discuss this approach in more detail.

You still need interpretations

The definition only solves one part of the BPB problem: specifying equivalence between algorithms. This would solve BPB if all agents were bots (rather than parts of a bot or collections of bots) in Soares and Fallenstein’s Botworld 1.0. But in a world without any Cartesian boundaries, one still has to map parts of the environment to parameterless algorithms. This could, for instance, be a function from histories of the world onto the output set of the algorithm. For example, if one’s set of possible world models is a set of cellular automata (CA) with various different initial conditions and one’s notion of an algorithm is something operating on natural numbers, then such an interpretation i would be a function from CA histories to the set of natural numbers. Relative to i, a CA with initial conditions contains an instance of algorithm A if A outputs x <=> i(H)=x, where H is a random variable representing the history created by that CA. So, intuitively, i is reading A’s output off from a description of the world. For example, it may look at the physical signals sent by a robot’s microprocessor to a motor and convert these into the output alphabet of A. E.g., it may convert a signal that causes a robot’s wheels to spin to something like “forward”. Every interpretation i is a separate instance of A.
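The condition “A outputs x <=> i(H)=x” can be sketched in a few lines. This is a toy model under stated assumptions: a history is represented as a list of time steps with a hypothetical `motor_signal` field, and the agent’s uncertainty about A’s own output is represented as a list of scenarios, each pairing a possible output of A with the history that world would then produce. The names `interpret_motor` and `contains_instance` are invented for illustration.

```python
def interpret_motor(history):
    """Hypothetical interpretation i: read the final motor signal off the history H."""
    return "forward" if history[-1]["motor_signal"] > 0 else "stop"


def contains_instance(scenarios, interpretation):
    """True if, in every scenario considered possible, A outputs x <=> i(H) = x."""
    return all(
        interpretation(history) == a_output for a_output, history in scenarios
    )


# A world with a robot whose motor follows A: whatever A outputs, the signal matches.
robot_world = [
    ("forward", [{"motor_signal": 0}, {"motor_signal": 1}]),
    ("stop", [{"motor_signal": 0}, {"motor_signal": 0}]),
]

# A world in which the signal stays at zero no matter what A outputs.
inert_world = [
    ("forward", [{"motor_signal": 0}, {"motor_signal": 0}]),
    ("stop", [{"motor_signal": 0}, {"motor_signal": 0}]),
]

if __name__ == "__main__":
    print(contains_instance(robot_world, interpret_motor))
    print(contains_instance(inert_world, interpret_motor))
```

Relative to this particular interpretation, the first world contains an instance of A and the second does not, because there the interpreted output fails to track A’s output in one of the scenarios.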

Joke interpretations

Since we still need interpretations, we still have the problem of “joke interpretations” (Drescher 2006, sect. 2.3; also see this Brian Tomasik essay and references therein). In particular, you could have an interpretation i that does most of the work, so that the equivalence of A and i(H) is the result of i rather than the CA doing something resembling A.

I don’t think it’s necessarily a problem that an EDT agent might optimize its action too much for the possibility of being a joke instantiation, because it gives all its copies in a world equal weight no matter which copy it believes to be. As an example, imagine that there is a possible world in which joke interpretations lead you to identify with a rock. If the rock’s “behavior” does have a significant influence on the world and the output of your algorithm correlates strongly with it, then I see no problem with taking the rock into account. At least, that is what EDT would do anyway if it has a regular copy in that world.² If the rock has little impact on the world, EDT wouldn’t care much about the possibility of being the rock. In fact, if the world also contains a strongly correlated non-instance³ of you that faces a real decision problem, then the rock joke interpretation would merely lead you to optimize for the action of that non-copy.

If you allow all joke interpretations, then you would view yourself in all worlds. Thus, the view may have similar implications as the l-zombie view where the joke interpretations serve as the l-zombies.⁴ Unless we’re trying to metaphysically justify the l-zombie view, this is not what we’re looking for. So, we may want to remove “joke interpretations” in some way. One idea could be to limit the interpretation’s computational power (Aaronson 2011, sect. 6). My understanding is that this is what people in CA theory use to define the notion of implementing an algorithm in a CA, see, e.g., Cook (2004, sect. 2). Another idea would be to include only interpretations that you yourself (or A itself) “can easily predict or understand”. Assuming that A doesn’t know its own output already, this means that i cannot do most of the work necessary to entangle A with i(H). (For a similar point, cf. Bishop 2004, sect. “Objection 1: Hofstadter, ‘This is not science’”.) For example, if i would just compute A without looking at H, then A couldn’t predict i very well if it cannot predict itself. If, on the other hand, i reads off the result of A from a computer screen in H, then A would be able to predict i’s behavior for every instance of H. Brian Tomasik lists a few more criteria to judge interpretations by.

Introspective discernibility

In my original rejection of the behaviorist approach, I made an argument about two sorting algorithms which always compute the same result but have different “subjective experiences”. I assumed that a similar problem could occur when comparing two equivalent decision-making procedures with different subjective experiences. But now I actually think that the behaviorist approach nicely aligns with what one might call introspective discernibility of experiences.

Let’s say I’m an agent that has, as a component, a sorting algorithm. Now, a world model may contain an agent that is just like me except that it uses a different sorting algorithm. Does that agent count as an instantiation of me? Well, that depends on whether I can introspectively discern which sorting algorithm I use. If I can, then I could let my output depend on the content of the sorting algorithm. And if I do that, then the equivalence between me and that other agent breaks. E.g., if I decide to output an explanation of my sorting algorithm, then my output would explain, say, bubble sort, whereas the other algorithm’s output would explain, say, merge sort. If, on the other hand, I don’t have introspective access to my sorting algorithm, then the code of the sorting algorithm cannot affect my output. Thus, the behaviorist view would interpret the other agent as an instantiation of me (as long as, of course, it, too, doesn’t have introspective access to its sorting algorithm). This conforms with the intuition that which kind of sorting algorithm I use is not part of my subjective experience. I find this natural relation to introspective discernibility very appealing.
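The following sketch illustrates the point with two parameterless agents that differ only in their sorting component. The setup is hypothetical: `make_agent`, the option list and the `introspective` flag are invented for illustration, and “equivalence” is checked here simply by comparing outputs.

```python
def bubble_sort(items):
    """Sort by repeatedly swapping adjacent out-of-order elements."""
    items = list(items)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


def merge_sort(items):
    """Sort by splitting in half, sorting each half and merging."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right


def make_agent(sort_fn, introspective):
    """Build a parameterless agent that picks the lowest-cost option."""
    def agent():
        options = [(3, "wait"), (1, "forward"), (2, "turn")]
        choice = sort_fn(options)[0][1]
        if introspective:
            # The agent can discern its sorting component and reports it.
            return (choice, sort_fn.__name__)
        return choice
    return agent


if __name__ == "__main__":
    blind = make_agent(bubble_sort, False)() == make_agent(merge_sort, False)()
    aware = make_agent(bubble_sort, True)() == make_agent(merge_sort, True)()
    print(blind, aware)
```

Without introspective access the two agents produce the same output and would count as instances of each other on the behaviorist view. Once the output is allowed to depend on the sorting component, the equivalence breaks.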

That said, things are complicated by the equivalence relation being subjective. If you already know what A and B output, then they are equivalent if their output is the same — even if it is “coincidentally” so, i.e., if they perform completely unrelated computations. Of course, a decision algorithm will rarely know its own output in advance. So, this extreme case is probably rare. However, it is plausible that an algorithm’s knowledge about its own behavior excludes some conditional policies. For example, consider a case like Conitzer’s (2016, 2017), in which copies of an EU-maximizing agent face different but symmetric information. Depending on what the agent knows about its algorithm, it may view all the copies as equivalent or not. If it has relatively little self-knowledge, it could reason that if it lets its action depend on the information, the copies’ behavior would diverge. With more self-knowledge, on the other hand, it could reason that, because it is an EU maximizer and because the copies are in symmetric situations, its action will be the same no matter the information received.⁵

Consciousness

The BPB problem resembles the problem of consciousness: the question “does some physical system implement my algorithm?” is similar to the question “does some physical system have the conscious experience that I am having?”. For now, I don’t want to go too much into the relation between the two problems. But if we suppose that the two problems are connected, we can draw from the philosophy of mind to discuss our approach to BPB.

In particular, I expect that a common objection to the behaviorist approach will be that most instantiations in the behaviorist sense are behavioral p-zombies. That is, their output behavior is equivalent to the algorithm’s but they compute the output in a different way, and in particular in a way that doesn’t seem to give rise to conscious (or subjective) experiences. While the behaviorist view may lead us to identify with such a p-zombie, we can be certain, so the argument goes, that we are not one, given that we have conscious experiences.

Some particular examples include:

Lookup table-based agents
Messed up causal structures, e.g. Paul Durham’s experiments with his whole brain emulation in Greg Egan’s novel Permutation City.

I personally don’t find these arguments particularly convincing because I favor Dennett’s and Brian Tomasik’s eliminativist view on consciousness. That said, it’s not clear whether eliminativism would imply anything other than relativism/anti-realism for the BPB problem (if we view BPB and philosophy of mind as sufficiently strongly related).

Acknowledgment

This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

1. I use the word “algorithm” in a very broad sense. I don’t mean to imply Turing computability. In fact, I think any explicit formal specification of the form “f()=…” should work for the purpose of the present definition. Perhaps, even implicit specifications of the output would work.

2. Of course, I see how someone would find this counterintuitive. However, I suspect that this is primarily because the rock example triggers absurdity heuristics and because it is hard to imagine a situation in which you believe that your decision algorithm is strongly correlated with whether, say, some rock causes an avalanche.

3. Although the behaviorist view defines the instance-of-me property via correlation, there can still be correlated physical subsystems that are not viewed as an instance of me. In particular, if you strongly limit the set of allowed interpretations (see the next paragraph), then the potential relationship between your own and the system’s action may be too complicated to be expressed as A outputs x <=> i(H)=x.

4. I suspect that the two might differ in medical or “common cause” Newcomb-like problems like the coin flip creation problem.

5. If this is undesirable, one may try to use logical counterfactuals to find out whether B also “would have” done the same as A if A had behaved differently. However, I’m very skeptical of logical counterfactuals in general. Cf. the “Counterfactual Robustness” section in Tomasik’s post.

Multiverse-wide cooperation via correlated decision making – Summary

On September 21, 2017 (updated January 6, 2018), by Caspar, in General

This is a short summary of some of the main points from my paper on multiverse-wide superrationality. For details, caveats and justifications, see the full paper. For shorter, accessible introductions, see here.

The target audience for this post consists of:

people who have already thought about the topic and thus don’t want to read through the long explanations given in the paper;
people who have already read (some of) the full paper and just want to refresh their memory;
people who don’t yet know whether they should read the full paper and thus want to know whether the content is interesting or relevant to them.

If you are not in any of these groups, this post may be confusing and not very helpful for understanding the main ideas.

Main idea

Take values of agents with your decision algorithm into account to make it more likely that they do the same. I’ll use Hofstadter’s (1983) term superrationality to refer to this kind of cooperation.
Whereas acausal trade as it is usually understood seems to require mutual simulation and is thus hard to get right as a human, superrationality is easy to apply for humans (if they know how they can benefit agents that use the same decision algorithm).
Superrationality may not be relevant among agents on Earth, e.g. because on Earth we already have causal cooperation and few people use the same decision algorithm as we use. But if we think that we might live in a vast universe or multiverse (as seems to be a common view among physicists, see, e.g., Tegmark (2003)), then there are (potentially infinitely) many agents with whom we could cooperate in the above way.
This multiverse-wide superrationality (MSR) suggests that when deciding between policies in our part of the multiverse, we should essentially adopt a new utility function (or, more generally, a new set of preferences) which takes into account the preferences of all agents with our decision algorithm. I will call that our compromise utility function (CUF). Whatever CUF we adopt, the others will (be more likely to) adopt a structurally similar CUF. E.g., if our CUF gives more weight to our values, then the others’ CUF will also give more weight to their values. The gains from trade appear to be highest if everyone adopts the same CUF. If this is the case, multiverse-wide superrationality has strong implications for what decisions we should make.

The superrationality mechanism

Superrationality works without reciprocity. For example, imagine there is one agent for every integer and that for every i, agent i can benefit agent i+1 at low cost to herself. If all the agents use the same decision algorithm, then agent i should benefit agent i+1 to make it more likely that agent i-1 also cooperates in the same way. That is, agent i should give something to an agent that cannot in any way return the favor. This means that when cooperating superrationally, you don’t need to identify which agents can help you.
What should the new criterion for making decisions, our compromise utility function, look like?
- Harsanyi’s (1955) aggregation theorem suggests that it should be a weighted sum of the utility functions of all the participating agents.
- To maximize gains from trade, everyone should adopt the same weights.
- Variance-voting (Cotton-Barratt 2013; MacAskill 2014, ch. 3) is a promising candidate.
- If some of the values require coordination (e.g., if one of the agents wants there to be at least one proof of the Riemann hypothesis in the multiverse), then things get more complicated.
“Updatelessness” has some implications. E.g., it means that one should, under certain conditions, accept a superrational compromise that is bad for oneself.
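The weighted-sum idea from the list above can be sketched as follows. This is a simplified illustration under assumptions that are not taken from the paper: there is a small finite set of options, every agent’s utility function is normalized to zero mean and unit variance under a uniform measure over those options (a crude stand-in for variance voting), and all agents receive equal weight. The option names and numbers are made up.

```python
from statistics import mean, pstdev


def normalize(utilities):
    """Rescale one agent's utilities to zero mean and unit variance over the options."""
    values = list(utilities.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # An agent that is indifferent between all options contributes nothing.
        return {option: 0.0 for option in utilities}
    return {option: (u - mu) / sigma for option, u in utilities.items()}


def compromise_utility(agents, weights=None):
    """Weighted sum of the agents' normalized utility functions."""
    weights = weights or [1.0] * len(agents)
    normalized = [normalize(u) for u in agents]
    return {
        option: sum(w * n[option] for w, n in zip(weights, normalized))
        for option in agents[0]
    }


def best_option(agents):
    cuf = compromise_utility(agents)
    return max(cuf, key=cuf.get)


ours = {"x": 10, "y": 0, "z": 6}
theirs = {"x": 0, "y": 10, "z": 8}
# The same preferences, expressed on a scale a thousand times larger.
theirs_rescaled = {option: 1000 * u for option, u in theirs.items()}

if __name__ == "__main__":
    print(best_option([ours, theirs]), best_option([ours, theirs_rescaled]))
```

With these made-up numbers the compromise picks the option that is fairly good for both agents, and rescaling one agent’s utilities does not change the result, which is the property the normalization is meant to provide.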

The values of the other agents

To maximize the compromise utility function, it is very useful (though not strictly necessary, see section “Interventions”) to know what other agents with similar decision algorithms care about.
The orthogonality thesis (Bostrom 2012) implies that the values of the other agents are probably different from ours, which means that taking them into account makes a difference.
Not all aspects of the values of agents with our decision algorithm are relevant:
- Only the consequentialist parts of their values matter (though things like minimizing the number of rule violations committed by all agents is a perfectly fine consequentialist value system).
- Only values that apply to our part of the multiverse are relevant. (Some agents may care exclusively or primarily about their part of the multiverse.)
- At least humans care differently about faraway things than about nearby ones. Because we are far away from most agents with our decision algorithm, we only need to think about what they care about in distant things.
- Superrationalists may care more about their idealized values, so we may try to idealize their values. However, we should be very careful to idealize only in ways consistent with their meta-preferences. (Otherwise, your values may be mis-idealized.)
There are some ways to learn about what other superrational agents care about.
- The empirical approach: We can survey the relevant aspects of human values. The values of humans who take superrationality seriously are particularly relevant.
  - An example of relevant research is Bain et al.’s (2013) study on what people care about in future societies. They found that people put most weight on how warm, caring and benevolent members of these societies are. If we believe that construal level theory (see Trope and Liberman (2010) for an excellent summary) is roughly correct, then such results should carry over to evaluations of other psychologically distant societies. Although these results have been replicated a few times (Bain et al. 2012; Park et al. 2015; Judge and Wilson 2015; Bain et al. 2016), they are tentative and merely exemplify relevant research in this domain.
  - Another interesting data point is the values of the EA/LW/SSC/rationalist community, to my knowledge the only group of people who plausibly act on superrationality.
- The theoretical approach: We could think about the processes that affect the distribution of values in the multiverse.
  - Biological evolution
  - Cultural evolution (see, e.g., Henrich 2015)
  - Late great filters
    - For example, if a lot of civilizations self-destruct with weapons of mass destruction, then the compromise utility function may contain a lot more peaceful values than an analysis based on biological and cultural evolution suggests.
  - The transition to whole brain emulations (Hanson 2016)
  - The transition to de novo AI (Bostrom 2014)

Interventions

There are some general ways in which we can effectively increase our compromise utility function without knowing its exact content.
- Many meta-activities don’t require any such knowledge as long as we think that it can be acquired in the future. E.g., we could convince other people of MSR, do research on MSR, etc.
- Sometimes, very small bits of knowledge suffice to identify promising interventions. For example, if we believe that the consequentialist parts of human values are a better approximation of the consequentialist parts of other agents’ values than non-consequentialist human values, then we should make people more consequentialist (without necessarily promoting any particular consequentialist morality).
- Another relevant point is that no matter how well we know the content of the compromise function, the argument in favor of maximizing it in our part of the universe is still just as valid. Thus, even if we know very little about its content, we should still do our best at maximizing it. (That said, we will often be better at maximizing the values of humans, in great part because we know and understand these values better.)
Meta-activities
- Further research
- Promoting multiverse-wide superrationality
Probably ensuring that superintelligent AIs have a decision theory that reasons correctly about superrationality is ultimately the most important intervention (although promoting multiverse-wide superrationality among humans can be instrumental for doing so).
There are some interventions in the moral advocacy space which align people’s preferences more with those of other superrational agents about our universe.
- Promoting consequentialism
  - This is also good because consequentialism enables cooperation with the agents in other parts of the multiverse.
- Promoting pluralism (e.g., convincing utilitarians to also take things other than welfare into account)
- Promoting concern for benevolence and warmth (or whatever other value is much more strongly represented in high-construal than in low-construal preferences)
- Facilitating moral progress (i.e., presenting people with the arguments for both sides). Probably valuing preference idealization is more common than disvaluing it.
- Promoting multiverse-wide preference utilitarianism
Promoting causal cooperation

A survey of polls on Newcomb’s problem

On June 27, 2017 (updated March 26, 2020), by Caspar, in General

One classic story about Newcomb’s problem is that, at least initially, people one-box and two-box in roughly equal numbers (and that everyone is confident in their position). To find out whether this is true or what exact percentage of people would one-box, I conducted a meta-survey of existing polls of people’s opinions on Newcomb’s problem.

The surveys I found are listed in the following table:

I deliberately included even surveys with tiny sample sizes to test whether the results from the larger sample size surveys are robust or whether they depend on the specifics of how they obtained the data. For example, the description of Newcomb’s problem in the Guardian survey contained a paragraph on why one should one-box (written by Arif Ahmed, author of Evidence, Decision and Causality) and a paragraph on why one should two-box (by David Edmonds). Perhaps the persuasiveness of these arguments influenced the result of the survey?

Looking at all the polls together, it seems that the picture is at least somewhat consistent. The two largest surveys of non-professionals both give one-boxing almost the same small edge. The other results diverge more, but some can be easily explained. For example, decision theory is a commonly discussed topic on LessWrong with some of the opinion leaders of the community (including founder Eliezer Yudkowsky) endorsing one-boxing. It is therefore not surprising that opinions on LessWrong have converged more than elsewhere. Considering the low sample sizes, the other smaller surveys of non-professionals also seem reasonably consistent with the impression that one-boxing is only slightly more common than two-boxing.

The surveys also show that, as has often been remarked on, there exists a significant difference between opinion among the general population / “amateur philosophers” and professional philosophers / decision theorists (though the consensus among decision theorists is not nearly as strong as on LessWrong).

Acknowledgment: This work was funded by the Foundational Research Institute (now the Center on Long-Term Risk).

Complications in evaluating neglectedness

On June 25, 2017 / March 26, 2020. By Caspar. In General.

Neglectedness (or crowdedness) is a heuristic that effective altruists use to assess how much impact they could have in a specific cause area. It is usually combined with scale (a.k.a. importance) and tractability (a.k.a. solvability), which together are meant to approximate expected value. (In fact, under certain idealized definitions of the three factors, multiplying them is equivalent to expected value. However, this removes the heuristic nature of these factors and probably does not describe how people typically apply them.) For introductions and thoughts on the framework as well as neglectedness in particular see:

Benjamin Todd: A framework for strategically selecting a cause.
Paul Christiano: Neglectedness and impact.
80,000 Hours: How to compare different global problems in terms of impact.
William MacAskill: Doing Good Better. Chapter 10.
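The remark above that multiplying the three factors can be equivalent to expected value can be made concrete with one idealized set of definitions, in which each factor is a ratio and the intermediate units cancel. This is only a sketch of one such idealization (the wording of the ratios is a paraphrase and an assumption here), not necessarily the definitions any given practitioner uses:

```latex
\underbrace{\frac{\text{good done}}{\%\ \text{of problem solved}}}_{\text{scale}}
\times
\underbrace{\frac{\%\ \text{of problem solved}}{\%\ \text{increase in resources}}}_{\text{tractability}}
\times
\underbrace{\frac{\%\ \text{increase in resources}}{\text{extra dollar}}}_{\text{neglectedness}}
=
\frac{\text{good done}}{\text{extra dollar}}
```

Under these definitions the product is the marginal value of an extra dollar, which is why neglectedness in this sense is about diminishing returns rather than about raw head counts.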

One reason why the neglectedness heuristic and the framework in general are so popular is that they are much easier to apply than explicit cost-effectiveness or expected value calculations. In this post, I will argue that evaluating neglectedness (which may usually be seen as the most heuristic and easiest to evaluate part of the framework) is actually quite complicated. This is in part to make people more aware of issues that are sometimes not and often only implicitly taken into account. In some cases, it may also be an argument against using the heuristic at all. Presumably, most of the following considerations won’t surprise many practitioners. Nonetheless, it appears useful to write them down, which, to my knowledge, hasn’t been done before.

Neglectedness and diminishing returns

There are a few different definitions of neglectedness. For example, consider the following three:

“If we add more resources to the cause, we can expect more promising interventions to be carried out.” (source)
You care about a cause much more than the rest of society. (source)
“How many people, or dollars, are currently being dedicated to solving the problem?” (source)

The first one is quite close to expected value-type calculations and so it is quite clear why it is important. The second and third are more concrete and easier to measure but ultimately only relevant because they are proxies of the first. If society is already investing a lot into a cause, then the most promising interventions in that cause area are already taken up and only less effective ones remain.

Because the second and, even more so, the third are easier to measure, I expect that, in practice, most people use these two when they evaluate neglectedness. Incidentally, these definitions also fit the terms “neglectedness” and “crowdedness” much better. I will argue that neglectedness in the second and third sense has to be translated into neglectedness in the first sense and that this translation is difficult. Specifically, I will argue that the diminishing returns curves, on which the connection between already invested resources and the value of the marginal dollar is based, can assume different scales and shapes that have to be taken into account.

A standard diminishing return curve may look roughly like this:

[Figure: diminishing-returns curve. x-axis: resources invested; y-axis: returns.]

The x-axis represents the amount of resources invested into some intervention or cause area, the y-axis represents the returns of that investment. The derivative of the returns (i.e., the marginal returns) decreases, potentially in inverse proportion to the cumulative investment.
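To make the “inverse proportion” case concrete: if marginal returns are inversely proportional to cumulative investment, total returns grow logarithmically. The following is a minimal sketch; the constant `k` and the offset `x0` (which keeps the curve finite at zero investment) are illustrative assumptions, not taken from any of the cited sources.

```python
import math

def returns(x, k=1.0, x0=1.0):
    """Total returns after investing x, assuming a logarithmic curve.

    k and x0 are hypothetical parameters chosen for illustration.
    """
    return k * math.log(1.0 + x / x0)

def marginal_returns(x, k=1.0, x0=1.0):
    """Derivative of returns(): inversely proportional to (x0 + x)."""
    return k / (x0 + x)

# Each tenfold increase in investment cuts marginal returns roughly tenfold.
for invested in (0, 10, 100, 1000):
    print(invested, round(returns(invested), 3), round(marginal_returns(invested), 5))
```

With this shape, the value of the marginal dollar depends only on how much has already been invested relative to `x0`, which is why the scale of the x-axis matters so much in what follows.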

Even if returns diminish in a way similar to that shape, there is still the question of the scale of that graph (not to be confused with the scale/importance of the cause area), i.e. whether values on the x-axis are in the thousands, millions or billions. In general, returns probably diminish more slowly in cause areas that are in some sense large and uniform. Take the global fight against malaria. Intervening in some areas is more effective than in others. For example, it is more effective in areas where malaria is more common, or where it is easier to, say, provide mosquito nets, etc. However, given how widespread malaria is (about 300 million cases in 2015), I would expect that there is a relatively large number of areas almost tied for the most effective places to fight malaria. Consequently, I would guess that once the most effective intervention is to distribute mosquito nets, even hundreds of millions of dollars do not diminish returns all that much.

Other interventions have much less room for funding and thus returns diminish much more quickly. For example, the returns of helping some specific person will usually diminish way before investing, say, a billion dollars.

If you judge neglectedness only based on the raw amount of resources invested into solving a problem (as suggested by 80,000 Hours), then this may make small cause areas look a lot more promising than they actually are. Depending on the exact definitions, this remains the case if you combine neglectedness with scale and tractability. For example, consider the following two interventions:

The global fight against malaria.
The fight against malaria in some randomly selected subset of 1/100th of the global area or population.

The two should usually be roughly equally promising. (Perhaps 1 is a bit more promising because every intervention contained in 2 is also in 1. On the other hand, that would make “solve everything” hard to beat as an intervention. Of course, 2 can also be more or less promising if an unusual 1/100th is chosen.) But because the raw amount of resources invested into 1 is presumably 100 times as big as the amount of resources invested into 2, 2 would, on a naive view, be regarded as much more neglected than 1. The product of scale and tractability is the same in 1 and 2. (1 is a 100 times bigger problem, but solving it in its entirety is also roughly 100 times more difficult, though I presume that some definitions of the framework judge this differently. In general, it seems fine to move considerations out of neglectedness into tractability and scope as long as they are not double-counted or forgotten.) Thus, the overall product of the three is greater for 2, which appears to be wrong. If on the other hand, neglectedness denotes the extent to which returns have diminished (the first of the three definitions given at the beginning of this section), then the neglectedness of 1 and 2 will usually be roughly the same.
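The comparison of interventions 1 and 2 can be sketched numerically. The sketch assumes a logarithmic return curve whose x-axis stretches in proportion to the size of the cause; the function names, the constant `k`, and the dollar figures are all hypothetical.

```python
def marginal_returns(invested, size, k=1.0):
    """Marginal returns for a cause of a given size.

    Assumes R(x) = size * log(1 + x / (k * size)), so that a cause
    100 times larger needs 100 times the money to reach the same
    point on its return curve. k is a hypothetical constant.
    """
    return size / (k * size + invested)

def naive_neglectedness(invested):
    """Naive proxy: one over the raw amount of resources invested."""
    return 1.0 / invested

# Hypothetical numbers: the global cause is 100x larger and gets 100x the money.
global_cause = {"size": 100.0, "invested": 100e6}
subset_cause = {"size": 1.0, "invested": 1e6}

naive_ratio = (naive_neglectedness(subset_cause["invested"])
               / naive_neglectedness(global_cause["invested"]))
marginal_ratio = (marginal_returns(**subset_cause)
                  / marginal_returns(**global_cause))

print("naive neglectedness ratio (subset / global):", naive_ratio)
print("marginal returns ratio (subset / global):", marginal_ratio)
```

Under these assumptions the naive proxy rates the subset as far more neglected, while the marginal returns of the two are essentially identical, matching the argument that only the diminishing-returns reading treats 1 and 2 alike.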

Besides the scale of the return curve, the shape can also vary. In fact, I think many interventions initially face increasing returns from learning/research, creating economies of scale, specialization within the cause area, etc. For example, in most cause areas, the first $10,000 are probably invested into prioritization, organizing, or (potentially symbolic) interventions that later turn out to be suboptimal. So, in practice return curves may actually look more like the following:

[Figure: return curve with increasing returns at first and diminishing returns later.]

This adds another piece of information (besides scale) that needs to be taken into account to translate the amount of invested resources into how much returns have diminished: how and when do returns start to diminish?
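A curve with increasing and then diminishing returns can be sketched with a logistic function. This is one convenient S-shaped choice, not a claim about any real cause area, and the parameter values (`midpoint`, `width`, `total_value`) are hypothetical.

```python
import math

def returns(x, total_value=1.0, midpoint=50_000.0, width=10_000.0):
    """S-shaped (logistic) return curve; all parameter values are hypothetical."""
    return total_value / (1.0 + math.exp(-(x - midpoint) / width))

def marginal_returns(x, total_value=1.0, midpoint=50_000.0, width=10_000.0):
    """Derivative of returns(): rises until `midpoint`, then falls."""
    share = returns(x, 1.0, midpoint, width)
    return total_value * share * (1.0 - share) / width

# Marginal returns peak at the midpoint rather than at zero investment.
for invested in (10_000, 50_000, 90_000):
    print(invested, marginal_returns(invested))
```

Here the amount already invested does not translate monotonically into marginal value: a cause with very little funding can have lower marginal returns than one that is already moderately funded.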

There are many other return curve shapes that may be less common but mess up the neglectedness framework more. For example, some projects produce some large amount of value if they succeed but produce close to no value if they fail. Thus, the (actual, not expected) return curve for such projects may look like this:

[Figure: return curve that stays near zero until the project succeeds, then jumps to a large value.]

Examples may include developing vaccines, colonizing Mars or finding cause X.

If such a cause area is already relatively crowded in the third (and second) sense, that may make it less “crowded” in the first sense. For example, if nobody had invested money into finding a vaccine against malaria (and you don’t expect others to invest money into it in the future either, see below) then this cause area is maximally neglected in the second and third sense. However, given how expensive clinical trials are, the marginal returns of donating a few thousand dollars into it are essentially zero. If, on the other hand, others have already contributed enough money to get a research project off the ground at all, then the marginal returns are higher, because there is at least some chance that your money will enable a trial in which a vaccine is found. (Remember that we don’t know the exact shape of the return curve, so we don’t know when the successful trial is funded.)
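The vaccine example can be sketched as a step-shaped return curve with an unknown success threshold. The sketch assumes the threshold is uniformly distributed between a minimum and a maximum funding level; the function name and all dollar figures are hypothetical.

```python
def success_probability(donation, already_invested, t_min, t_max):
    """Chance that a donation pushes total funding past the success threshold.

    Assumes the threshold is uniform on [t_min, t_max] and has not been
    reached by the money invested so far (both are modeling assumptions).
    """
    lo = max(already_invested, t_min)
    hi = min(already_invested + donation, t_max)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (t_max - lo)

donation = 5_000.0
t_min, t_max = 50e6, 150e6  # hypothetical cost range of a successful trial

for invested in (0.0, 60e6, 140e6):
    print(invested, success_probability(donation, invested, t_min, t_max))
```

With nothing invested, the donation cannot reach even the cheapest possible trial, so its expected marginal return is zero. Once others have funded the project past `t_min`, the same donation has a positive chance of being the one that funds the successful trial.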

I would like to emphasize that the point of this section is not so much that people apply neglectedness incorrectly by merely looking at the amount of resources invested into a cause and not thinking about implications in terms of diminishing returns at all. Instead, I suspect that most people implicitly translate into diminishing returns and take the kind of project into account. However, it might be beneficial if people were more aware of this issue and how it makes evaluating neglectedness more difficult.

Future resources

When estimating the neglectedness of a cause, we need to take into account not only people who are currently working on the problem (as a literal reading of 80,000 Hours’ definition suggests), but also people who have worked on it in the past and people who will work on it in the future. If a lot of people have worked on a problem in the past, then this indicates that the low-hanging fruit has already been picked. Thus, even if nobody is working in the area anymore, marginal returns have probably diminished a lot. I can’t think of a good example where this is a decisive consideration because if an area has been given up on (such that there is a big difference between past and current attention), it will usually score low in tractability, anyway. Perhaps one example is the search for new ways to organize society, government and economy. Many resources are still invested into thinking about this topic, so even if we just consider resources invested today, it would not do well in terms of neglectedness. However, if we consider that people have thought about and “experimented” in this area for thousands of years, it appears to be even more crowded.

We also have to take future people and resources into account when evaluating neglectedness. Of course, future people cannot “take away” the most promising intervention in the way that current and past people can. However, their existence causes the top interventions to be performed anyway. For example, let’s say that there are 1000 equally costly possible interventions in an area, generating 1000, 999, 998, …, 1 “utils” (or lives saved, years of suffering averted, etc.), respectively. Each intervention can only be performed once. The best 100 interventions have already been taken away by past people. Thus, if you have money for one intervention, you can now only generate 900 utils. But if you know that future people will engage in 300 further interventions in that area, then whether you intervene or not actually only makes a difference of 600 utils. All interventions besides the one generating 600 utils would have been executed anyway. (In Why Charities Don’t Differ Astronomically in Cost-Effectiveness, Brian Tomasik makes a similar point.)
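The arithmetic of this example can be checked directly. The sketch assumes, as the example does, that everyone always picks the best remaining intervention and that future people perform their 300 interventions whether or not you act; the variable names are illustrative.

```python
# Interventions sorted from best to worst: 1000, 999, ..., 1 utils.
utils = list(range(1000, 0, -1))

def total_value(n_interventions):
    """Total utils if the best n_interventions are all carried out."""
    return sum(utils[:n_interventions])

already_done = 100    # taken by past people
future_others = 300   # will be done by future people regardless

# Value of the intervention you would actually carry out.
direct_value = utils[already_done]

# Counterfactual impact: world with you minus world without you.
without_you = total_value(already_done + future_others)
with_you = total_value(already_done + future_others + 1)
counterfactual_value = with_you - without_you

print(direct_value, counterfactual_value)
```

The gap between the two numbers is entirely due to future resources, which is why uncertainty about them feeds straight into the neglectedness estimate.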

The number of future people who would counterfactually engage in some cause area is an important consideration in many cause areas considered by effective altruists. In general, if a cause area is neglected by current and past people, the possibility of future people engaging in an intervention creates a lot of variance in neglectedness evaluations. If recently 10 people started working on an area, then it is very uncertain how much attention it will have in the future. And if it will receive a lot more attention regardless of our effort, then the neglectedness score may change by a factor of 100. The future resources that will go into long-established (and thus already less neglected) cause areas, on the other hand, are easier to predict and can’t make as much of a difference.

One example where future people and resources are an important consideration is AI safety. People often state that AI safety is a highly neglected cause area, presumably under the assumption that this should be completely obvious given how few people currently work in the area. At least, it is rare that the possibility of future people going into AI safety is considered explicitly. Langan-Dathi even writes that “due to [AI safety] being a recent development it is also highly neglected.” I, on the other hand, would argue that being a recent development only makes a cause look highly neglected if one doesn’t consider future people. (Again, Brian makes almost the same point regarding AI safety.)

Overall, I think many questions in AI safety should nonetheless be regarded as relatively neglected because I think there is a good chance that future people won’t recognize them as important fast enough. That said, I think some AI safety problems will become relevant in regular AI capability research or near-term applications (such as self-driving cars). For example, I expect that some of Amodei et al.’s (2016) “Concrete Problems in AI Safety” will be (or would have been) picked up, anyway. Research in these areas of AI safety is thus potentially less intrinsically valuable, although it may still have a lot of instrumental benefits that make it worthwhile to pursue.

My impression is that neglecting future people in evaluating neglectedness is more common than forgetting to translate from invested resources into diminishing marginal returns. Nonetheless, in the context of this post the point of this section is that considering future resources makes neglectedness more difficult to evaluate. Obviously, it is hard to foresee how many resources will be invested into a project in the future. Because the most promising areas will not have received a lot of attention yet, the question of their neglectedness will be dominated by how many resources they will receive in the future. Thus, in the most important cases, neglectedness is hard to estimate.

What should count as “the same cause area”?

At a minimum, operationalizing neglectedness involves estimating the amount of (past, current and future) resources invested into a cause area. But which resources count as going into the same cause area? For example, if the cause area is malaria, should you count people who work in global poverty as working in the same cause area?

Because the number of people working in an area is only relevant as a proxy for how much marginal returns have diminished, the answer seems to be: Count people (and resources) to the extent that their activities diminish the marginal returns in the cause area in question. Thus, resources invested into alleviating global poverty have to be taken into account, because if people’s income increases, this will allow them to take measures against malaria as well.

As another example, consider the cause area of advocating some moral view X (say effective altruism). If only a few people currently promote that view, then one may naively view advocating X as neglected. However, if neglectedness is intended to be a proxy for diminishing returns, then it seems that we also have to take into account moral advocates of other views. Because most people regularly engage in some form of moral advocacy (e.g., when they talk about morality with their friends and children), many people already hold moral views that our advocacy has to compete with. Thus, we may want to take these other moral advocates into account for evaluating neglectedness. That said, if we apply neglectedness together with tractability and scope, it seems reasonable to include such considerations in either tractability or neglectedness. (As Rob Wiblin remarks, the three factors blur heavily into each other. In particular, neglectedness can make an intervention more tractable. As Wiblin notes, we should take care not to double-count arguments. We also shouldn’t forget to count arguments at all, though.)

Acknowledgements

I am indebted to Tobias Baumann for valuable comments. I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.

Summary of Achen and Bartels’s Democracy for Realists

On June 18, 2017. By Caspar. In General.

I just finished binge-reading Achen and Bartels’s great book Democracy for Realists and decided to write up a summary and a few comments to aid my memory and share some of the most interesting insights.

The folk theory of democracy

(Since chapter 1 contains little of interest besides giving a foretaste of later chapters, I will start with the content of chapter 2.) The “folk theory” of democracy is roughly the following:

Voters have a set of informed policy preferences (e.g., on abortion, social security, climate change, taxes, etc.) and vote for the candidate or party whose policy preferences most resemble their own (similar to how vote advice applications operate). That is, people vote based on the issues. Parties are then assumed to cater to the voters’ preferences to maximize their chance of getting elected. This way the people get what they want (as is guaranteed, under certain theoretical assumptions, by the median voter theorem).

Achen and Bartels argue that this folk theory of democracy does not describe what is happening in real-world democracies:

Voters are often badly informed: “Michael Delli Carpini and Scott Keeter (1996) surveyed responses to hundreds of specific factual questions in U.S. opinion surveys over the preceding 50 years to provide an authoritative summary of What Americans Know about Politics and Why It Matters. In 1952, Delli Carpini and Keeter found, only 44% of Americans could name at least one branch of government. In 1972, only 22% knew something about Watergate. In 1985, only 59% knew whether their own state’s governor was a Democrat or a Republican. In 1986, only 49% knew which one nation in the world had used nuclear weapons (Delli Carpini and Keeter 1996, 70, 81, 74, 84). Delli Carpini and Keeter (1996, 270) concluded from these and scores of similar findings that ‘large numbers of American citizens are woefully underinformed and that overall levels of knowledge are modest at best.’” (p. 36f.)
- Interestingly, the increasing availability of information has done little to change this. “[I]t is striking how little seems to have changed in the decades since survey research began to shed systematic light on the nature of public opinion. Changes in the structure of the mass media have allowed people with an uncommon taste for public affairs to find an unprecedented quantity and variety of political news; but they have also allowed people with more typical tastes to abandon traditional newspapers and television news for round-the-clock sports, pet tricks, or pornography, producing an increase in the variance of political information levels but no change in the average level of political information (Baum and Kernell 1999; Prior 2007). Similarly, while formal education remains a strong predictor of individuals’ knowledge about politics, substantial increases in American educational attainment have produced little apparent increase in overall levels of political knowledge. When Delli Carpini and Keeter (1996, 17) compared responses to scores of factual questions asked repeatedly in opinion surveys over the past half century, they found that ‘the public’s level of political knowledge is little different today than it was fifty years ago.’” (p. 37)
- This lack of knowledge seems to matter for policy preferences – uninformed voters cannot use heuristics to mimic the choices of informed voters. “[S]ome scholars have […] asked whether uninformed citizens – using whatever ‘information shortcuts’ are available to them – manage to mimic the preferences and choices of better informed people. Alas, statistical analyses of the impact of political information on policy preferences have produced ample evidence of substantial divergences between the preferences of relatively uninformed and better informed citizens (Delli Carpini and Keeter 1996, chap. 6; Althaus 1998). Similarly, when ordinary people are exposed to intensive political education and conversation on specific policy issues, they often change their mind (Luskin, Fishkin, and Jowell 2002; Sturgis 2003). Parallel analyses of voting behavior have likewise found that uninformed citizens cast significantly different votes than those who were better informed. For example, Bartels (1996) estimated that actual vote choices fell about halfway between what they would have been if voters had been fully informed and what they would have been if everyone had picked candidates by flipping coins.” (p. 39f.)
- Wisdom of the crowd-type arguments often don’t apply in politics because the opinions of different people are often biased in the same direction: “Optimism about the competence of democratic electorates has often been bolstered (at least among political scientists) by appeals to what Converse (1990) dubbed the ‘miracle of aggregation’ – an idea formalized by the Marquis de Condorcet more than 200 years ago and forcefully argued with empirical evidence by Benjamin Page and Robert Shapiro (1992). Condorcet demonstrated mathematically that if several jurors make independent judgments of a suspect’s guilt or innocence, a majority are quite likely to judge correctly even if every individual juror is only modestly more likely than chance to reach the correct conclusion.
  
  Applied to electoral politics, Condorcet’s logic suggests that the electorate as a whole may be much wiser than any individual voter. The crucial problem with this mathematically elegant argument is that it does not work very well in practice. Real voters’ errors are quite unlikely to be statistically independent, as Condorcet’s logic requires. When thousands or millions of voters misconstrue the same relevant fact or are swayed by the same vivid campaign ad, no amount of aggregation will produce the requisite miracle; individual voters’ ‘errors’ will not cancel out in the overall election outcome, especially when they are based on constricted flows of information (Page and Shapiro 1992, chaps. 5, 9). If an incumbent government censors or distorts information regarding foreign policy or national security, the resulting errors in citizens’ judgments obviously will not be random. Less obviously, even unintentional errors by politically neutral purveyors of information may significantly distort collective judgment, as when statistical agencies or the news media overstate or understate the strength of the economy in the run-up to an election (Hetherington 1996).” (p.40f.)
Voters don’t have many strong policy preferences.
- Their stated preferences are sensitive to framing effects. Some examples from p. 30f:
  “[E]xpressed political attitudes can be remarkably sensitive to seemingly innocuous variations in question wording or context. For example, 63% to 65% of Americans in the mid-1980s said that the federal government was spending too little on “assistance to the poor”; but only 20% to 25% said that it was spending too little on “welfare” (Rasinski 1989, 391). “Welfare” clearly had deeply negative connotations for many Americans, probably because it stimulated rather different mental images than “assistance to the poor” (Gilens 1999). Would additional federal spending in this domain have reflected the will of the majority, or not? We can suggest no sensible way to answer that question. […] [I]n three separate experiments conducted in the mid-1970s, almost half of Americans said they would “not allow” a communist to give a speech, while only about one-fourth said they would “forbid” him or her from doing so (Schuman and Presser 1981, 277). In the weeks leading up to the 1991 Gulf War, almost two-thirds of Americans were willing to “use military force,” but fewer than half were willing to “engage in combat,” and fewer than 30% were willing to “go to war” (Mueller 1994, 30).”
- Many voters have no opinions on many current issues (p. 31f.).
- People’s policy preferences are remarkably inconsistent over time with correlations of just 0.3 to 0.5 between the stated policy preferences on two occasions that are two years apart.
Many voters don’t know the positions of the competing parties on the issues, which makes it hard for them to vote for a party based on their policy preferences (p. 32).
- Lau and Redlawsk (1997; 2006) “found that about 70% of voters, on average, chose the candidate who best matched their own expressed preferences.” (p. 40)
If one asks people to place their own policy positions and that of parties on a seven-point issue scale, then issue proximity and vote choice will correlate. But this can be explained by more than one set of causal relationships. Of course, the naive interpretation is that people form a policy opinion and learn about the candidates’ opinions independently. Based on those, they decide which party to vote for. But this model of policy-oriented evaluation is only one possible explanation of the observed correlation between perceived issue proximity and voting behavior. Another is persuasion: Voters already prefer some party, know that party’s policies and then adjust their opinions to better match that party’s opinion. The third is projection: People already know which party to vote for, have some opinions on policy but don’t actually know what the party stands for. They then project their policy positions onto those of the party. (p. 42) Achen and Bartels report on evidence showing that policy-oriented evaluation is only a small contributor to the correlation between perceived issue proximity and vote choices. (p. 42-45)
They argue that, empirically, elected candidates often don’t represent the median voter. (p. 45-49)
To my surprise, they use Arrow’s impossibility theorem to argue against the feasibility of fair preference aggregation (pp. 26ff.). (See here for a nice video introduction.) Somehow, I always had the impression that Arrow’s impossibility theorem wouldn’t make a difference in practice. (As Arrow himself said, “Most [voting] systems are not going to work badly all of the time. All I proved is that all can work badly at times.”)
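Condorcet’s jury argument from the list above, and the reason correlated errors undermine it, can be illustrated with a short calculation. The independent case is the standard binomial formula. The correlated case is a deliberately crude toy model (an assumption for illustration, not taken from the book): with probability `shared`, all voters follow one common judgment that is right with probability `p`; otherwise they judge independently.

```python
from math import comb

def majority_correct(n_voters, p):
    """Probability that a strict majority of n independent voters is correct,
    when each voter is correct with probability p."""
    need = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(need, n_voters + 1)
    )

def majority_correct_with_shared_error(n_voters, p, shared):
    """Toy correlated model (hypothetical): with probability `shared`, every
    voter copies a single common judgment that is correct with probability p;
    otherwise voters judge independently."""
    return shared * p + (1 - shared) * majority_correct(n_voters, p)

p = 0.55  # each voter is only modestly better than chance
for n in (1, 11, 101, 1001):
    print(n, majority_correct(n, p), majority_correct_with_shared_error(n, p, 0.5))
```

With independent voters, accuracy climbs toward certainty as the electorate grows. In the toy correlated model it is capped at `shared * p + (1 - shared)` no matter how many voters are added, which is the sense in which aggregation cannot produce the “miracle” when errors are shared.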

A weaker form of the folk theory is that, while voters may not know specific issues well enough to have an opinion, they do have some ideological preference (such as liberalism or conservatism). But this fails for similar reasons:

“Converse […] scrutinized respondents’ answers to open-ended questions about political parties and candidates for evidence that they understood and spontaneously employed the ideological concepts at the core of elite political discourse. He found that about 3% of voters were clearly classifiable as “ideologues,” with another 12% qualifying as “near-ideologues”; the vast majority of voters (and an even larger proportion of nonvoters) seemed to think about parties and candidates in terms of group interests or the “nature of the times,” or in ways that conveyed “no shred of policy significance whatever” (Converse 1964, 217–218; also Campbell et al. 1960, chap. 10).”
Correlations between different policy views are only modest. This itself is not necessarily a bad thing but evidence against ideological voting. (If people fell into distinct ideological groups like liberals, conservatives, etc., one would observe such correlations. E.g., one may expect strong correlations between positions on foreign and domestic policy given that there are such correlations among political parties.) (p. 32f.)
- This appears to conflict to some extent with how Haidt’s moral foundations theory characterizes the differences between liberals and conservatives. According to Haidt, conservatives form a cluster of people who care much more about loyalty, authority and sanctity than liberals. This predicts correlations between positions on topics in these domains, e.g. gay marriage and immigration (assuming that people’s loyalty, authority and sanctity intuitions tend to have similar content). However, it doesn’t seem to predict correlations between views on, say, aid to education and isolationism, which were the type of variables asked about in the study by Converse (1964) that Achen and Bartels refer to.
“Even in France, the presumed home of ideological politics, Converse and Pierce (1986, chap. 4) found that most voters did not understand political ‘left’ and ‘right.’ When citizens do understand the terms, they may still be uncertain or confused about where the parties stand on the left-right dimension (Butler and Stokes 1974, 323–337). Perhaps as a result, their partisan loyalties and issue preferences are often badly misaligned. In a 1968 survey in Italy, for example, 50% of those who identified with the right-wing Monarchist party took left-wing policy positions (Barnes 1971, 170). […] [C]areful recent studies have repeatedly turned up similar findings. For example, Elizabeth Zechmeister (2006, 162) found “striking, systematic differences … both within and across the countries” in the conceptions of “left” and “right” offered by elite private college students in Mexico and Argentina, while André Blais (personal communication) found half of German voters unable to place the party called “Die Linke” – the Left – on a left-right scale.” (p. 34f.)

Direct democracy

Chapter 3 discusses direct democracy. Besides making the point that everyone seems to believe that “more democracy” is a good thing (pp. 52-60, 70), they argue against a direct democracy version of the folk theory. In my view, the evidence presented in chapter 2 of the book (and the previous section of this summary) already provides strong reasons for skepticism and I think the best case against a direct democracy folk theory is based on arguments of this sort. In line with this view, Achen and Bartels reiterate some of the arguments, e.g. that the average Joe often adopts other people’s policy preferences rather than making up his own mind (p. 73-76).

Most of the qualitatively new evidence presented in this section, on the other hand, seems quite weak to me. Much of it seems to be aimed at showing that direct democracy has yielded bad results. For example, based on the ratings of Arthur Schlesinger Jr., the Wall Street Journal, C-SPAN and Siena College, the introduction of primary elections hasn’t increased the quality of presidents (p. 66). As they concede themselves, the data set is so small and the ratings of presidents so contentious that this evidence is not very strong at all. They also argue that direct democracy sometimes leads to transparently silly decisions, but the evidence seems essentially anecdotal to me.

Another interesting point of the section is that, in addition to potential ideological motives, politicians usually have strategic reasons to support the introduction of “more democratic” procedures:

[T]hroughout American history, debates about desirable democratic procedures have not been carried out in the abstract. They have always been entangled with struggles for substantive political advantage. In 1824, “politicos in all camps recognized” that the traditional congressional caucus system would probably nominate William Crawford; thus, “how people felt about the proper nominating method was correlated very highly indeed with which candidate they supported” (Ranney 1975, 66). In 1832, “America’s second great party reform was accomplished, not because the principle of nomination by delegate conventions won more adherents than the principle of nomination by legislative caucuses, but largely because the dominant factional interests … decided that national conventions would make things easier for them” (Ranney 1975, 69).

Similarly, Ranney (1975, 122) noted that the most influential champion of the direct primary, Robert La Follette, was inspired “to destroy boss rule at its very roots” when the Republican Party bosses of Wisconsin twice passed him over for the gubernatorial nomination. And in the early 1970s, George McGovern helped to engineer the Democratic Party’s new rules for delegate selection as cochair of the party’s McGovern-Fraser Commission, and “praised them repeatedly during his campaign for the 1972 nomination”; but less than a year later he advocated repealing some of the most significant rules changes. Asked why McGovern’s views had changed, “an aide said, ‘We were running for president then’” (Ranney 1975, 73–74).

I expect that this is quite a common phenomenon when choosing a decision process. E.g., when an organization decides which decision procedure to use (e.g., who will make the decision, what kind of evidence is accepted as valid), members of the organization might base their choice less on general principles (e.g., balance, avoidance of cognitive biases and groupthink) than on which decision process will yield the favored results in specific object-level decisions (e.g., who gets a raise, whether my preferred project is funded).

I guess processes that are instantiated for only a single decision are affected even more strongly by this problem. An example is deciding on how to do AI value loading, e.g. which idealization procedures to use.

The Retrospective Theory of Political Accountability

In chapter 4, Achen and Bartels discuss an attractive alternative to the folk theory: retrospective voting. On this view, voters decide not so much based on policy preferences but on how well the candidates or parties have performed in the past. For example, a president under whom the economy improved may be re-elected. This theory is plausible as a descriptive theory for a number of reasons:

There is quite some empirical evidence that retrospective voting describes what voters are doing (ch. 5-7).
Retrospective voting, i.e. evaluating whether the passing term went well, is much easier than policy-based voting, i.e. deciding which candidate’s proposed policies will work better in the future (p. 91f.).

The retrospective theory also has some normative appeal:

It selects for good leaders (p. 98-100).
It incentivizes politicians to do what is best for the voters (p. 100-102).
To some extent it allows politicians to do what is best for the voters even if the voters disagree on what is best (p. 91).

While Achen and Bartels agree that retrospective voting is a large part of the descriptive picture, they also argue that, at least in the way it is implemented by real-world voters, “its implications for democracy are less unambiguously positive than existing literature tends to suggest”:

Proceeding on the theme of the ignorance of the electorate, voters’ evaluation of the past term and the current situation is unreliable (p. 92f.). For example, their perception of environmental threats does not correlate much with that of experts (p. 106), they think crime is increasing when it is in fact stable or decreasing (p. 107) and they cannot assess the state of the economy (p. 107f.).
- Media coverage, partisan bias, popular culture, etc. often shape people’s judgments (p. 107, 138-142).
Voters are unable to differentiate whether bad times are an incumbent’s fault or not (p. 93). Consequently, there is some evidence that incumbents tend to be punished for shark attacks, droughts and floods (ch. 5).
“The theories of retrospective voting we have considered assume that voters base their choices at the polls entirely on assessments of how much the incumbent party has contributed to their own or the nation’s well-being. However, when voters have their own ideas about good policy, sensible or not, they may be tempted to vote for candidates who share those ideas, as in the spatial model of voting discussed in chapter 2. In that case incumbent politicians may face a dilemma: should they implement the policies voters want or the policies that will turn out to contribute to voters’ welfare?” (p. 109, also see pp. 108-111)
- “[E]lected officials facing the issue of fluoridating drinking water in the 1950s and 1960s were significantly less likely to pander to their constituents’ ungrounded fears when longer terms gave them some protection from the “sudden breezes of passion” that Hamilton associated with public opinion.” (p. 110)
The electorate’s decisions are often based only on the most recent events, in particular the economic growth in the past year or so (cf. the peak-end rule). This not only makes their judgments worse than necessary (as they throw information away), it also sets the wrong incentives for the incumbent. Indeed, there is some evidence of a “political business cycle”, i.e. politicians attempting to maximize growth, in particular growth of real income, in the last year of their term. (See chapter 6. Additional evidence is given in ch. 7.)
“Another way to examine the effectiveness of retrospective voting is to see what happens after each election. If we take seriously the notion that reelection hinges on economic competence, one implication is that we should expect to see more economic growth when the incumbent party is reelected than when it is dismissed by the voters. In the former case the incumbent party has presumably been retained because its past performance makes it a better than average bet to provide good economic management in the future. In the latter case the new administration is presumably an unknown quantity, a random draw from some underlying distribution of economic competence. A secondary implication of this logic is that future economic performance should be less variable when the incumbent party is retained, since reelected administrations are a truncated subsample of the underlying distribution of economic competence (the worst economic performers having presumably been weeded out at reelection time).” (p. 164) Based on a tiny sample (US presidential elections between 1948-2008), this does not seem to be the case. Of course, one could argue that the new administration often is not a random quantity – the parties in US presidential elections are almost always the same and the candidates have often proven themselves in previous political roles. In fact, the challenger may have a longer track record than the incumbent. For example, this may come to be the case in 2020.
Using a subset of the same tiny sample, they show that the incumbent’s popular vote margin does not predict post-reelection economic growth (p. 166-168). So, retrospective voting as current voters apply it doesn’t seem to work in selecting competent leaders. That said, and as Achen and Bartels acknowledge themselves (p. 168), the evidence they use is only very tentative.
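The selection argument quoted above (reelected administrations as a truncated subsample of the competence distribution) can be illustrated with a small simulation. This is only a sketch under assumptions that are mine, not Achen and Bartels’: competence is drawn from a standard normal distribution, voters observe competence plus normally distributed noise, and the incumbent is retained whenever that noisy signal is above zero. The function name and parameters are hypothetical.

```python
import random
import statistics


def simulate_selection(noise_sd, n_terms=50_000, seed=0):
    """Compare competence of retained incumbents with fresh random draws.

    Assumptions (illustrative only): competence ~ N(0, 1); voters see
    competence plus N(0, noise_sd) noise; incumbents are retained when the
    noisy signal is positive, otherwise a new administration is drawn.
    """
    rng = random.Random(seed)
    retained, fresh = [], []
    for _ in range(n_terms):
        competence = rng.gauss(0.0, 1.0)
        signal = competence + rng.gauss(0.0, noise_sd)
        if signal > 0.0:
            # Incumbent survives: a truncated sample of the distribution.
            retained.append(competence)
        else:
            # Incumbent is replaced by an unselected random draw.
            fresh.append(rng.gauss(0.0, 1.0))
    return {
        "retained_mean": statistics.mean(retained),
        "retained_sd": statistics.stdev(retained),
        "fresh_mean": statistics.mean(fresh),
        "fresh_sd": statistics.stdev(fresh),
    }


if __name__ == "__main__":
    for noise_sd in (1.0, 10.0):
        print(noise_sd, simulate_selection(noise_sd))
```

With moderately informative voters, retained incumbents are more competent on average and less variable than fresh draws, which is the pattern the quoted passage predicts. When the voters’ signal is very noisy, both differences shrink toward zero, which is one way to read the absence of the predicted pattern in the data.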

Overall, the electorate’s evaluation of a candidate may be some indicator of how well they are going to perform in the future, but it is an imperfect and manipulable one.

Group loyalties and social identities

In addition to retrospective voting, Achen and Bartels tentatively propose that group loyalties and social identities play a big role in politics. Whereas the retrospection theory appears to be relatively well-studied, this new theory is much less worked out so far (pp. 230f.).

It seems clear that vast parts of psychology and social psychology in particular – Achen and Bartels refer to ingroups and outgroups, Asch’s conformity experiments, cognitive dissonance, rationalization, etc. – should be a significant explanatory factor in political science. Indeed, Achen and Bartels start chapter 8 by stating that the relevance of social psychology for politics has been recognized by past generations of researchers (pp. 213-222); it only became unpopular when some theories that it was associated with failed (pp. 222-225).

Achen and Bartels discuss a few ways in which social groups, identities and loyalties influence voting behavior:

While voters’ retrospection focuses on the months leading up to the election, these short-term retrospections translate into the formation of long-term partisan loyalties. So, in a way, partisan loyalties are, in part, the cumulation of these short-term retrospections (p. 197-199).
Many people are loyal to one party (p. 233).
People adopt the political views of the groups they belong to or identify with (p. 219f., 222f., 246-, p. 314).
- People often adopt the party loyalties of their parents (p. 233f.).
- People adopt the views of their party (or project their views onto the party) (ch. 10). Party identification also influences one’s beliefs about factual matters. For example, when an opposing party is in office people judge the economy as worse (pp. 276-284).
People reject the political views of groups that they dislike (pp. 284-294).
People choose candidates based on what they perceive to be best for their group (p. 229).
Catholic voters (even ones who rarely go to church) tend to prefer Catholic candidates, even if the candidate emphasizes the separation of church and state (pp. 238-246).
If, say, Catholics discriminate against Jews, then Jews are much less likely to vote for a Catholic candidate or a party dominated by Catholics (p. 237f.).
Better-informed voters are often influenced more strongly by identity issues, presumably because they are more aware of them (pp. 284-294). For example, they are sometimes less likely than worse-informed voters to get the facts right (p. 283).
“When political candidates court the support of groups, they are judged in part on whether they can ‘speak our language.’ Small-business owners, union members, evangelical Christians, international corporations – each of these has a set of ongoing concerns and challenges, and a vocabulary for discussing them. Knowing those concerns, using that vocabulary, and making commitments to take them seriously is likely to be crucial for a politician to win their support (Fenno 1978).”

Unfortunately, I think that Achen and Bartels stretch the concept of identity-based voting a bit too much. The clearest example is their analysis of the case of abortion (pp. 258-266). Women tend to have more stable views on abortion than men. They are also more likely to leave the Republican party if they are pro-choice and less likely to assimilate their opinions to those of their party. Achen and Bartels’ explanation is that women’s vote is affected by their identifying as women. But I don’t see why it is necessary to bring the concept of identity into this. A much simpler explanation would be that voters are, to some extent, selfish and thus put more weight on the issues that are most relevant to them. If this counts as voting based on identity, is there any voting behavior that cannot be ascribed to identities?

I also find many of the explanations based on social identity unsatisfactory – they often don’t really explain a phenomenon. For example, Achen and Bartels argue that the partisan realignment of white southerners in the second half of the 20th century was not so much driven by racial policy issues but by white southern identity (pp. 246-258). But they don’t explain how white southern identity led people into the open arms of the Republicans. For example, was it that Republicans explicitly appealed to that identity? Or did southern opinion leaders change their mind based on policy issues?

Implications for democracy

Chapter 11 serves as a conclusion of the book. It summarizes some of the points made in earlier sections but also discusses the normative implications.

Unsurprisingly, Achen and Bartels argue against naive democratization:

[E]ffective democracy requires an appropriate balance between popular preferences and elite expertise. The point of reform should not simply be to maximize popular influence in the political process but to facilitate more effective popular influence. We need to learn to let political parties and political leaders do their jobs, too. Simple-minded attempts to thwart or control political elites through initiatives, direct primaries, and term limits will often be counterproductive. Far from empowering the citizenry, the plebiscitary implications of the folk theory have often damaged people’s real interests. (p. 303)

At the same time, they again point out that elite political judgment is often not much better than that of the worse-informed majority. In addition to being more aware of identity issues, the elites are a lot better at rationalizing, which makes them sound more rational, but often does not yield more rational opinions (p. 309-311).

Another interesting point they make is that it is usually the least-informed voters who decide who wins an election because the non-partisan swing voters tend to be relatively uninformed (p. 312, also p. 32).

Achen and Bartels give some reasons why democracy might be better than its alternatives. I think the arguments, as given in the book, vary drastically in appeal, but here are all five:

“[E]lections generally provide authoritative, widely accepted agreement about who shall rule. In the United States, for example, even the bitterly contested 2000 presidential election – which turned on a few hundred votes in a single state and a much-criticized five-to-four Supreme Court decision – was widely accepted as legitimate. A few Democratic partisans continued to grumble that the election had been “stolen”; but the winner, George W. Bush, took office without bloodshed, or even significant protest, and public attention quickly turned to other matters.” This makes sense, although it would have been interesting to test this argument empirically. I.e., is violent power struggle more or less prevalent in democracies than in other forms of government, such as hereditary monarchies? (I would guess that it is less prevalent in democracies.)
“[I]n well-functioning democratic systems, parties that win office are inevitably defeated at a subsequent election. They may be defeated more or less randomly, due to droughts, floods, or untimely economic slumps, but they are defeated nonetheless. Moreover, voters seem increasingly likely to reject the incumbent party the longer it has held office, reinforcing the tendency for governmental power to change hands. This turnover is a key indicator of democratic health and stability. It implies that no one group or coalition can become entrenched in power, unlike in dictatorships or one-party states where power is often exercised persistently by a single privileged segment of society. And because the losers in each election can reasonably expect the wheel of political fortune to turn in the not-too-distant future, they are more likely to accept the outcome than to take to the streets.” (p. 317) Here it is not so clear whether this constant change is a good thing. Having the same party, group or person rule for long stretches of time ensures stability and avoids friction between consecutive legislations. It also ensures that office is most of the time held by politicians with experience. Presumably, Achen and Bartels are right in judging high turnover as beneficial, but they have little evidence to back it up.
“[E]lectoral competition also provides some incentives for rulers at any given moment to tolerate opposition. The notion that citizens can oppose the incumbent rulers and organize to replace them, yet remain loyal to the nation, is fundamental both to real democracy and to social harmony.” (p. 317f.) This also seems non-obvious. Perhaps the monarchist could argue that only rulers who do not have to worry about losing their position can fruitfully engage with criticism. They also have less reason to get the press under their control (although, empirically, dictators usually use their power to limit the press in ways that democratic governments cannot).
“[A] long tradition in political theory stemming from John Stuart Mill (1861, chap. 3) has emphasized the potential benefits of democratic citizenship for the development of human character (Pateman 1970). Empirical scholarship focusing squarely on effects of this sort is scant, but it suggests that democratic political engagement may indeed have important implications for civic competence and other virtues (Finkel 1985; 1987; Campbell 2003; Mettler 2005). Thus, participation in democratic processes may contribute to better citizenship, producing both self-reinforcing improvements in ‘civic culture’ (Almond and Verba 1963) and broader contributions to human development.” (p. 318) This may be true, but it appears to be a relatively weak consideration. Perhaps, the monarchist could counter that doing away with elections saves people more time than the improvements in “civic culture” are worth. They may not be as virtuous, but maybe they can nonetheless spend more time with their family and friends or create more economic value.
“Finally, reelection-seeking politicians in well-functioning democracies will strive to avoid being caught violating consensual ethical norms in their society. As Key (1961a, 282) put it, public opinion in a democracy ‘establishes vague limits of permissiveness within which governmental action may occur without arousing a commotion.’ Thus, no president will strangle a kitten on the White House lawn in view of the television cameras. Easily managed governmental tasks will get taken care of, too. Chicago mayors will either get the snow cleared or be replaced, as Mayor Michael Bilandic learned in the winter of 1979. Openly taking bribes will generally be punished. When the causal chain is clear, the outcome is unambiguous, and the evaluation is widely shared, accountability will be enforced (Arnold 1990, chap. 3). So long as a free press can report dubious goings-on and a literate public can learn about them, politicians have strong incentives to avoid doing what is widely despised. Violations occur, of course, but they are expensive; removal from office is likely. By contrast, in dictatorships, moral or financial corruption is more common because public outrage has no obvious, organized outlet. This is a modest victory for political accountability.” (p. 318f.) Of the five reasons given, I find this one the most convincing. It basically states that retrospective voting and to some extent even the folk theory work, they just don’t work as well as one might naively imagine. So, real-world democracy doesn’t do a better job than a coin flip at representing people’s “real opinions” on controversial issues like abortion. Democracy does ensure, however, that important, universally agreed upon measures will be implemented.

In their last section, Achen and Bartels propose an idea for how to make governments more responsive to the interests of the people. Noting that elites have much more influence, they suggest that economic and social equality, as well as limitations on lobbying and campaign financing, could make governments more responsive to the preferences of the people. While plausibly helpful, these ideas are much more trite than the rest of the book.

General comments

Overall I recommend reading the book if you’re interested in the topic.
Since I don’t know the subject area particularly well, I read a few reviews of the book (Paris 2016; Schwennicke, Cohen, Roberts, Sabl, Mares, and Wright 2017; Malhotra 2016; Mann 2016; Cox 2017; Somin 2016). All of these seemed positive overall. Some even said that large parts of the book are more mainstream than the authors claim (which is a good thing in my book).
It’s quite Americentric. Sometimes an analysis of studies conducted in the US is followed by references to papers confirming the results in other countries, but often it is not. In many ways, politics in the US is different than in other countries, e.g. only two parties matter and the variability in wealth and education within the US is much bigger than in many other Western nations. This makes me unsure to what extent many of the results carry over to other countries. Often it also limits sample sizes unnecessarily. E.g., one analysis (p. 165) relates whether the incumbent party was replaced to post-presidential-election income and GDP growth in the years 1948-2008 in the US. It seems hard to conclude all that much from 16 data points. Perhaps taking a look at other countries would have been a cheap way to increase the sample size. That said, because the book is not about the details of particular democratic systems, it seems quite accessible to non-American readers with only superficial knowledge of US politics and history.
It often gives a lot of detail on how empirical evidence was gathered and analyzed. E.g., the entire chapter seven is about how people’s voting behavior after the Great Depression – which is often explained by policy preferences (in the US related to Roosevelt’s New Deal) – can be explained well by retrospective voting.
I also feel like the book is somewhat balanced despite the authors’ view differing somewhat from the mainstream within political science. E.g., they often mention explicitly what the mainstream view is and refer to studies supporting that view. I also feel like they are relatively transparent about how reliable or tentative the empirical evidence for some parts of the book is.
A similar book is Jason Brennan’s Against Democracy, which I haven’t read. As suggested by the names, Against Democracy differs from Democracy for Realists in that it proposes epistocracy as an alternative form of government.
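To make the concern about the 16 data points (raised in the comment on Americentrism above) more concrete, here is a rough sketch of how wide a 95% confidence interval on a correlation coefficient is at that sample size. It uses the standard Fisher z-transformation, which assumes roughly bivariate normal data. The correlation of 0.3 in the usage example is a hypothetical value chosen for illustration, not a figure from the book, and the function name is my own.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3). z_crit = 1.96 gives a 95% interval.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Hypothetical observed correlation of 0.3 at two sample sizes.
    for n in (16, 160):
        low, high = correlation_ci(0.3, n)
        print(f"n={n}: [{low:.2f}, {high:.2f}]")
```

At n = 16 the interval for a moderate observed correlation spans both negative and fairly large positive values, so neither a null result nor a positive one says much. Pooling elections from other countries, as suggested above, would narrow it considerably.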

Acknowledgements

I thank Max Daniel and Stefan Torges for comments.
