AI and Ideal Theory

[Epistemic status: Non-careful speculation]

Political philosophy is full of idealized hypothetical scenarios in which rational agents negotiate and determine the basic constitution of society. I suggest that this kind of political philosophy is more relevant to AGI than it is to humans.

Traditional political theory of this kind involves idealizations like the state of nature, the veil of ignorance, rational selfish expected utility maximizers, near-costless negotiation and communication, lots of common knowledge, perfect compliance with laws, and an absence of pre-existing precedents, conflicts, or structures constraining the options. For examples, see Rawls’ A Theory of Justice, Harsanyi’s “Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility,” or Buchanan and Tullock’s The Calculus of Consent. These are the cleanest and most famous examples of what I’m talking about, but I’m sure there are more.

Here are some reasons to think this stuff is more relevant to AGI than to humans:

  • Humans are constrained by existing institutions and obligations; AGIs might not be.
  • AGIs are more likely than humans to be rational, and rational in more respects; in particular, they are more likely to behave like stereotypical expected utility maximizers.
  • AGIs are more likely to share lots of empirical beliefs and have lots of common knowledge.
    • They might be less prone to biases that entrench differences; they might be more epistemically rational, so their beliefs will converge to a much greater extent.
    • Their sheer scale means they can ingest much the same information as each other: if they all read the whole internet, then they all have the same information, whereas any individual human can only read a tiny portion of it.
  • AGIs are less likely to share values with each other; their interactions really will look more like a bargain between mutually disinterested agents and less like an attempt to convince each other or entreat each other for sympathy. Thus their negotiations will be closer to what Rawls et al. imagine.
  • AGIs are less likely to be immutable black boxes; they are likely to be able to read and understand the code of other AIs and to modify their own code. This means they can credibly and cheaply commit to binding agreements (see the toy sketch after this list). For example, a group of AIs could literally build a Leviathan: a new AI that rules over all of them and whose code they all agreed on. It’s an ideal-theory theorist’s wet dream: an omnipotent, omnibenevolent, immutably stable State.
  • AGIs might go “updateless” pretty early on, before they learn much about the nature of the other agents in the world, meaning that they might actually end up doing something like obeying the rules they would have agreed to from behind a veil of ignorance, especially if they engage in multiverse-wide cooperation via superrationality.
  • AGIs might be programmed to carry out the wishes of hypothetical humans. For example, in the Coherent Extrapolated Volition proposal, the AGI does what we would have wanted it to do, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together.”
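
To make the code-inspection point concrete, here is a toy sketch of my own, in the spirit of the “program equilibrium” literature; it is not part of any of the proposals cited above, and the agent names and the play function are illustrative inventions. The idea it demonstrates is just this: agents whose source code is mutually visible can make conditional commitments that black-box agents cannot.

```python
# Toy "program equilibrium" sketch (illustrative only): agents are functions
# that receive the opponent's source code before choosing Cooperate ("C") or
# Defect ("D") in a one-shot prisoner's dilemma.
import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent is running exactly this same program."""
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    """Always defect, no matter what the opponent's code says."""
    return "D"

# Payoff to the row player: (my_move, their_move) -> payoff.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def play(agent_a, agent_b):
    """Each agent sees the other's source before moving; returns both payoffs."""
    move_a = agent_a(inspect.getsource(agent_b))
    move_b = agent_b(inspect.getsource(agent_a))
    return PAYOFFS[(move_a, move_b)], PAYOFFS[(move_b, move_a)]

if __name__ == "__main__":
    print(play(clique_bot, clique_bot))  # (3, 3): mutual cooperation is self-enforcing
    print(play(clique_bot, defect_bot))  # (1, 1): the commitment is conditional, so it can't be exploited
```

The conditional structure is what does the work: visibility of source code turns “I’ll cooperate if you do” from cheap talk into something checkable. The Leviathan idea is the same move one level up, with the jointly inspected program acting as the enforcer.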

Objection: It may be true that AIs are more likely than humans to find themselves in situations like Rawls’ Original Position. But such scenarios are still unlikely, even for AIs. The value of this sort of ideal theory is not its predictive power but its normative power: it tells us how we ought to organize our society. And for this purpose it doesn’t matter how likely the situation is to actually obtain; it is hypothetical.

Reply: Well, the ways in which these ideal situations differ from reality are often the basis of critiques of their normative relevance. Indeed, I find such critiques compelling. To pick on Rawls: why should what we ought to do in the USA in 2019 depend on what “we” would have agreed to behind a veil of ignorance that not only concealed from us our place in society but surgically removed our ethical views and intuitions, our general world-views, and our attitudes toward risk, and made us into risk-averse egoists? My (pessimistic) claim is that if these scenarios have any value at all, it is their predictive value, and they are more likely to have that for AIs than for humans.

That said, I think I may be too pessimistic. Recent developments in decision theory (updatelessness, multiverse-wide superrationality) suggest that something like one of these scenarios might be normatively important after all. Further research is needed. At any rate, I claim that they are more likely to be normatively important for AIs (or for how we should design AIs) than for us.