I’ve written about the question of which decision theories describe the behavior of approaches to AI like the “Law of Effect”. In this post, I would like to discuss GOLEM, an architecture for a self-modifying artificial intelligence agent described by Ben Goertzel (2010; 2012). Goertzel calls it a “meta-architecture” because all of the intelligent work of the system is done by sub-programs that the architecture assumes as given, such as a program synthesis module (cf. Kaiser 2007).
Roughly, the top-level self-modification is done as follows. For any proposal for a (partial) self-modification, i.e. a new program to replace (part of) the current one, the “Predictor” module predicts how well that program would achieve the goal of the system. Another part of the system — the “Searcher” — then tries to find programs that the Predictor deems superior to the current program. So, at the top level, GOLEM chooses programs according to some form of expected value calculated by the Predictor. The first interesting decision-theoretical statement about GOLEM is therefore that it chooses policies — or, more precisely, programs — rather than individual actions. Thus, it would probably give the money in at least some versions of counterfactual mugging. This is not too surprising, because it is unclear on what basis one should choose individual actions when the effectiveness of an action depends on the agent’s decisions in other situations.
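To make the policy-level choice concrete, here is a minimal sketch of the loop described above, applied to a version of counterfactual mugging. Everything in it is an illustrative assumption rather than part of Goertzel's specification: the function names `predictor` and `searcher`, the representation of a program as a function from an observation to an action, the fair coin, and the payoffs of 10,000 and -100.

```python
from typing import Callable, Iterable

# Assumption: a "program" is just a policy mapping an observation to an action.
Program = Callable[[str], str]


def pays(observation: str) -> str:
    """Policy that hands over the money when the coin lands tails."""
    return "pay" if observation == "tails" else "wait"


def refuses(observation: str) -> str:
    """Policy that keeps the money when the coin lands tails."""
    return "refuse" if observation == "tails" else "wait"


def predictor(program: Program) -> float:
    """Hypothetical Predictor: expected payoff of running `program`.

    Assumed setup: a fair coin; on tails the agent is asked for 100,
    on heads it receives 10,000 only if its program would pay on tails.
    """
    would_pay = program("tails") == "pay"
    heads_payoff = 10_000 if would_pay else 0
    tails_payoff = -100 if would_pay else 0
    return 0.5 * heads_payoff + 0.5 * tails_payoff


def searcher(current: Program, candidates: Iterable[Program]) -> Program:
    """Hypothetical Searcher: keep whichever program the Predictor rates highest."""
    best = current
    for candidate in candidates:
        if predictor(candidate) > predictor(best):
            best = candidate
    return best


chosen = searcher(refuses, [pays, refuses])
print(chosen.__name__, predictor(chosen))  # pays 4950.0
```

Because whole programs are evaluated, the loss of 100 on tails is weighed against the gain on heads, so the paying program wins. An agent that evaluated only the single action "pay" after already seeing tails would see nothing but the loss.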
The next natural question to ask is, of course, what expected value (causal, evidential or other) the Predictor computes. Like the other aspects of GOLEM, the Predictor is subject to modification. Hence, we need to ask according to what criteria it is updated. The criterion is provided by the Tester, a “hard-wired program that estimates the quality of a candidate Predictor” based on “how well a Predictor would have performed in the past” (Goertzel 2010, p. 4). I take this to mean that the Predictor is judged based on the extent to which it is able to predict the things that actually happened in the past. For instance, imagine that at some time in the past the GOLEM agent self-modified to a program that one-boxes in Newcomb’s problem. Later, the agent actually faced a Newcomb problem in which the prediction had been made before the agent self-modified into a one-boxer, and it won a million dollars. Then the Predictor should be able to predict that self-modifying to one-boxing in this case “yielded” a million dollars, even though it did not do so causally. More generally, to maximize the score from the Tester, the Predictor has to compute regular (evidential) conditional probabilities and expected utilities. Hence, it seems that the EV computed by the Predictor is a regular EDT-ish one. This is not too surprising, either, because as we have seen before, it is much more common for learning algorithms to implement EDT, especially if they implement something which looks like the Law of Effect.
In conclusion, GOLEM learns to choose policy programs based on their EDT-expected value.
This post is based on a discussion with Linda Linsefors, Joar Skalse, and James Bell. I wrote this post while working for the Foundational Research Institute, which is now the Center on Long-Term Risk.