Updatelessness and Son of X

by Scott Garrabrant 3y4th Nov 2016No comments

5


The purpose of this post is to discuss the relationship between the concepts of Updatelessness and the "Son of" operator.

Making a decision theory that is reflectively stable is hard. Most agents would self-modify into a agent if given the chance. For example if a CDT agent knows that it is going to be put in a Newcomb's problem, it would precommit to one-box, "causing" Omega to predict that it one-boxes. We say "son of CDT" to refer to the agent that a CDT agent would self-modify into, and more generally "son of X" to refer to the agent that agent X would self-modify into.

Thinking about these "son of" agents is unsatisfying for a couple reasons. First, it is very opaque. There is an extra level of indirection, where you cant just directly reason about what agent X will do. Instead have to reason about what agent X will modify into, which gives you a new agent, which you probably understand much less than you understand agent X, and then you have to reason about what that new agent will do. Second, it is unmotivated. If you had a good reason to like Son of X, you would probably not be calling in Son of X. Important concepts get short names, and you probably don't have as many philosophical reasons to like Son of X as you have to like X.

Wei Dai's Updateless Decision Theory is perhaps our current best decision theory proposal. A UDT agent chooses a function from its possible observations to its actions, without taking into account its observations, and then applies that function.

The main problem with this proposal is in formalizing it in a logical uncertainty framework. Some of the observations that an agent makes are going to be logical observations, for example, an agent may observe the millionth digit of . Then it is not clear how an agent can not take the digit of into account in its calculation of the best policy. If we do not tell it the digit through the standard channel, it might still compute the digit while computing the best policy. As I said here, it is important to note logical updatelessness is about computations and complexity, not about what logical system you are in.

So what would true logical updatelessness look like? Well the agent would have to not update on computations. Since it is a computation itself, we cannot keep it independent from all computations, but we can restrict it to some small class of computations. The way we do this is by giving the updateless part of the decision theory finite computational bounds. Computational facts not computable within those bounds are still observed, but we do not take them into account when choosing a policy. Instead, we use our limited computation to choose a policy in the form of a function from how the more difficult computations turn out to actions.

The standard way to express a policy is a bunch of input/output pairs. However, since the inputs here are results of computations, this can equivalently be expressed by a single computation that gives an output. (To see the equivalence, note that we can write down a single computation which computes all the inputs and produces the corresponding output. Conversely, given a single computation, we can just supply the identity function of the output of that computation.) Thus, logical updatelessness consists of a severely resource bounded agent choosing what policy (In the form of a computation) it wants to run given more resources.

Under this model, it seems that whenever you have an agent collecting more computational resources over time, with the ability to rewrite itself, you get an updateless agent. The limited agent is choosing using its bounded resources what algorithm it wants to run to choose its output when it has collected more computational resources. The future version of the agent with more resources is the updateless version of the original agent, in that it is following the policy specified by the original agent before updating on all the computational facts. However, this is also exactly what we mean when we say that the later agent is the son of the bounded agent.

There is still a free parameter in Logical Updatelessness, which is what decision procedure the limited version uses to select its policy. This is also underspecified in standard UDT, but I believe it is often taken to be EDT. Thus, we have logically updateless versions of many decision policies, which I claim is actually pointing at the same thing as Son on those various policies (in an environment where computational resources are collected over time).

5