[Epistemic Status: Type Error]

In this post, I try to build up an ontology around the following definition of knowledge:

To know something is to have the set of policies available to you closed under conditionals dependent on that thing.

You are an agent, and you are interacting with an environment $e$ in the set $E$ of all possible environments. For each environment $e$, you select an action $a$ from the set $A$ of available actions. You thus implement a policy $p : E \to A$. Let $P \subseteq A^E$ denote the set of policies that you could implement. (Note that $A^E$ is the space of functions from $E$ to $A$.)
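To make this concrete, here is a minimal sketch (in Python) of a finite version of this setup. The environment and action names and the dict encoding of policies are illustrative assumptions of mine; the post does not fix any particular representation.

```python
# A minimal sketch of the setup, assuming finite E and A so that the whole
# function space A^E can be enumerated. All names here are illustrative.
from itertools import product

E = ["sunny_world", "cloudy_world", "night_world"]   # possible environments
A = ["press_red", "press_green"]                     # available actions

# A policy is a function from E to A; with finite sets we can represent it
# as a dict and enumerate the whole space A^E of such functions.
all_policies = [dict(zip(E, actions)) for actions in product(A, repeat=len(E))]

# The set P of policies the agent could implement is some subset of A^E.
P = all_policies  # an unconstrained agent; more realistic agents get smaller subsets
```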

If you are confused about the word "could," that is okay; so am I.

A fact about the environment can be viewed as a function $f : E \to S$ that partitions the set of environments according to that fact. For example, for the fact "the sky is blue," we can think of $S$ as the set $\{\text{true}, \text{false}\}$ and $f$ as the function that sends worlds with a blue sky to the element $\text{true}$ and sends worlds without a blue sky to the element $\text{false}$. One example of a fact is the identity function $\mathrm{id} : E \to E$, which is the full specification of the environment.

A conditional policy can be formed out of other policies. To form a conditional on a fact $f : E \to S$, we start with a policy for each element of $S$. We will let $p_s$ denote the policy associated with $s$, so $p_s : E \to A$ for each $s \in S$. Given this fact and this collection of policies, we define the conditional policy $q$ given by $q(e) = p_{f(e)}(e)$.

Conditional policies are like if statements in programming. Using the fact "the sky is blue" from above, we can let $p_{\text{red}}$ be the policy that pushes a red button regardless of its environment and let $p_{\text{green}}$ be a policy that pushes a green button regardless of its environment. If $p_{\text{true}} = p_{\text{red}}$ and $p_{\text{false}} = p_{\text{green}}$, then the conditional policy $q$ is the policy that pushes the red button if the sky is blue, and pushes the green button otherwise.
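As a sketch of the construction $q(e) = p_{f(e)}(e)$, continuing the illustrative finite encoding above (the function and environment names are mine, not from the post):

```python
# A sketch of the conditional-policy construction q(e) = p_{f(e)}(e).
# The environment names, the fact, and the button policies are illustrative.

E = ["blue_sky_world", "grey_sky_world"]

def sky_is_blue(e):               # the fact f : E -> {True, False}
    return e == "blue_sky_world"

def conditional(fact, policies):
    """Given a fact and a policy for each value of the fact, build the
    conditional policy e -> policies[fact(e)](e)."""
    return lambda e: policies[fact(e)](e)

push_red = lambda e: "press_red"      # constant policy: ignores the environment
push_green = lambda e: "press_green"  # constant policy: ignores the environment

q = conditional(sky_is_blue, {True: push_red, False: push_green})
for e in E:
    print(e, q(e))   # blue_sky_world press_red / grey_sky_world press_green
```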

Now, we are ready to define knowledge. If $P$ is the set of policies you could implement, then you know a fact $f : E \to S$ if $P$ is closed under conditional policies dependent on $f$. (I.e., whenever $p_s \in P$ for every $s \in S$, we also have the conditional policy $e \mapsto p_{f(e)}(e)$ in $P$.) Basically, we are just saying that your policy is allowed to break into different cases for different ways that the fact could go.
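Assuming everything is finite, the closure condition can be checked by brute force. The helper below is a sketch of mine; the name `knows` and the dict encoding of policies are not from the post.

```python
# A brute-force sketch of the closure condition, assuming E, A, and S are finite
# and policies are encoded as dicts from E to A. `knows` is an illustrative name.
from itertools import product

def knows(P, fact, S, E):
    """Return True iff P is closed under conditionals on `fact`: for every
    assignment of a policy in P to each s in S, the conditional policy
    e -> p_{fact(e)}(e) is (extensionally) some policy already in P."""
    tables = {tuple(p[e] for e in E) for p in P}
    for assignment in product(P, repeat=len(S)):
        chosen = dict(zip(S, assignment))
        q = tuple(chosen[fact(e)][e] for e in E)   # table of the conditional policy
        if q not in tables:
            return False
    return True
```

With this encoding, an agent whose $P$ is the full function space $A^E$ knows every fact, while an agent that has only the constant policies knows only facts that take a single value across the possible environments.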

Self Reference

Now, let's consider what happens when an agent tries to know things about itself. For this, we will consider a naturalized agent, that is, an agent that is part of the environment. There is a fact $f_{\mathrm{act}} : E \to A$ of the environment that says what action the agent takes, where $A$ is again the set of actions available to the agent, and $f_{\mathrm{act}}$ is a function from $E$ to $A$ that picks out what action the agent takes in that environment. Note that $f_{\mathrm{act}}$ is exactly the agent's policy, but we are thinking about it slightly differently.

So that things are not degenerate, let's assume that there are at least two possible actions $a_1$ and $a_2$ in $A$, and that $P$ contains the constant policies $p_{a_1}$ and $p_{a_2}$ that ignore their environment and always output the same thing.

However, we can write down an explicit policy that the agent cannot implement: the policy where the agent takes action $a_2$ in environments in which it takes action $a_1$, and takes action $a_1$ in environments in which it does not take action $a_1$. The agent cannot implement this policy, since there are no consistent environments in which the agent is implementing this policy. (Again, I am confused by the coulds here, but I am assuming that the agent cannot take an inherently contradictory policy.)

This policy can be viewed as a conditional policy on the fact $f_{\mathrm{act}}$. You can construct it as $e \mapsto p_{h(f_{\mathrm{act}}(e))}(e)$, where $h : A \to A$ is the function that maps $a_1$ to $a_2$ and everything else to $a_1$. The fact that this conditional policy cannot be in $P$ shows that the agent cannot, by our definition, know its own action.
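Here is a toy check of the diagonalization, under the illustrative assumption that each environment literally records the agent's action as part of its description (all names below are mine):

```python
# A toy check of the diagonal policy, assuming environments are pairs of
# (everything else, the action the agent takes there). Names are illustrative.
A = ["a1", "a2"]
E = [(weather, act) for weather in ["rainy", "sunny"] for act in A]

def f_act(e):                  # the fact f_act : E -> A (the agent's own action)
    return e[1]

def diagonal(e):               # take a2 where you take a1, and a1 otherwise
    return "a2" if f_act(e) == "a1" else "a1"

# An environment is consistent with the agent implementing a policy p exactly
# when the action recorded in the environment is the action p outputs there.
print([e for e in E if f_act(e) == diagonal(e)])   # [] -- no consistent environment
```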

Partial Knowledge

As seen above, there are limits to knowledge. This makes me want to aim lower and think about what types of partial knowledge can exist. Perhaps an agent can interact with a fact in nontrivial ways, while still not having complete knowledge as defined above. Here, I will present various ways an agent can have partial knowledge of a fact.

In all of the below examples we will use a fact $f : E \to S$ about the environment that can take on four states $S = \{1, 2, 3, 4\}$, an action that can take on four values $A = \{ax, ay, bx, by\}$, and we assume that the agent has access to the constant functions. Think about how all of these types of partial knowledge can be interpreted as changing the subset $P \subseteq A^E$ in some way.

Knowing a Coarser Fact: The agent could know a fact that has less detail than the original fact; for example, the agent could know the parity of the fact above. This would mean that the agent can choose a policy to implement on worlds sent to $1$ or $3$, and another policy to implement on worlds sent to $2$ or $4$, but cannot necessarily use any more resolution.
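As a sketch of this case, reusing the illustrative finite encoding and the brute-force closure check from above, with each environment identified with its state:

```python
# Sketch: an agent whose policies depend only on the parity of the state knows
# the parity, but not the full fact. Everything here is illustrative and finite.
from itertools import product

E = [1, 2, 3, 4]                 # identify each environment with its state
A = ["ax", "ay", "bx", "by"]
state  = lambda e: e             # the original fact f
parity = lambda e: e % 2         # the coarser fact

# P: every policy whose action depends only on parity(e).
P = [{e: acts[parity(e)] for e in E} for acts in product(A, repeat=2)]

def closed_under(P, fact, S):    # same brute-force check as the earlier sketch
    tables = {tuple(p[e] for e in E) for p in P}
    for assignment in product(P, repeat=len(S)):
        chosen = dict(zip(S, assignment))
        if tuple(chosen[fact(e)][e] for e in E) not in tables:
            return False
    return True

print(closed_under(P, parity, [0, 1]))        # True : the agent knows the parity
print(closed_under(P, state, [1, 2, 3, 4]))   # False: it does not know the full fact
```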

Knowing a Logically Dependent Fact: The agent could, for example, know another fact $g : E \to \{1, 2, *\}$ with the property that $g(e) = f(e)$ whenever $f(e) \in \{1, 2\}$, and $g(e) = *$ otherwise. The agent can safely implement policies that depend on the state when it knows it is in state $1$ or $2$, but it also might be in the state of uncertainty $*$, and only know that the environment is in state $3$ or $4$.

Knowing a Probabilistically Dependent Fact: The agent could, for example, know another fact $g$, which is almost the same as the original fact $f$, but is wrong in some small number of environments. The agent cannot reliably implement functions dependent on the original fact, but can correlate its action with the original fact by using this proxy.

Learning a Fact Later in Time: Imagine the agent has to make two independent actions at two different times, and the agent learns the fact after the first action, but before the second. In the above example, the first letter of the action, $a$ or $b$, is the first action, and the second letter, $x$ or $y$, is the second action. The policies are closed under conditionals as long as the different policies in the conditional agree on the first action. This is particularly interesting because it shows how to think of an agent moving through time as a single timeless agent with partial knowledge of the things that it will learn.
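Here is a sketch of this case, under the simplifying assumption that the agent knows nothing at all before the first action, so the first letter of its action cannot vary with the environment (the encoding is mine):

```python
# Sketch: the agent picks the first letter before learning the fact and the
# second letter after. Conditionals stay implementable only when the policies
# being combined agree on the first letter. Names and encoding are illustrative.
from itertools import product

E = [1, 2, 3, 4]                           # environments, identified with the fact's value
fact = lambda e: e

# Implementable policies: first letter constant across E, second letter may vary.
P = [{e: first + second[e] for e in E}
     for first in ["a", "b"]
     for second in [dict(zip(E, s)) for s in product(["x", "y"], repeat=len(E))]]
tables = {tuple(p[e] for e in E) for p in P}

def conditional(fact, policies):           # q(e) = p_{fact(e)}(e), dict encoding
    return {e: policies[fact(e)][e] for e in E}

agree = {s: {e: "a" + ("x" if s % 2 else "y") for e in E} for s in E}   # all start with "a"
clash = {s: {e: ("a" if s < 3 else "b") + "x" for e in E} for s in E}   # first letters differ

q_agree = conditional(fact, agree)
q_clash = conditional(fact, clash)
print(tuple(q_agree[e] for e in E) in tables)   # True : still implementable
print(tuple(q_clash[e] for e in E) in tables)   # False: not implementable
```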

Paying Actions to Learn a Fact: Similar to the above example, imagine that an agent will learn the fact, but only if it chooses $a$ in the first round. This corresponds to $P$ being closed under conditionals as long as all of the policies always choose $a$ in the first round.

Paying Internal Resources to Learn a Fact: Break the fact up into two parts: the parity of the number, and whether the number is greater than $2$. Imagine an agent that is in an epistemic state such that it could think for a while and learn either of these bits, but cannot learn both in time for when it has to take an action. The agent can make its policy depend on the parity or on the size, but not both. Interestingly, this agent has strictly more options than an agent that only knows the parity, but technically does not fully know the parity. This is because adding more options can take away the closure property on the set of policies.
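Continuing the parity sketch from above (same illustrative encoding), here is a sketch of how adding the size-dependent policies gives more options while destroying the closure property for parity:

```python
# Sketch: P_parity knows the parity; P_more = P_parity plus size-only policies
# has strictly more options but is no longer closed under parity conditionals.
# Same illustrative finite encoding as the earlier parity sketch.
from itertools import product

E = [1, 2, 3, 4]
A = ["ax", "ay", "bx", "by"]
parity = lambda e: e % 2
size   = lambda e: int(e > 2)

P_parity = [{e: acts[parity(e)] for e in E} for acts in product(A, repeat=2)]
P_size   = [{e: acts[size(e)]   for e in E} for acts in product(A, repeat=2)]
P_more   = P_parity + P_size

def closed_under(P, fact, S):
    tables = {tuple(p[e] for e in E) for p in P}
    for assignment in product(P, repeat=len(S)):
        chosen = dict(zip(S, assignment))
        if tuple(chosen[fact(e)][e] for e in E) not in tables:
            return False
    return True

print(closed_under(P_parity, parity, [0, 1]))  # True : knows the parity
print(closed_under(P_more,   parity, [0, 1]))  # False: more options, but closure is lost
```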

Other Subsets of the Function Space: One could imagine for example starting with an agent that knows the fact, but specifying one specific policy that the agent is not allowed to use. It is hard to imagine this as an epistemic state of the agent, but things like this might be necessary to talk about self reference.

Continuous/Computable Functions: This does not fit with the above example, but we could also restrict the space of policies to e.g. computable or continuous functions of the environment, which can be viewed as a type of partial knowledge.

Confusing Parts

I don't know what the coulds are. It is annoying that our definition of knowledge is tied up with something as confusing as free will. I have a suspicion, however, that this is necessary. I suspect that our trouble with understanding naturalized world models might be coming from trying to understand them on their own, when really they have a complicated relationship with decision theory.

I do not yet have any kind of a picture that unifies this with the other epistemic primitives, like probability and proof, and I expect that this would be a useful thing to try to get.

It is interesting that one way of thinking about what the coulds are is related to the agent being uncertain. In this model, the fact that the agent could take different actions is connected to the agent not knowing what action it takes, which matches up with the fact that, in this model, if an agent could take multiple actions, it can't know which one it takes.

It seems like an agent could effectively lose knowledge by making precommitments not to follow certain policies. Normal kinds of precommitments like "if you do $X$, I will do $Y$" do not cause the agent to lose knowledge, but the fact that they can in theory is weird. Also, it is weird that an agent that can only take one action vacuously knows all things.

It seems like to talk about knowing what you know, you run into some size problems. If the thing you know is treated as a variable that can take different values, that variable lives in the space of subsets of functions from environments to actions, which is much larger than $E$. I think to talk about this you have to start out restricting to some subset of functions from the beginning, or some subset of possible knowledge states.

Comments

Also, it is weird that an agent that can only take one action vacuously knows all things.

You could want to define not "agent A knows fact $f$", but "agent A can counterfactually demonstrate that it knows fact $f$". So the agent with a single action can't demonstrate anything.

All that we'd need to add to the definition is the fact that there exist policies in $P$ that distinguish elements of $S$, i.e. that for all $s_1, s_2 \in S$ with $s_1 \neq s_2$, there exist $e_1$ and $e_2$ with $f(e_1) = s_1$ and $f(e_2) = s_2$, and a $p \in P$ with $p(e_1) \neq p(e_2)$.