AI ALIGNMENT FORUM

Abstraction 2020

(A -> B) -> A in Causal DAGs

by johnswentworth
22nd Jan 2020

Previous: Logical Representation of Causal Models
Next: Formulating Reductive Agency in Causal Models

6 comments
tom4everitt

There is a paper which I believe is trying to do something similar to what you are attempting here:

Networks of Influence Diagrams: A Formalism for Representing Agents’ Beliefs and Decision-Making Processes, Gal and Pfeffer, Journal of Artificial Intelligence Research 33 (2008) 109-147

Are you aware of it? How do you think their ideas relate to yours?

johnswentworth

Very interesting, thank you for the link!

Main difference between what they're doing and what I'm doing: they're using explicit utility & maximization nodes; I'm not. It may be that this doesn't actually matter. The representation I'm using certainly allows for utility maximization - a node downstream of a cloud can just be a maximizer for some utility on the nodes of the cloud-model. The converse question is less obvious: can any node downstream of a cloud be represented by a utility maximizer (with a very artificial "utility")? I'll probably play around with that a bit; if it works, I'd be able to re-use the equivalence results in that paper. If it doesn't work, then that would demonstrate a clear qualitative difference between "goal-directed" behavior and arbitrary behavior in these sorts of systems, which would in turn be useful for alignment - it would show a broad class of problems where utility functions do constrain.

tom4everitt

Glad you liked it.

Another thing you might find useful is Dennett's discussion of what an agent is (see the first few chapters of From Bacteria to Bach and Back). Basically, he argues that an agent is something we ascribe beliefs and goals to. If he's right, then an agent should basically always have a utility function.

Your post focuses on the belief part, which is perhaps the more interesting aspect when thinking about strange loops and similar.

Steven Byrnes

Why aren't you notationally distinguishing between "actual model" versus "what the agent believes the model to be"? Or are you and I missed it?

johnswentworth

On reflection, there's a better answer to this than I originally gave, so I'm trying again.

"What the agent believes the model to be" is whatever's inside the cloud in the high-level model. That's precisely what the clouds mean. But the clouds (and their contents) only exist in the high-level model; the low-level model contains no clouds. The "actual model" is the low-level model.

So, when we talk about the extent to which the high-level and low-level models match - i.e. what queries on the low-level model can be answered by queries on the high-level model - we're implicitly talking about the extent to which the agent's model matches the low-level model.

The high-level model (at least the part of it within the cloud) is "what the agent believes the model to be".

johnswentworth

EDIT: This answer isn't very good, see my other one.

Good question. We could easily draw a diagram in which the two are separate - we'd have the "agent" node reading from one cloud and then influencing things outside of that cloud. But that case isn't very interesting - most of what we call "agenty" behavior, and especially the diagonalization issues, are about the case where the actual model and the agent's beliefs coincide. In particular, if we're talking about ideal game-theoretic agents, we usually assume that both the rules of the game and each agent's strategy are common knowledge - including off-equilibrium behavior.

So, for idealized game-theoretic agents, there is no separation between the actual model and the agent's model - interventions on the actual model are reflected in the agent's model.

That said, in the low-level model, the map and the territory will presumably always be separate. "When do they coincide?" is implicitly wrapped up in the question "when do non-agenty models abstract into agenty models?". I view the potential mismatch between the two models as an abstraction failure - if they don't match, then the agency-abstraction is broken.


Agenty things have the type signature (A -> B) -> A. In English: agenty things have some model (A -> B) which predicts the results (B) of their own actions (A). They use that model to decide what actions to perform: (A -> B) -> A.
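To make the type signature concrete, here is a minimal Python sketch. It assumes real-valued actions and outcomes, and an agent that simply picks whichever of a few candidate actions its model scores highest; the names `WorldModel`, `Agent` and `make_argmax_agent` are illustrative, not from the post.

```python
from typing import Callable, Sequence

Action = float
Outcome = float

# The agent's model predicts the outcome (B) of each action (A): A -> B.
WorldModel = Callable[[Action], Outcome]
# An agent consumes such a model and returns an action: (A -> B) -> A.
Agent = Callable[[WorldModel], Action]


def make_argmax_agent(candidates: Sequence[Action]) -> Agent:
    """Hypothetical agent: choose the candidate whose predicted outcome is largest."""
    def agent(model: WorldModel) -> Action:
        return max(candidates, key=model)
    return agent


agent = make_argmax_agent([0.0, 0.5, 1.0])
print(agent(lambda a: a * (1 - a)))  # 0.5: this model peaks at a = 0.5
print(agent(lambda a: -a))           # 0.0: a different model, a different action
```

The point of the sketch is only that the agent's action is a function of the model it is handed, so swapping the model changes the action.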

In the context of causal DAGs, the model (A -> B) would itself be a causal DAG model M - i.e. some Python code defining the DAG. Logically, we can represent it as:

\[ M = \text{``}\,(P[A \mid M] = f_A(A)) \;\&\; (P[B \mid A, M] = f_B(B, A))\,\text{''} \]

… for some given distribution functions \(f_A\) and \(f_B\).
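Since M is "just code", it can be sketched directly. The following is a minimal illustration assuming binary nodes; the particular choices of `f_A` and `f_B` are invented for the example, and the dictionary layout is one arbitrary way to encode the DAG, not a format the post specifies.

```python
import random

def f_A(a):
    """P[A = a | M]: hypothetical choice, A is a fair coin."""
    return 0.5

def f_B(b, a):
    """P[B = b | A = a, M]: hypothetical choice, B usually copies A."""
    p_one = 0.9 if a == 1 else 0.2
    return p_one if b == 1 else 1.0 - p_one

# The model M as plain data: each node lists its parents and its conditional.
M = {
    "A": {"parents": [], "prob": f_A},
    "B": {"parents": ["A"], "prob": f_B},
}

def sample(model, rng):
    """Ancestral sampling of binary nodes; assumes dict order is topological."""
    values = {}
    for name, node in model.items():
        parent_values = [values[p] for p in node["parents"]]
        p_one = node["prob"](1, *parent_values)
        values[name] = 1 if rng.random() < p_one else 0
    return values

print(sample(M, random.Random(0)))
```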

From an outside view, the model (A -> B) causes the choice of action A. Diagrammatically, that looks something like this:

[Diagram: a cloud containing the DAG A -> B, with an arrow from the cloud to node A.]

The “cloud” in this diagram has a precise meaning: it’s the model M for the DAG inside the cloud.

Note that this model does not contain any true loops - there is no loop of arrows. There’s just the Hofstadterian “strange loop”, in which node A depends on the model of later nodes, rather than on the later nodes themselves.
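One way to check the "no loop of arrows" claim mechanically is to treat the cloud as an ordinary upstream node and run a topological sort. This is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+); encoding the cloud as a node named `"M"` is an illustrative assumption, not the post's formalism.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to its set of parents. "M" stands in for the cloud,
# i.e. the model of the inner DAG A -> B. A is computed from M, and B from A
# (every node also implicitly reads M, which adds no cycle).
outer = {"M": set(), "A": {"M"}, "B": {"A", "M"}}

# static_order() raises CycleError if the graph contains a directed cycle.
print(list(TopologicalSorter(outer).static_order()))  # ['M', 'A', 'B']

# Wiring B directly back into A, by contrast, is a genuine loop.
looped = {"A": {"B"}, "B": {"A"}}
try:
    list(TopologicalSorter(looped).static_order())
except CycleError:
    print("cycle detected")
```

A depends on M (the description of A -> B), not on B itself, so the sort succeeds.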

How would we explicitly write this model as a Bayes net?

The usual way of writing a Bayes net is something like:

\[ P[X] = \prod_i P[X_i \mid X_{pa(i)}] \]

… but as discussed in the previous post, there’s really an implicit model M in there. Writing everything out in full, a typical Bayes net would be:

\[ P[X \mid M] = \prod_i P[X_i \mid X_{pa(i)}, M] \]

… with \(M = \text{``}\forall i: P[X_i \mid X_{pa(i)}, M] = f_i(X_i, X_{pa(i)})\text{''}\).
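The factorization can be checked numerically. This minimal sketch assumes binary nodes and an invented two-node model M; it computes \(P[X \mid M]\) as the product of the per-node conditionals \(f_i\) and confirms that the joint sums to 1 over all assignments.

```python
from itertools import product

def f_A(a):
    """P[A = a | M]: hypothetical fair coin."""
    return 0.5

def f_B(b, a):
    """P[B = b | A = a, M]: hypothetical, B usually copies A."""
    p_one = 0.9 if a == 1 else 0.2
    return p_one if b == 1 else 1.0 - p_one

M = {
    "A": {"parents": [], "prob": f_A},
    "B": {"parents": ["A"], "prob": f_B},
}

def joint(x, model):
    """P[X = x | M] as the product over nodes of P[X_i = x_i | X_pa(i), M]."""
    p = 1.0
    for name, node in model.items():
        p *= node["prob"](x[name], *(x[q] for q in node["parents"]))
    return p

names = list(M)
total = sum(joint(dict(zip(names, vals)), M)
            for vals in product([0, 1], repeat=len(names)))
print(joint({"A": 1, "B": 1}, M))  # 0.45
print(total)  # 1, up to floating-point rounding
```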

Now for the interesting part: what happens if one of the nodes is agenty, i.e. it performs some computation directly on the model? Well, calling the agenty node A, that would just be a term \(P[A \mid M]\)... which looks exactly like a plain old root node. The model M is implicitly an input to all nodes anyway, since it determines what computation each node performs. But surely our strange loop is not the same as the simple model A -> B? What are we missing? How does the agenty node use M differently from other nodes?

What predictions would (A -> B) -> A make which differ from A -> B?

Answer: interventions/counterfactuals.

Modifying M

If A is determined by a computation on the model M, then M is causally upstream of A. That means that, if we change M - e.g. by an intervention \(M \leftarrow do(B=2, M)\) - then A should change accordingly.
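Treating an intervention as an operation that takes one model and returns another can be sketched as follows. The encoding (a model as a dict from node names to functions of their parents) and the helper `do` are hypothetical, chosen only to illustrate \(M \leftarrow do(B=2, M)\).

```python
import math

def do(node, value, model):
    """Return a new model with `node` clamped to `value`; the input model is untouched."""
    intervened = dict(model)
    intervened[node] = lambda *parent_values: value
    return intervened

M = {"B": lambda a: math.sqrt(a)}  # hypothetical: B depends on A
M_do = do("B", 2, M)               # M <- do(B=2, M)

print(M["B"](0.25), M_do["B"](0.25))  # 0.5 2
```

Any node computed from the model, such as an agenty A, would now receive `M_do` instead of `M`; a plain root node would not.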

Let’s look at a concrete example.

We’ll stick with our (A -> B) -> A system. Let’s say that A is an investment - our agent can invest anywhere from $0 to $1. B is the payout of the investment (which of course depends on the investment amount). The “inner” model \(M = \text{``}P[B \mid A, M] = f_B(B, A)\text{''}\) describes how B depends on A.
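For concreteness, here is a minimal sketch of the agent's choice. The post leaves \(f_B\) unspecified, so this assumes an invented expected payout of \(\sqrt{A}\) and restricts investments to one-cent steps; the agent picks the amount that maximizes expected payout minus the amount invested.

```python
import math

def expected_payout(a):
    """Hypothetical E[B | A = a, M]; sqrt is an invented stand-in for f_B."""
    return math.sqrt(a)

# Candidate investments: $0.00 to $1.00 in one-cent steps.
grid = [i / 100 for i in range(101)]

# Expected net gain = expected payout minus the amount invested.
best_a = max(grid, key=lambda a: expected_payout(a) - a)
print(best_a)  # 0.25
```

Analytically, \(\sqrt{a} - a\) peaks where \(1/(2\sqrt{a}) = 1\), i.e. at \(a = 1/4\), which the grid search reproduces.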

We want to compare two different models within this setup:

  • A is chosen to maximize some expected function of net gains, based on M
  • A is just a plain old root node with some value (which just so happens to maximize expected net gains for the M we're using)

What predictions would the two make differently?

Well, the main difference is what happens if we change the model M, e.g. by intervening on B. If we intervene on B - i.e. fix the payout at some particular value - then the “plain old root node” model predicts that investment A will stay the same. But the strange loop model predicts that A will change - after all, the payout no longer depends on the investment, so our agent can just choose not to invest at all and still get the same payout.
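The two models can be compared side by side in a small sketch. It reuses the invented \(\sqrt{A}\) expected payout from a one-cent grid of investments, and models "intervening on B" as replacing the payout model with a constant; the fixed payout of 0.5 is arbitrary.

```python
import math

GRID = [i / 100 for i in range(101)]  # candidate investments, $0.00 to $1.00

def agent_policy(expected_payout):
    """Agenty node: A is recomputed from the model by maximizing expected net gain."""
    return max(GRID, key=lambda a: expected_payout(a) - a)

def original_model(a):
    """Hypothetical M: expected payout sqrt(A)."""
    return math.sqrt(a)

def intervened_model(a):
    """Intervention on B: the payout is fixed at 0.5 and no longer depends on A."""
    return 0.5

# (A -> B) -> A: A is a function of whichever model is in force.
agenty_before = agent_policy(original_model)
agenty_after = agent_policy(intervened_model)

# A -> B: A is a root node frozen at the value that happened to be optimal
# under the original model, so changing the model leaves it alone.
A_ROOT = 0.25
root_before, root_after = A_ROOT, A_ROOT

print(agenty_before, agenty_after)  # 0.25 0.0
print(root_before, root_after)      # 0.25 0.25
```

Before the intervention the two models agree (both give A = 0.25); only the counterfactual behavior tells them apart.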

In game-theoretic terms: agenty models and non-agenty models differ only in predictions about off-equilibrium (a.k.a. interventional/counterfactual) behavior.

Practically speaking, the cleanest way to represent this is not as a Bayes net, but as a set of structural equations. Then we’d have:

\[ M = \text{``}\; P[U_i = u \mid M] = \mathbb{I}[0 \le u < 1]\,du, \quad A = f_A(M, U_A), \quad B = f_B(A, U_B) \;\text{''} \]

This makes the key point easier to see: the main feature which makes the system “agenty” is that M appears explicitly as an argument to a function, not just as prior information in probability expressions.
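The structural-equation form translates almost line for line into code. This is a minimal sketch under several assumptions: the payout equation \(f_B\) (square root plus zero-mean noise) is invented, the agent estimates expectations over evenly spaced points standing in for \(U \sim \text{Uniform}[0, 1)\), and the dict `M` holds only the pieces the agent reads (\(f_B\) and the noise points), leaving out M's self-reference to \(f_A\).

```python
import math
import random

GRID = [i / 100 for i in range(101)]                 # candidate investments, $0 to $1
NOISE_POINTS = [(k + 0.5) / 100 for k in range(100)]  # stand-in for U ~ Uniform[0, 1)

def f_B(A, u_B):
    """Hypothetical payout equation: sqrt(A) plus zero-mean noise driven by U_B."""
    return math.sqrt(A) + 0.1 * (u_B - 0.5)

def f_A(M, u_A):
    """Agenty equation: the model M itself is an explicit argument.
    Estimates E[B - A] under M for each candidate A and returns the best one.
    u_A is unused because this illustrative agent is deterministic."""
    def expected_net(a):
        payouts = [M["f_B"](a, u) for u in M["noise"]]
        return sum(payouts) / len(payouts) - a
    return max(GRID, key=expected_net)

M = {"f_B": f_B, "noise": NOISE_POINTS}

rng = random.Random(0)
u_A, u_B = rng.random(), rng.random()  # exogenous noise terms
A = f_A(M, u_A)
B = f_B(A, u_B)
print(A)  # 0.25

# Changing M (here: a payout fixed at 0.5) changes A through f_A's argument.
M_fixed = {"f_B": lambda a, u: 0.5, "noise": NOISE_POINTS}
print(f_A(M_fixed, u_A))  # 0.0
```

The only node whose equation takes `M` as an argument is A; B's equation reads only its parent and its own noise term.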