(A -> B) -> A in Causal DAGs

There is a paper which I believe is trying to do something similar to what you are attempting here:

Networks of Influence Diagrams: A Formalism for Representing Agents’ Beliefs and Decision-Making Processes, Gal and Pfeffer, Journal of Artificial Intelligence Research 33 (2008) 109-147

Are you aware of it? How do you think their ideas relate to yours?

johnswentworth

Very interesting, thank you for the link!

Main difference between what they're doing and what I'm doing: they're using explicit utility & maximization nodes; I'm not. It may be that this doesn't actually matter. The representation I'm using certainly allows for utility maximization - a node downstream of a cloud can just be a maximizer for some utility on the nodes of the cloud-model. The converse question is less obvious: can any node downstream of a cloud be represented by a utility maximizer (with a very artificial "utility")? I'll probably play around with that a bit; if it works, I'd be able to re-use the equivalence results in that paper. If it doesn't work, then that would demonstrate a clear qualitative difference between "goal-directed" behavior and arbitrary behavior in these sorts of systems, which would in turn be useful for alignment - it would show a broad class of problems where utility functions do constrain behavior.

tom4everitt

Glad you liked it.

Another thing you might find useful is Dennett's discussion of what an agent is (see the first few chapters of From Bacteria to Bach and Back). Basically, he argues that an agent is something we ascribe beliefs and goals to. If he's right, then an agent should basically always have a utility function.

Your post focuses on the belief part, which is perhaps the more interesting aspect when thinking about strange loops and similar phenomena.

Steven Byrnes

Why aren't you notationally distinguishing between "actual model" versus "what the agent believes the model to be"? Or are you and I missed it?

johnswentworth

On reflection, there's a better answer to this than I originally gave, so I'm trying again.

"What the agent believes the model to be" is whatever's inside the cloud in the high-level model. That's precisely what the clouds mean. But the clouds (and their contents) only exist in the high-level model; the low-level model contains no clouds. The "actual model" is the low-level model.

So, when we talk about the extent to which the high-level and low-level models match - i.e. what queries on the low-level model can be answered by queries on the high-level model - we're implicitly talking about the extent to which the agent's model matches the low-level model.

The high-level model (at least the part of it within the cloud) is "what the agent believes the model to be".

johnswentworth

EDIT: This answer isn't very good; see my other one.

Good question. We could easily draw a diagram in which the two are separate - we'd have the "agent" node reading from one cloud and then influencing things outside of that cloud. But that case isn't very interesting - most of what we call "agenty" behavior, and especially the diagonalization issues, are about the case where the actual model and the agent's beliefs coincide. In particular, if we're talking about ideal game-theoretic agents, we usually assume that both the rules of the game and each agent's strategy are common knowledge - including off-equilibrium behavior.

So, for idealized game-theoretic agents, there is no separation between the actual model and the agent's model - interventions on the actual model are reflected in the agent's model.

That said, in the low-level model, the map and the territory will presumably always be separate. "When do they coincide?" is implicitly wrapped up in the question "when do non-agenty models abstract into agenty models?". I view the potential mismatch between the two models as an abstraction failure - if they don't match, then the agency-abstraction is broken.

Modifying M

If A is determined by a computation on the model M, then M is causally upstream of A. That means that, if we change M (e.g. by an intervention $M \leftarrow do(B = 2, M)$), then A should change accordingly.
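Written out as a worked equation, assuming the standard reading of $do(\cdot)$ as replacing B's mechanism with a constant, and writing the agent's decision as $A = f_A(M)$ in the spirit of the structural equations used later in this section:

$$
P[B = b \mid A, do(B = 2, M)] = I[b = 2], \qquad A = f_A(M) \;\longrightarrow\; A' = f_A\big(do(B = 2, M)\big)
$$

Since the intervened model no longer lets B respond to A, $A'$ will generally differ from $A$ whenever $f_A$ cares about how B responds to A.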

Let’s look at a concrete example.

We’ll stick with our (A -> B) -> A system. Let’s say that A is an investment - our agent can invest anywhere from $0 to $1. B is the payout of the investment (which of course depends on the investment amount). The “inner” model

$$M = \text{“}P[B \mid A, M] = f_B(B, A)\text{”}$$

describes how B depends on A.

We want to compare two different models within this setup:

- A is chosen to maximize some expected function of net gains, based on M (the "strange loop" model).
- A is just a plain old root node with some value (which just so happens to maximize expected net gains for the M we're using).
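The two candidates can be written side by side. This assumes "net gain" means payout minus investment, $B - a$, and uses $g$ as a stand-in for the unspecified "some function of net gains"; $M_0$ is hypothetical notation for the inner model in force when the root node's constant was fixed:

$$
\begin{aligned}
\text{strange loop:}\quad & A = \arg\max_{a \in [0,1]} \mathbb{E}\big[g(B - a) \mid A = a, M\big] \\
\text{root node:}\quad & A = a^{*} := \arg\max_{a \in [0,1]} \mathbb{E}\big[g(B - a) \mid A = a, M_0\big]
\end{aligned}
$$

As long as $M = M_0$, both lines give the same A; they can only come apart once M itself is changed.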

What predictions would the two make differently?

Well, the main difference is what happens if we change the model M, e.g. by intervening on B. If we intervene on B (i.e. fix the payout at some particular value), then the “plain old root node” model predicts that the investment A will stay the same. But the strange loop model predicts that A will change: after all, the payout no longer depends on the investment, so our agent can just choose not to invest at all and still get the same payout.
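A minimal Python sketch of this comparison, under illustrative assumptions that are not from the post: the inner model M is represented by a deterministic expected-payout curve $E[B \mid A = a] = 1.5\sqrt{a}$, net gain is $B - a$, and investments are restricted to whole cents. The names `payout`, `best_investment` and `do_B` are hypothetical.

```python
import math
from typing import Callable

# Hypothetical representation of the inner model M: a function mapping an
# investment a in [0, 1] to the expected payout E[B | A = a, M].
Model = Callable[[float], float]

# Candidate investments in whole cents, from $0.00 to $1.00.
GRID = [i / 100 for i in range(101)]


def best_investment(M: Model) -> float:
    """Strange-loop model: A maximizes expected net gain E[B | A = a, M] - a."""
    return max(GRID, key=lambda a: M(a) - a)


def do_B(value: float) -> Model:
    """Intervention do(B = value, M): the payout no longer depends on A."""
    return lambda a: value


def payout(a: float) -> float:
    """Assumed payout curve with diminishing returns (illustrative only)."""
    return 1.5 * math.sqrt(a)


# Without intervention: both models give the same investment.
agent_A = best_investment(payout)  # A is recomputed from M
root_A = agent_A                   # A is a constant that happens to match

# After do(B = 2, M): only the strange-loop model responds.
agent_A_int = best_investment(do_B(2.0))
root_A_int = root_A

print(agent_A, root_A)          # 0.56 0.56
print(agent_A_int, root_A_int)  # 0.0 0.56
```

The root node's A is a frozen number, so only the model that recomputes A from M reacts when M is intervened on; before the intervention the two are indistinguishable.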

In game-theoretic terms: agenty models and non-agenty models differ only in predictions about off-equilibrium (a.k.a. interventional/counterfactual) behavior.

Practically speaking, the cleanest way to represent this is not as a Bayes net, but as a set of structural equations. Then we’d have:

$$
M = \text{“}\begin{aligned}
P[U_i = u \mid M] &= I[0 \leq u < 1]\, du \\
A &= f_A(M, U_A) \\
B &= f_B(A, U_B)
\end{aligned}\text{”}
$$

However, this makes the key point a bit tougher to see: the main feature which makes the system “agenty” is that M appears explicitly as an argument to a function, not just as prior information in probability expressions.
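The structural-equation version can also be sketched in Python, with `f_A` taking M as an ordinary function argument, which is the feature singled out above. Everything concrete here is a hypothetical choice for illustration: the `Model` dataclass, a noisy payout $B = 3 U_B \sqrt{A}$ (mean $1.5\sqrt{A}$), whole-cent investments, a midpoint average standing in for the expectation over $U_B \sim \text{Uniform}[0, 1)$, and an agent whose mechanism ignores $U_A$. Following the idealized case, the world's B-mechanism is taken to be the one inside M.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Model:
    """Hypothetical container for the inner model M.

    f_B(a, u_B) is the payout mechanism; the noise U_B is assumed uniform
    on [0, 1), matching P[U_i = u | M] = I[0 <= u < 1] du.
    """
    f_B: Callable[[float, float], float]


GRID = [i / 100 for i in range(101)]  # candidate investments in whole cents
# Midpoint nodes for approximating an average over U ~ Uniform[0, 1).
NODES = [(k + 0.5) / 50 for k in range(50)]


def f_A(M: Model, u_A: float) -> float:
    """A = f_A(M, U_A): M is an explicit argument, not just prior information.

    This deterministic sketch ignores u_A and picks the grid investment that
    maximizes the approximate expected net gain E[f_B(a, U_B)] - a under M.
    """
    def expected_net_gain(a: float) -> float:
        return sum(M.f_B(a, u) for u in NODES) / len(NODES) - a
    return max(GRID, key=expected_net_gain)


def sample(M: Model, rng: random.Random) -> Tuple[float, float]:
    """Draw (A, B) from the structural equations; the actual model is M."""
    u_A, u_B = rng.random(), rng.random()
    a = f_A(M, u_A)
    b = M.f_B(a, u_B)
    return a, b


M = Model(f_B=lambda a, u: 3.0 * u * math.sqrt(a))  # assumed noisy payout
M_do = Model(f_B=lambda a, u: 2.0)                   # do(B = 2, M)

rng = random.Random(0)
print(sample(M, rng))     # A is 0.56; B depends on the random draw
print(sample(M_do, rng))  # (0.0, 2.0): the agent stops investing
```

Because `f_A` reads M directly, swapping in the intervened model changes A with no other edits, whereas a root-node A would have to be passed in as a fixed value.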

AI ALIGNMENT FORUM
AF
