Trace: Goals and Principles

AI ALIGNMENT FORUM
AF

Trace: Goals and Principles — AI Alignment Forum

In terms of research, I decided to devote the month of February mainly to foundations and tools. One project was to come up with a notation/language/framework which matches the way I’ve been thinking about computation - i.e. DAGs with symmetry and “clouds” representing DAG-structure-as-data. The tool I’ve been building - a Python library tentatively called Trace - isn’t stable enough that I want to show it off yet, but I do think I’ve nailed down the core goals and principles, so it’s time to write them up.

Goals

The main thing I need for my research is a data structure suitable for automated causal abstraction algorithms on arbitrary computations. Some subgoals:

Universality: data structure should be able to represent any computation performed by a program
Data structure needs to be finite, which means leveraging symmetry to represent infinite computational DAGs
Computations must be straightforward both for a human to specify directly and for an algorithm to manipulate
Want to be able to do causal-DAG-like things, like query for parents/children of a node or perform interventions/counterfactuals along the lines of Pearl’s do()
Need to handle DAGs with dynamic structure
Eventually I’ll want to do reflective things with this data structure (i.e. self-modeling agents), so simplicity matters in the specification and core algorithms.

I’ll give a bit more detail...

The first two subgoals basically amount to “I need a data structure representing the computation performed by an arbitrary program” - i.e. the trace of an arbitrary program. I do need to actually use the data structure, so it needs to be finite. (We could use a lazy infinite structure, but then I want to know what finite data structure is used to actually represent the infinite data structure.) The computational DAGs of programs are usually infinite but symmetric, so the solution probably needs to leverage symmetry in the representation.

I (a human) need to be able to specify computations - so I either need an interface which would basically be a programming language, or I need to transpile from an existing programming language. Ideally the human-facing representation would directly reflect the computer-facing data structure, which weighs against transpilation from an existing language. Also, existing languages have way more bells and whistles than I really want to deal with for this project, even setting aside the likely importance of simplicity for reflection.

I want to write algorithms which take in one computation, chew on it, and return another computation - that’s the type signature of causal abstraction. To support those algorithms, I want the data structure to allow DAG-like things. I want a data structure which makes it easy to ask “what would have changed in this computation if the internal variables x, y, and z had been different?” - without needing to specify ahead of time which variables x, y, and z we might potentially want to fiddle with. I want to fiddle with any/all of them, after the fact.

I need a data structure which handles dynamic structure. This means both dynamically-generated computations (i.e. programs which write programs and then run them) and algorithms which make decisions based on the structure of some other computation (i.e. an algebraic optimization algorithm). These capabilities are the “clouds” needed to formulate reductive agents in computational DAGs.

I’ve also realized that dynamic structure is very common in day-to-day programming - it happens every time we hit a conditional statement. Without dynamic structure, a representation of computation has to resort to hackish special cases to handle the different dependency structures implied by branches of a conditional. Directly supporting dynamic structure avoids those hacks.

Those are my immediate needs, but I also want to support generalizations of all the above. A (simple) tool which is useful for a wide variety of applications is also likely to be more robustly useful even to my intended applications - it’s more likely to be useful longer-term as my understanding of the problems I’m working on evolves, it’s more likely to be useful in ways I didn’t anticipate, and it’s more likely to force key insights. To that end, I asked: “What can we do with computations, other than run them?”. This produced a bunch of potentially-interesting use-cases to think about:

Query computational intermediates, e.g. extracting data from a simulation or debugging a program.
Prove properties of the computation, e.g. big-O runtime or information security
Manipulate the computation algebraically, e.g. to solve equations or take limits
Manipulate the computation via interventions/counterfactuals, e.g. ask what-if questions
Make predictions about some parts of the computation based on partial information about other parts, e.g. inference
Embed the computation in some other computational model, e.g. transpilation or compilation
Message-passing or dynamic programming on the computational graph, e.g. belief propagation or backpropagation.

I’m not trying to do all of this or even most of it, but these are all use-cases in the back of my mind. In particular, belief propagation and backpropagation are nontrivial algorithms which operate directly on the computational DAG, so they’re natural use-cases/test-cases to think about. Algebraic manipulation and inference more generally are also good fits, although limited to special cases in practice.

Finally, one nongoal (though not an antigoal): language engineering. This tool (at least the human-facing interface) is sort-of-a-programming-language, but there are better people than me to worry about type systems and error handling and compiler optimizations and all that jazz. I need the tool to be usable for my research, but beyond that I’m more interested in key insights, data structures and algorithms than in engineering and language design.

Principles

The usual method for representing a DAG is to give each node a (unique) name, and then assign a list of parents to each node, in order. When the DAG represents computation (i.e. a circuit), we also include an expression on each node - one of a handful of atomic functions, e.g. addition, multiplication, comparison, boolean logic, and/or conditionals. Here’s a logic circuit for a full adder as an example:

{
    A: Expression(function=input, parents=[]),
    B: Expression(function=input, parents=[]),
    C: Expression(function=input, parents=[]),
    S1: Expression(function=xor, parents=[A, B]),
    S: Expression(function=xor, parents=[S1, C]),
    Pair1: Expression(function=and, parents=[A, B]),
    Pair2: Expression(function=and, parents=[B, C]),
    Pair3: Expression(function=and, parents=[C, A]),
    C1: Expression(function=or, parents=[Pair1, Pair2]),
    C: Expression(function=or, parents=[C1, Pair3])
}

This is (roughly) the data structure typically used for causal models, so it’s maximally convenient for causal abstraction as well as interventions/counterfactuals. I’d like to keep as much of that structure as I can.

The biggest problem with a standard DAG representation is that it doesn’t leverage symmetry. That’s inconvenient even for finite models with repeated components, but it’s a deal-breaker for infinite DAGs, where we need to leverage symmetry in order to represent the DAG finitely at all. If we’re talking about representing computation, then most of the DAGs we care about are infinite - for example, here’s a diagram of the computation associated with a factorial function:

What do we need in order to compactly represent repeated components? For starters, presumably we only want to write out the internal structure of the component once. Assume we’ll use the usual DAG representation for the internal structure: we give each node a name, then specify what it’s reading from and what function it performs. But that means that we’ll be re-using names within different contexts (i.e. copies of the component) - in other words, we have some notion of scope.

That’s principle 1: standard DAG representation + repeated components -> re-use symbol names in different contexts.

Representing Context of a Symbol

In the standard computational DAG representation, we take a symbol and directly look up its defining expression. With multiple contexts/scopes, this turns into two steps: look up the context in which the symbol appears, then find its defining expression within that context. Adding in dynamic structure, the context itself may be given as a symbol, with its own meta-context and its own defining expression. What’s the simplest data structure to support all that?

Simplest answer I have so far:

We have a Symbol type. Every Symbol has a literal and a context; the context gives an expression for the literal.
Contexts are represented just like the standard DAG representation.
Symbols have a get_value() function which gets the value of the literal within the context. For instance, Symbol(literal=’x’, context={‘x’:2}).get_value() would return 2.
The literal and/or context can also be Symbols, to support dynamic structure.

In this notation, our full adder from earlier would be written something like:

context = {}
context.update({
    A: Expression(function=input, parents=[]),
    B: Expression(function=input, parents=[]),
    C: Expression(function=input, parents=[]),
    S1: Expression(function=xor, parents=[Symbol(A, context), Symbol(B, context)]),
    S: Expression(function=xor, parents=[Symbol(S1, context), Symbol(C, context)]),
    Pair1: Expression(function=and, parents=[Symbol(A, context), Symbol(B, context)]),
    Pair2: Expression(function=and, parents=[Symbol(B, context), Symbol(C, context)]),
    Pair3: Expression(function=and, parents=[Symbol(C, context), Symbol(A, context)]),
    C1: Expression(function=or, parents=[Symbol(Pair1, context), Symbol(Pair2, context)]),
    C: Expression(function=or, parents=[Symbol(C1, context), Symbol(Pair3, 
context)]),
})

Note that the symbols all point to the context in which they appear. That makes it annoying to write out - we have to first initialize the context, then set up all the Symbols to point to it.

To make things cleaner to write, I use a Context object: it’s just like a dict, but if it contains any Symbols without an explicit context, then it points those symbols to itself. This way of writing things doesn’t change the underlying data structure at all compared to the previous example; it just makes the “code” a bit easier for humans to read/write. I also abbreviate Expression as E and Symbol as S. Combining those notational changes and making everything a bit less verbose, a computation can be written something like this:

Context({
    A: E(input, []),
    B: E(input, []),
    C: E(input, []),
    S1: E(xor, [S(A), S(B)]),
    S: E(xor, [S(S1), S(C)]),
    Pair1: E(and, [S(A), S(B)]),
    Pair2: E(and, [S(B), S(C)]),
    Pair3: E(and, [S(C), S(A)]),
    C1: E(or, [S(Pair1), S(Pair2)]),
    C: E(or, [S(C1), S(Pair3)]),
})

As the saying goes, any sufficiently complicated software project contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. Don’t use this as a primary programming language, folks.

Representing Interventions and Functions

We’ve broken symbol resolution up into two steps: get the context, and get the value of the literal within the context. That lets us re-use symbol names within different contexts. But we still need some way to specify a function call - i.e. a copy of some context with “input” symbols set to particular values.

As an example, for a computation of factorial(3), we might want to write something like this:

Context({
    n: 3,
    is_base_case: E(==, [S(n), 0]),
    recurse_result: S(result,  <COPY OF THIS CONTEXT BUT WITH n REPLACED BY n-1>),
    non_base_result: E(*, [n, S(recurse_result)])
    result: E(ternary, [S(is_base_case), 1, S(non_base_result)])
})

Key thing to notice: “copy of this context but with n replaced by n-1” sounds an awful lot like an intervention. We take a “model” - a context - and produce a copy, with a change to the expression for one of the symbols.

That’s principle 2: function calls are produced by do()-style interventions on a context.

I’ll use the standard function call notation for this. Simple example: if we take a context like Context({x: 1, y: 2, z: E(+, [x, y])}), and apply the intervention {y: 3}, we’d write that as

Context({x: 1, y: 2, z: E(+, [x, y])})({y: 3}) = Context({x: 1, y: 3, z: E(+, [x, y])})

For our factorial example, that would look something like this:

factorial_context = Context({
    n: 3,
    is_base_case: E(==, [S(n), 0]),
    recurse_result: S(result, factorial_context({n: E(-, [S(n), 1])})),
    non_base_result: E(*, [n, S(recurse_result)])
    result: E(ternary, [S(is_base_case), 1, S(non_base_result)])
})

This is kind of tricky to write - what I intend to convey is that factorial_context is a pointer to the whole Context object. If we wanted to write the whole thing in Trace notation, it would actually look like this:

outer_context = Context({
    factorial: Context({
        n: 3,
        factorial: “this string is ignored”,
        is_base_case: E(==, [S(n), 0]),
        recurse_result: S(result, S(factorial)({n: E(-, [S(n), 1])})),
        non_base_result: E(*, [n, S(recurse_result)])
        result: E(ternary, [S(is_base_case), 1, S(non_base_result)])
    })({factorial: S(factorial)})
})

We need a pointer to the factorial context inside the factorial context itself, so we create an outer context, then use another intervention to pass our pointer in. Viewed as a programming language, Trace lacks lexical scope - annoying for a programming language, but probably a good thing for a data structure.

At this point, we have all the pieces of a programming language. To run factorial(5), we could run S(result, S(factorial, outer_context)({n:5})).get_value() with outer_context defined above.

More importantly, this immediately provides a way of accessing the program’s trace. We have, effectively, a lazy data structure representing the trace of the program.

Let’s imagine tracing our factorial function for n=3. Mixing our notation with some arrows, our program suggests something like this:

Key thing to notice: all Contexts except the first are dynamically generated, via interventions. As we trace back through the data structure, we’ll need to call get_value() to generate the contexts. So when we actually get the values of the contexts, we’ll end up with a data structure like this:

Sudden Stop

That’s it for core principles. I’m currently prototyping some causal abstraction algorithms using Trace, and I’m tweaking things as I go - in particular, I may change the intervention notation to optimize for intervening on the trace rather than intervening on the program. We’ll see.

Once I have some proper algorithms running and the notation has stabilized, I’ll probably put it on github along with some documentation.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

6

6

Goals

Principles

Sudden Stop