## AI ALIGNMENT FORUM


Imagine I have a highly detailed low-level simulation (e.g. molecular dynamics) of a garden. The initial conditions include a flower, and I would like to write some code to “point” to that particular flower. At any given time, I should be able to use this code to do things like:

• compute a bounding box around the flower
• render a picture which shows just the flower, with the background removed
• list all of the particles which are currently inside the flower

Meanwhile, it should be robust to things like:

• most of the molecules in the flower turning over on a regular basis
• the flower moving around in space and/or relative to other flowers
• the flower growing, including blooming/wilting/other large morphological change
• other flowers looking similar

That said, there’s a limit to what we can expect; our code can just return an error if e.g. the flower has died and rotted away and there is no distinguishable flower left. In short: we want this code to capture roughly the same notion of “this flower” that a human would.

We’ll allow an external user to draw a boundary around the flower in the initial conditions, just to define which object we’re talking about. But after that, our code should be able to robustly keep track of our particular flower.

How could we write that code, even in principle?

## “Why Not Just… ”

There are a lot of obvious hackish ways to answer the question - and obvious problems/counterexamples for each of them. I’ll list a few here, since the counterexamples make good test cases for our eventual answer, and illustrate just how involved the human concept of a flower is.

• Flower = molecules inside the flower-boundary at time zero. Problem: most of the molecules comprising a flower turn over on a regular basis.
• Flower = whatever’s inside the boundary which defined the flower at time zero. Counterexample: the flower might move.
• Flower = things which look (in a rendered image) like whatever was inside the boundary at time zero. Counterexample: the flower might bloom/wilt/etc. Another counterexample: there may be other, similar-looking flowers.
• Flower = instance of a recurring pattern in the data, defined by clustering. Counterexample: there may not be any other flowers. (More generally: we can recognize “weird” objects in the world which don’t resemble anything else we’ve ever seen.)
• Flower = region of high density contiguous in space-time with our initial region. Counterexample: we can dunk the flower in a bucket of water.
• Flower = contents of lipid bilayer membranes which also contain DNA sequence roughly identical to the consensus sequence of all DNA within the initial boundary, plus anything within a few microns of those membranes. Counterexample: it’s still the same flower if we blow it up via expansion microscopy and the individual cells lyse in the process. (Also this wouldn’t generalize to non-biological objects, or even clonal organisms.)
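To make one of these failures concrete, here is a minimal sketch of the second hack (flower = whatever is inside the initial boundary). Everything in it is hypothetical and chosen only for illustration: the `Box` and `FixedRegionPointer` names, the two-particle "simulation", and the method names, which mirror the bounding-box and particle-listing operations from the problem statement.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Vec3 = Tuple[float, float, float]
Snapshot = Dict[int, Vec3]  # particle id -> position at one timestep


@dataclass(frozen=True)
class Box:
    lo: Vec3
    hi: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


class FixedRegionPointer:
    """Hypothetical baseline: the 'flower' is whatever sits in the initial box."""

    def __init__(self, initial_box: Box, trajectory: List[Snapshot]):
        self.box = initial_box
        self.trajectory = trajectory

    def bounding_box(self, t: int) -> Box:
        return self.box  # the boundary never moves

    def particles(self, t: int) -> Set[int]:
        return {pid for pid, pos in self.trajectory[t].items()
                if self.box.contains(pos)}


# Particle 0 plays the flower and drifts along x; particle 1 is a bystander
# that wanders into the region the flower used to occupy.
trajectory = [
    {0: (0.5, 0.5, 0.5), 1: (5.0, 0.5, 0.5)},
    {0: (2.5, 0.5, 0.5), 1: (0.5, 0.5, 0.5)},
]
pointer = FixedRegionPointer(Box((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)), trajectory)
at_start = pointer.particles(0)
at_end = pointer.particles(1)
```

Once the flower has moved, `at_end` contains only the bystander: the pointer now confidently points at the wrong thing, which is exactly the counterexample in the list.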

## Drawing Abstract Object Boundaries

The general conceptual challenge here is how to define an abstract object - an object which is not an ontologically fundamental component of the world, but an abstraction on top of the low-level world.

In previous posts I’ve outlined a fairly-general definition of abstraction: far-apart components of a low-level model are independent given some high-level summary data. We imagine breaking our low-level system variables into three subsets:

• Variables X which we want to abstract
• Variables Y which are “far away” from X
• Noisy “in-between” variables Z which moderate the interaction between X and Y

The noise in Z wipes out most of the information in X, so the only information from X which is relevant to Y is some summary f(X).

(I’ve sketched this as a causal DAG for concreteness, which is how I usually visualize it.) I want to claim that this is basically the right way to think about abstraction quite generally - so it had better apply to questions like “what’s an abstract object?”.
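As a sanity check on the definition, here is a minimal sketch on a toy model of my own choosing, not anything specific to flowers. X is three fair bits, the summary f(X) is the number of ones, and Y sees X only through a noisy intermediate Z whose distribution depends on that count. The particular probabilities and the names `p_y_given_x` and `flip` are assumptions for illustration. The thing to check is that Y carries exactly as much information about f(X) as about all of X, even though f(X) is a much smaller object.

```python
import itertools
import math
from collections import defaultdict


def mutual_information(joint):
    """Exact mutual information in bits of a joint distribution {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def entropy(dist):
    """Shannon entropy in bits of a distribution {value: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def f(x):
    """High-level summary: the number of ones in X."""
    return sum(x)


def p_y_given_x(y, x, flip=0.1):
    # Z is 1 with a probability that depends on X only through f(X)...
    p_z1 = (1 + f(x)) / 5
    # ...and Y is Z pushed through one more noisy channel (Z is summed out).
    p_y1 = p_z1 * (1 - flip) + (1 - p_z1) * flip
    return p_y1 if y == 1 else 1 - p_y1


p_x, p_f = {}, defaultdict(float)
joint_xy, joint_fy = {}, defaultdict(float)
for x in itertools.product([0, 1], repeat=3):
    p_x[x] = 1 / 8
    p_f[f(x)] += 1 / 8
    for y in (0, 1):
        p = p_x[x] * p_y_given_x(y, x)
        joint_xy[(x, y)] = p
        joint_fy[(f(x), y)] += p

i_xy = mutual_information(joint_xy)  # information Y has about all of X
i_fy = mutual_information(joint_fy)  # information Y has about f(X) alone
h_x = entropy(p_x)
h_f = entropy(p_f)
```

In this toy, `i_xy` and `i_fy` agree to floating-point precision, while `h_f` (about 1.81 bits) is well below `h_x` (3 bits): the summary throws away detail that was never going to reach Y anyway.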

So what happens if we apply this picture directly to the flower problem?

First, we need to divide up our low-level variables into the flower (X), things far away from the flower (Y), and everything in-between (noisy Z). I’ll just sketch this as the flower itself and a box showing the boundary between “nearby” and “far away”:

Notice the timesteps in the diagram - both the flower and the box are defined over time, so we imagine the boundaries living in four-dimensional spacetime, not just at one time. (Our user-drawn boundary in the initial condition constrains the full spacetime boundary at time zero.)

Now the big question is: how do we decide where to draw the boundaries? Why draw boundaries which follow around the actual flower, rather than meandering randomly around?

Let’s think about what the high-level summary f(X) looks like for boundaries which follow the flower, compared to boundaries which start around the flower (i.e. at the user-defined initial boundary) but don’t follow it as it moves. In particular, we’ll consider what information about the initial flower (i.e. flower at time zero) needs to be included in f(X).

The “true” flower moves, but the boundaries supposedly defining the “flower” don’t follow it. What makes such boundaries “worse” than boundaries which do follow the flower?

There’s a lot of information about the initial flower which could be included in our summary f(X): the geometry of the flower’s outer surface, its color and texture, temperature at each point, mechanical stiffness at each point, internal organ structure (e.g. veins), relative position of each cell, relative position of each molecule, … Which of these need to be included in the summary data for boundaries moving with the flower, and which need to be included in the summary data for boundaries not moving with the flower?

For example: the flower’s surface geometry will have an influence on things outside the outer boundary in both cases. It will affect things like drag on air currents, trajectories of insects or raindrops, and of course the flower-image formed on the retina of anyone looking at it. So the outer surface geometry will be included in the summary f(X) in both cases. On the other hand, relative positions of cells inside the flower itself are mostly invisible from far away if the boundary follows the flower.

But if the boundary doesn’t follow the flower… then the true flower is inside the boundary at the initial time, but counts as “far away” at a later time. And the relative positions of individual cells in the true flower will mostly stay stable over time, so those relative cell positions at time zero contain lots of information about relative cell positions at time two… and since the cells at time two count as “far away”, that means we need to include all that information in our summary f(X).

Strong correlation between low-level details (e.g. relative positions of individual cells) inside the spacetime boundary and outside. That information must be included in the high-level summary f(X).

The takeaway from this argument is: if the boundary doesn’t follow the true flower, then our high-level summary f(X) must contain far more information. Specifically, it has to include tons of information about the low-level internal structure of the flower. On the other hand, as long as the true flower remains inside the inner boundary, information about that low-level structure will mostly not propagate outside the outer boundary - such fine-grained detail will usually be wiped out by the noisy variables “nearby” the flower.

This suggests a formalizable approach: the “true flower” is defined by a boundary which is locally-minimal with respect to the summary data f(X) required to capture all its mutual information with “far-away” variables.
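Here is a minimal sketch of that comparison in a deliberately crude toy world, which is my own simplification rather than a real test of the idea. The "flower" is a block of `K` random bits (its low-level structure) that persists and shifts one cell to the right per timestep in a 1D world of `WORLD` cells; every other cell is fresh noise at each step. There is no "nearby" buffer and no surface geometry here, so the only question is how many bits about the time-zero flower show up outside the boundary at time `T`. That mutual information is a lower bound on what f(X) must carry. All names and sizes are assumptions for illustration.

```python
import itertools
import math
from collections import defaultdict

WORLD = 7  # cells in the 1D world
K = 3      # width of the flower, in cells
T = 2      # timestep at which we inspect the "far-away" cells


def mutual_information(joint):
    """Exact mutual information in bits of a joint distribution {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def world_at_time(flower, noise, t):
    """Flower pattern shifted right by t cells; all other cells are noise."""
    cells = [None] * WORLD
    for i, bit in enumerate(flower):
        cells[t + i] = bit
    fresh = iter(noise)
    return tuple(next(fresh) if c is None else c for c in cells)


def leaked_information(inside_at_T):
    """I(flower at time 0 ; cells outside the boundary at time T), in bits."""
    outside = [i for i in range(WORLD) if i not in inside_at_T]
    n_states = 2 ** WORLD  # K flower bits plus WORLD - K noise bits
    joint = defaultdict(float)
    for flower in itertools.product([0, 1], repeat=K):
        for noise in itertools.product([0, 1], repeat=WORLD - K):
            cells = world_at_time(flower, noise, T)
            far_away = tuple(cells[i] for i in outside)
            joint[(flower, far_away)] += 1 / n_states
    return mutual_information(joint)


# Boundary that moves with the flower vs. boundary stuck at its initial spot.
following = leaked_information(set(range(T, T + K)))
stationary = leaked_information(set(range(0, K)))
```

With these settings the following boundary leaks 0 bits, while the stationary boundary leaks 2 bits: the two flower cells that have drifted outside it. In a real flower the analogous leak would be the relative positions of all the cells, which is why the non-following boundary needs a far larger summary.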

## Test Cases

Before we start really attacking this approach, let’s revisit the problems/counterexamples from the hackish approaches:

• Molecular turnover: not a problem. The relevant information does not follow the individual molecules.
• Flower might move: not a problem. We basically discussed that directly in the previous section.
• Flower might bloom/wilt/etc: not a problem. Mutual information still follows the same pattern, although note that once the flower rots away altogether, we can draw a time-boundary indicating that the flower no longer exists, and indeed we expect everything significantly after that in time to be roughly independent of our former flower.
• Similar-looking flowers: not a problem. We’re explicitly relying on the low-level internal structure to define the flower boundary.
• No other flowers: not a problem. We’re not relying on clustering or any other data from other flowers.
• Dunk flower in a bucket of water: not a problem. Noisy water molecules “nearby” the flower will wipe out low-level detailed information about as well as noisy air molecules, if not better.
• Expansion microscopy: not a problem. The information in the flower’s low-level structure sticks around in its expanded form. Indeed, expansion microscopy wouldn’t be very useful otherwise.

Main takeaway: this approach is mainly about information contained in the low-level structure of the flower (i.e. cells, organs, etc). Physical interactions which maintain that low-level structure will generally maintain the flower-boundary - and a physical interaction which destroys most of a flower’s low-level structure is generally something we’d interpret as destroying the flower.

## Problems

Let’s start with the obvious: though it’s formalizable, this isn’t exactly formalized. We don’t have an actual test-case following around a flower in-silico, and given how complicated that simulation would be, we’re unlikely to have such a test case soon. That said, the next section will give a computationally simpler test-case which preserves most of the conceptual challenges of the flower problem.

First, though, let’s look at a few conceptual problems.

This approach relies on high mutual information between true-flower-at-time-zero and true-flower-at-later-times. That requires some kind of uncertainty or randomness.

There are a lot of places for that to come from:

• We could have ontologically-basic randomness, e.g. quantum noise
• We could have deterministic dynamics but random initial conditions
• More realistically, we could have some sort of observer in the system with Bayesian uncertainty about the low-level details of the world.

That last is the “obvious” answer, in some sense, and it’s a good answer for many purposes. I’m still not completely satisfied with it, though - it seems like a superintelligence with extremely precise knowledge of every molecule in a flower should still be able to use the flower-abstraction, even in a completely deterministic world.

Why/how would a “flower”-abstraction make sense under perfect determinism? What notion of locality is even present in such a system? When I probe my intuition, my main answer is: causality. I’m imagining a world without noise, but that world still has a causal structure similar to our world, and it’s that causal structure which makes the “flower” make sense.

Indeed, causal abstraction allows us to apply the ideas above directly to a deterministic world. The only change is that f(X) no longer only summarizes probabilistic information; it must also summarize any information needed to predict far-away variables under interventions (on either internal or far-away variables).

Of course, in practice, we’ll probably also want to include those interventional-information constraints even in the presence of uncertainty.

What about fine-grained information carried by, like, microwaves or something?

If we just imagine a physical outer boundary some distance from a flower (let’s say 3 meters), surely some clever physicists could figure out a way to map out the flower’s internal structure without crossing within that boundary. Isn’t information about the low-level structure constantly propagating outward via microwaves or something, without being wiped out by noisy air molecules on the way?

Two key things to keep in mind here:

• The boundary need not be a physical boundary; the “boundaries” just denote subsets of the variables of the model. If the model includes microwaves, we can just declare them all to be “nearby” the flower. Whenever they actually interact with molecules outside the flower, barring instruments specifically set up to detect them, the information they carry should be wiped out quite quickly by statistical-mechanical noise.
• In practice, we don’t just want to abstract one object. We want a whole high-level world model, full of abstract objects. The “far-away variables” will be variables within all the other high-level objects. So in order for microwaves to matter, they need to carry information from one object to another, without that information being wiped out by low-level noise.

Note that we’re talking about noise a lot here - does this problem play well with deterministic universes, where causality constrains f(X) more than plain old information? I expect the answer is yes - chaos makes low-level interventions look basically like noise for our purposes. But that’s another very hand-wavy answer.

What if we draw a boundary which follows around every individual particle which interacts with the flower?

Presumably we could get even less information in f(X) by choosing some weird boundary. The easy way to solve this is to add boundary complexity to the information contained in f(X) when judging how “good” a boundary is.
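One way to write that down, as a sketch rather than a settled definition: for a candidate spacetime boundary $B$, let $X_B$ be the variables it encloses, and score the boundary by

$$
\text{cost}(B) \;=\; \underbrace{H\big(f(X_B)\big)}_{\text{bits of summary data}} \;+\; \underbrace{C(B)}_{\text{bits to describe the boundary}}
$$

The symbols $B$, $X_B$ and $C(B)$ are notation introduced here for illustration, and using entropy for the first term and a description length for the second are both assumptions; any reasonable complexity measure should play the same role. A boundary which tracks every individual particle might shave bits off the first term, but only by paying heavily in the second.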

Humans seem to use a flower-abstraction without actually knowing the low-level flower-structure.

Key point: we don’t need to know the low-level flower-structure in order to use this approach. We just need to have a model of the world which says that the flower has some (potentially unknown) low-level structure, and that the low-level structure of flower-at-time-zero is highly correlated with the low-level structure of flower-at-later-times.

Indeed, when I look at a flower outside my apartment, I don’t know its low-level details. But I do expect that, for instance, the topology of the veins in that flower is roughly the same today as it was yesterday.

In fact, we can go a step further: humans’ lack of knowledge of the low-level structure of particular flowers is one of the main reasons we should expect our abstractions to look roughly like the picture above. Why? Well, let’s go back to the original picture from the definition:

Key thing to notice: since Y is independent of all the low-level details of X except the information contained in f(X), f(X) contains everything we can possibly learn about X just by looking at Y.

In terms of flowers: our “high-level summary data” f(X) contains precisely the things we can figure out about the flower without pulling out a microscope or cutting it open or otherwise getting “closer” to the flower.

## Testable Case?

Finally, let’s outline a way to test this out more rigorously.

We’d like some abstract object which we can simulate at a “low-level” at reasonable computational cost. It should exhibit some of the properties relevant to our conceptual test-cases from earlier: components which turn over, an object which moves around and changes shape/appearance, of which there might be many or just one, etc. Just those first two properties - components which turn over and object moving around - immediately suggest a natural choice: a wave.

• In a particle view, the underlying particles comprising the wave change over time
• The wave moves around in space and relative to other waves
• The wave may change shape (due to obstacles, dissipation, nonlinearity, etc)
• There may be other similar-looking waves in the environment or no other waves
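To show how cheap the low-level simulation can be, here is a minimal sketch of the first two properties only. It does not test the boundary criterion itself. It runs a discrete 1D wave equation on a periodic lattice, using a leapfrog update with Courant number 1 so that a pulse translates one site per step. The lattice sites stand in for "particles" which never move; the lattice size, the Gaussian pulse shape, and the 0.5 threshold for counting a site as part of the wave are all arbitrary assumptions.

```python
import math

N = 64       # lattice sites (the "particles"); they stay where they are
SIGMA = 2.0  # width of the Gaussian pulse, in sites
STEPS = 20   # how long to run


def pulse(center):
    """Gaussian displacement profile centered on a given site (periodic)."""
    out = []
    for i in range(N):
        d = (i - center + N // 2) % N - N // 2  # signed periodic distance
        out.append(math.exp(-0.5 * (d / SIGMA) ** 2))
    return out


def step(u, u_prev):
    """One leapfrog step of the discrete wave equation, Courant number 1."""
    return [u[(i + 1) % N] + u[(i - 1) % N] - u_prev[i] for i in range(N)]


def sites_in_wave(u, threshold=0.5):
    """Sites whose displacement is large enough to count as part of the wave."""
    return {i for i, h in enumerate(u) if h > threshold}


# A right-moving pulse: one step ago it was centered one site to the left.
u_prev, u = pulse(9), pulse(10)
start_sites = sites_in_wave(u)
start_peak = max(range(N), key=lambda i: u[i])

for _ in range(STEPS):
    u, u_prev = step(u, u_prev), u

end_sites = sites_in_wave(u)
end_peak = max(range(N), key=lambda i: u[i])
```

After 20 steps the peak has moved 20 sites and the set of sites making up the wave shares nothing with the starting set, even though it is recognizably the same pulse. Obstacles, dissipation, nonlinearity and additional waves would each need more machinery than this sketch includes.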

I’d be interested to hear if this sounds to people like a sensible/fair test of the concept.

## Summary

We want to define abstract objects - objects which are not ontologically fundamental components of the world, but are instead abstractions on top of a low-level world. In particular, our problem asks us to track a particular flower within a molecular-level simulation of a garden. Our method should be robust to the sorts of things a human notion of a flower is robust to: molecules turning over, flower moving around, changing appearance, etc.

We can do that with a suitable notion of abstraction: we have summary data f(X) of some low-level variables X, such that f(X) contains all the information relevant to variables “far away”. We’ve argued that, if we choose X to include precisely the low-level variables which are physically inside the flower, and mostly use physical distance to define “far-away” (modulo microwaves and the like), then we’d expect the information-content of f(X) to be locally minimal. Varying our choice of X subject to the same initial conditions - i.e. moving the supposed flower-boundary away from the true flower - requires f(X) to contain more information about the low-level structure of the flower.