Natural Latents: The Concepts

David Lorell

I'm possibly missing something basic here, but: how is the redundancy/latent-focused natural-abstraction theory supposed to deal with synergistic information (and "emergent" dynamics)?

Consider a dog at the level of atoms. It's not, actually, the case that "this is a dog" is redundantly encoded in each atom. Even if each atom were clearly labeled, and we had an explicit, approximately deterministic function mapping the atoms' states to the label, the state of any individual atom would not constrain the output at all. Atom #2354 being in state #7532 is consistent with its being part of a dog, or a cat, or an elephant...

This only stops applying if we consider macroscopically sized chunks of atoms, or the specific set of microscopically sized chunks corresponding to DNA.

And even that doesn't always work. Consider a precision-engineered nanomachine, with each atom accounted for. Intuitively, "the nanomachine's state" should be an abstraction over those atoms. However, there's not necessarily any comparatively minuscule "chunk" of the nanomachine that actually redundantly encodes its state! E. g., a given exact position of appendage #12 may be consistent either with resource-extraction or with rapid travel.

So: Suppose we have some set of random variables $X$ representing some cube of voxels where each voxel reports what atoms are in it. Imagine a dataset of various animals (or nanomachines) in this format, of various breeds and in various positions.

"This is a dog" tells us some information about $X$: $H(X \mid \text{dog}) < H(X)$. Indeed, it tells us a fairly rich amount of information: the general "shape" of what we should expect to see there. However, for any individual $X_i$, $H(X_i \mid \text{dog}) \approx H(X_i)$.^[1] Which is to say: "this is a dog" is synergistic information about $X$! Not redundant information. And symmetrically, sampling a given small chunk of $X$ won't necessarily tell us whether it's the snapshot of a dog or a cat (unless we happen to sample a DNA fragment). $H(\text{animal} \mid X) = 0$, but $H(\text{animal} \mid X_i) \approx H(\text{animal})$.

One way around this is to suggest that cats/dogs/nanomachines aren't abstractions over their constituent parts, but abstractions over the resampling of all their constituent parts under state transitions. I. e., suppose we now have 3D video recordings: then "this is a dog" is redundantly encoded in each $X(t)$ for $t \in [t_{\text{start}}, t_{\text{end}}]$.

But that seems counterintuitive/underambitious. Intuitively, tons of abstractions are about robust synergistic information/emergent dynamics.

Is there some obvious way around all that, or is it currently an open question?

[1] Though it's not literally zero. E. g., if we have a fixed-size voxel cube, then depending on whether it's a dog or an elephant, we should expect the voxels at the edges to be more or less likely to contain air vs. flesh.

johnswentworth (6mo)

Yeah, this is an active topic for us right now.

For most day-to-day abstraction, full strong redundancy isn't the right condition to use; as you say, I can't tell a dog by looking at each individual atom. But full weak redundancy goes too far in the opposite direction: I can drop a lot more than just one atom and still recognize the dog.

Intuitively, it feels like there should be some condition like "if you can recognize a dog from most random subsets of the atoms of size 2% of the total, then P[X|latent] factors according to <some nice form> to within <some error which gets better as the 2% number gets smaller>". But the naive operationalization doesn't work, because we can use XOR tricks to encode a bunch of information in such a way that any 2% of (some large set of variables) can recover the info, but any one variable (or any set of size less than 2%) has exactly zero info. The catch is that such a construction requires the individual variables to be absolutely enormous, with something like exponentially large amounts of entropy each. So maybe if we assume some reasonable bound on the size of the variables, then the desired claim could be recovered.
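For readers who haven't seen this kind of trick, here is a minimal sketch of its simplest form. It assumes an n-of-n scheme, where all shares are needed, rather than the "any 2%" threshold version described above. The sizes (`BITS`, `N_SHARES`) and function names are made up for illustration, and the check is by brute-force enumeration over a tiny secret.

```python
from collections import Counter
from functools import reduce
from itertools import combinations, product
from operator import xor

BITS = 2                    # toy secret size (assumption)
N_SHARES = 3                # toy number of variables (assumption)
VALUES = range(2 ** BITS)

def split(secret, randomness):
    """Shares = the random values, plus one share that makes the total XOR equal the secret."""
    last = reduce(xor, randomness, secret)
    return tuple(randomness) + (last,)

def recover(shares):
    """XOR of all shares gives back the secret."""
    return reduce(xor, shares, 0)

def partial_view(secret, keep):
    """Counts of what the shares at indices `keep` look like, over all randomness."""
    views = Counter()
    for randomness in product(VALUES, repeat=N_SHARES - 1):
        shares = split(secret, randomness)
        views[tuple(shares[i] for i in keep)] += 1
    return views

# All shares together always recover the secret.
all_recover = all(
    recover(split(s, r)) == s
    for s in VALUES
    for r in product(VALUES, repeat=N_SHARES - 1)
)

# Every proper subset of shares looks the same whatever the secret is (zero info).
subsets = [k for size in range(1, N_SHARES) for k in combinations(range(N_SHARES), size)]
zero_info = all(
    partial_view(s, keep) == partial_view(0, keep)
    for keep in subsets
    for s in VALUES
)

print(all_recover, zero_info)  # True True
```

This only shows the "whole set recovers it, proper subsets know nothing" flavor. It says nothing about how large the variables have to be in the threshold version.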

Thane Ruthenis (6mo)

Cool. I've had the same idea, that we want something like "synergistic information present in each random subset of the system's constituents", and yeah, it doesn't work out-of-the-box.

Some other issues there:

- If we're actually sampling random individual atoms all around the dog's body, it seems to me that we'd need an incredibly large number of them to decode anything useful: far more than if we were sampling random small connected chunks of atoms.
  - More intuitive example: Suppose we want to infer a book's topic. What's the smallest $N$ such that we can likely infer the topic from a random contiguous string of length $N$? Comparatively, what's the smallest $M$ such that we can infer it from $M$ letters randomly and independently sampled from the book's text? It seems to me that $N \ll M$.
- But introducing "chunks of nearby variables" requires figuring out what "nearby" is, i. e., defining some topology for the low-level representation. How does that work?
- Further, the size of the chunk needed depends a lot on which part of the system we sample, so just going "a flat % of all constituents" doesn't work. Consider happening to land on a DNA string vs. some random part of the interior of the dog's stomach.
  - Actually, dogs are kind of a bad example: animals do have DNA signatures spread all around them. A complex robot, then. If we have a diverse variety of robots, inferring the specific type is easy if we sample e. g. part of the hardware implementing its long-term memory, but not if we sample a random part of an appendage.
  - Or a random passage from the book vs. the titles of the book's chapters. Or even just "a sample of a particularly info-dense paragraph" vs. "a sample from an unrelated anecdote from the author's life". % of the total letter count just doesn't seem like the right notion of "smallness".
- On the flip side, sometimes it's reversed: sometimes we do want to sample random unconnected atoms. E. g., the nanomachine example: if we happen to sample the "chunk" corresponding to appendage #12, we risk learning nothing about the high-level state, whereas if we sample three random atoms from different parts of it, that might determine the high-level state uniquely. So now the desired topology of the samples is different: we want non-connected chunks.

I'm currently thinking this is solved by abstraction hierarchies. Like, maybe the basic definition of an abstraction is of the "redundant synergistic variable" type, and the lowest-level abstractions are defined over the lowest-level elements (molecules over atoms). But then higher-level abstractions are redundant-synergistic over lower-level abstractions (rather than actual lowest-level elements), and up it goes. The definitions of the lower-level abstractions provide the topology + sizing + symmetries, which higher-level abstractions then hook up to. (Note that this forces us to actually step through the levels, either bottom-up or top-down.)

As examples:

- The states of the nanomachines' modules are inferable from any subset of the modules' constituent atoms, and the state of the nanomachine itself is inferable from the states of any subset of the modules. But there are no such neat relationships between atoms and the high-level state.
- "A carbon atom" is synergistic information about a chunk of voxels (baking in how that chunk could vary, e. g. rotations, spatial translations); "a DNA molecule" is synergistic information about a bunch of atoms (likewise defining custom symmetries under which atom-compositions still count as a DNA molecule); "skin tissue" is synergistic over molecules; and somewhere up there we have "a dog" synergistic over custom-defined animal-parts.

Or something vaguely like that; this doesn't exactly work either. I'll have more to say about this once I finish distilling my notes for external consumption instead of expanding them, which is going to happen any... day... now...

^{^}

We also sometimes call the mediation condition the “independence” condition.

^{^}

Natural Latents: The Math called the redundancy condition the “insensitivity” condition instead; we also sometimes call it the “invariance” condition.
