What is Abstraction?

I think that there are some abstractions that aren't predictively useful, but are still useful in deciding your actions.

Suppose I and my friend both have the goal of maximising the number of DNA strings whose MD5 hash is prime.

I call sequences with this property "ana" and those without this property "kata". Saying that "the DNA over there is ana" does tell me something about the world, there is an experiment that I can do to determine if this is true or false, namely sequencing it and taking the hash. The concept of "ana" isn't useful in a world where no agents care about it and no detectors have been built. If your utility function cares about the difference, it is a useful concept. If someone has connected an ana detector to the trigger of something important, then its a useful concept. If your a crime scene investigator, and all you know about the perpetrators DNA is that its ana, then finding out if Joe Blogs has ana DNA could be important. The concept of ana is useful. If you know the perpitrators entire genome, the concept stops being useful.

A general abstraction is consistent with several, but not all universe states. There are many different universe states in which the gas has a pressure of 37Pa, but also many where it isn't. So all abstractions are subsets of possible universe states. Usually, we use subsets that are suitable for reasoning about in some way.

Suppose you were literally omniscient, knowing every detail of the universe, but you had to give humans a 1Tb summary. Unable to include all the info you might want, you can only include a summery of the important points, you are now engaged in lossy compression.

Sensor data is also an abstraction, for instance you might have temperature and pressure sensors. Cameras record roughly how many photons hit them without tracking every one. So real world agents are translating one lossy approximation of the world into another without ever being able to express the whole thing explicitly.

How you do lossy compression depends on what you want. Music is compressed in a way that is specific to defects in human ears. Abstractions are much the same.

Formalization: Starting Point

To repeat the intuitive idea: an abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.

So to formalize abstraction, we first need some way to specify which "aspects of the underlying system" we wish to predict, and what form the predictions take. The obvious starting point for predictions is probability distributions. Given that our predictions are probability distributions, the natural way to specify which aspects of the system we care about is via a set of events or logic statements for which we calculate probabilities. We'll be agnostic about the exact types for now, and just call these "queries".

That leads to a rough construction. We start with some concrete model

M_{C}

and a set of queries

Q

. From these, we construct a minimal abstract model

M_{A}

by keeping exactly the information relevant to the queries, and throwing away all other information. By the minimal map theorems, we can represent

M_{A}

directly by the full set of probabilities

P [Q | M_{C}]

;

M_{A}

and

P [Q | M_{C}]

contain exactly the same information. Of course, in practical examples, the probabilities

P [Q | M_{C}]

will usually have some more compact representation, and

M_{A}

will usually contain some extraneous information as well.

To illustrate a bit, let's identify the concrete model, class of queries, and abstract model for a few of the examples from earlier.

Ideal Gas:

Concrete model $M_{C}$ is the full set of molecules, their interaction forces, and a distribution representing our knowledge about their initial configuration.
Class of queries $Q$ consists of combinations of macroscopic measurements, e.g. one query might be "pressure = 12 torr & volume = 1 m^3 & temperature = 110 K".
For an ideal gas, the abstract model $M_{A}$ can be represented by e.g. temperature, number of particles (of each type if the gas is mixed), and container volume. Given these values and assuming a near-equilibrium initial configuration distribution, we can predict the other macroscopic measurables in the queries (e.g. pressure).

Tennis:

Concrete model $M_{C}$ is the full microscopic configuration of me and the physical world around me as I play tennis (or whatever else I do).
Class of queries $Q$ is hard to sharply define at this point, but includes things like "John will answer his cell phone in the next hour", "John will hold a racket and hit a fuzzy ball in the next hour", "John will play Civ for the next hour", etc - all the things whose probabilities change on hearing that I'm going to play tennis.
Abstract model $M_{A}$ is just the sentence "I am going to play tennis".

Street Map:

Concrete model $M_{C}$ is the physical city streets
Class of queries $Q$ includes things like "shortest path from Times Square to Central Park starts by following Broadway", "distance between the Met and the Hudson is less than 1 mile", etc - all the things we can deduce from a street map.
Abstract model $M_{A}$ is the map. Note that the physical map also includes some extraneous information, e.g. the positions of all the individual atoms in the piece of paper/smartphone.

Already with the second two examples there seems to be some "cheating" going on in the model definition: we just define the query class as all the events/logic statements whose probabilities change based on the information in the map. But if we can do that, then anything can be an "abstract map" of any "concrete territory", with the queries

Q

taken to be the events/statements about the territory which the map actually has some information about - not a very useful definition!

Natural Abstractions

Intuitively, It seems like there exist "natural abstractions" - large sets of queries on a given territory which all require roughly the same information. Statistical mechanics is a good source of examples - from some macroscopic initial conditions, we can compute whatever queries we want about any macroscopic measurements later on. Note that such natural abstractions are a property of the territory - it's the concrete-level model which determines what large classes of queries can be answered with relatively little information.

For now, I'm interested primarily in abstraction of causal dags - i.e. cases in which both the concrete and abstract models are causal dags, and there is some reasonable correspondence between counterfactuals in the two. In this case, the set of queries should include counterfactuals, i.e. do() operations in Pearl's language. (This does require updating definitions/notation a bit, since our queries are no longer purely events, but it's a straightforward-if-tedious patch.) That's the main subject I'm researching in the short term: what are the abstractions which support large classes of causal counterfactuals? Expect more posts on the topic soon.

[-]Donald Hobson6y20

[-]johnswentworth6y10

How you do lossy compression depends on what you want.

I think this is technically true, but less important than it seems at first glance. Natural abstractions are a thing, which means there's instrumental convergence in abstractions - some compressed information is relevant to a far wider variety of objectives than other compressed information. Representing DNA sequences as strings of four different symbols is a natural abstraction, and it's useful for a very wide variety of goals; MD5 hashes of those strings are useful only for a relatively narrow set of goals.

Somewhat more formally... any given territory has some Kolmogorov complexity, a maximally-compressed lossless map. That's a property of the territory alone, independent of any goal. But it's still relevant to goal-specific lossy compression - it will very often be useful for lossy models to re-use the compression methods relevant to lossless compression.

For instance, maybe we have an ASCII text file which contains only alphanumeric and punctuation characters. We can losslessly compress that file using e.g. Huffman coding, which uses fewer bits for the characters which appear more often. Now we decide to move on to lossy encoding - but we can still use the compressed character representation found by Huffman, assuming the lossy method doesn't change the distribution of characters too much.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

11

11

Formalization: Starting Point

Natural Abstractions