Planned summary for the Alignment Newsletter:
Probability theory can tell us how we ought to build agents that have knowledge (start with a prior, and perform Bayesian updates as evidence comes in). However, this is not the only way to create knowledge: for example, humans are not ideal Bayesian reasoners. As part of our quest to <@_describe_ existing agents@>(@Theory of Ideal Agents, or of Existing Agents?@), could we have a theory of knowledge that specifies when a particular physical region within a closed system is “creating knowledge”? We want a theory that <@works in the Game of Life@>(@Agency in Conway’s Game of Life@) as well as in the real world.
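As a minimal illustration of the Bayesian picture mentioned above (the coin-flip setup and all names below are my own illustrative choices, not part of the sequence), an agent that starts with a prior over hypotheses and updates on incoming evidence might look like this:

```python
# Minimal Bayesian-update sketch: an agent tracking the bias of a coin.
# The discretised hypothesis grid and the coin example are illustrative choices.

hypotheses = [i / 10 for i in range(11)]          # candidate values of P(heads)
prior = [1 / len(hypotheses)] * len(hypotheses)   # start with a uniform prior

def update(belief, heads):
    """One Bayesian update: posterior is proportional to likelihood times prior."""
    likelihoods = [h if heads else 1 - h for h in hypotheses]
    unnormalised = [l * b for l, b in zip(likelihoods, belief)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

belief = prior
for observation in [True, True, False, True]:     # evidence coming in over time
    belief = update(belief, observation)

# Report the posterior probability of each hypothesis after the evidence.
for h, b in zip(hypotheses, belief):
    print(f"P(heads)={h:.1f}: posterior {b:.3f}")
```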
This sequence investigates this question from the perspective of defining the accumulation of knowledge as increasing correspondence between [a map and the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation), and concludes that such definitions are not tenable. In particular, it considers four possibilities, and demonstrates counterexamples to all of them:
1. Direct map-territory resemblance: Here, we say that knowledge accumulates in a physical region of space (the “map”) if that region of space looks more like the full system (the “territory”) over time.
Problem: This definition fails to account for cases of knowledge where the map is represented in a very different way that doesn’t resemble the territory, such as when a map is represented by a sequence of zeros and ones in a computer.
2. Map-territory mutual information: Instead of looking at direct resemblance, we can ask whether there is increasing mutual information between the supposed map and the territory it is meant to represent. (A toy calculation of this quantity is sketched after this list.)
Problem: In the real world, nearly _every_ region of space will have high mutual information with the rest of the world. For example, by this definition, a rock accumulates lots of knowledge as photons incident on its face affect the properties of specific electrons in the rock, giving it lots of information about those photons.
3. Mutual information of an abstraction layer: An abstraction layer is a grouping of low-level configurations into high-level configurations such that transitions between high-level configurations are predictable without knowing the low-level configurations. For example, the zeros and ones in a computer are the high-level configurations of a digital abstraction layer over low-level physics. Knowledge accumulates in a region of space if that space has a digital abstraction layer, and the high-level configurations of the map have increasing mutual information with the low-level configurations of the territory.
Problem: A video camera that constantly records would accumulate much more knowledge by this definition than a human, even though the human is much more able to construct models and act on them.
4. Precipitation of action: The problem with our previous definitions is that they don’t require the knowledge to be _useful_. So perhaps we can instead say that knowledge is accumulating when it is being used to take action. To make this mechanistic, we say that knowledge accumulates when an entity’s actions become more fine-tuned to a specific environment configuration over time. (Intuitively, the entity learned more about the environment, and so could condition its actions on that knowledge, which it previously could not do.)
Problem: This definition requires the knowledge to actually be used in order to count as knowledge. However, if someone makes a map of a coastline, but that map is never used (perhaps it is quickly destroyed), it seems wrong to say that no knowledge was accumulating during the map-making process.
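To make definition 2 concrete (see the pointer in that item), here is a toy empirical mutual-information calculation; the rock/photon framing of the example is mine and is only meant to show why the definition over-counts:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (in bits) between two discrete variables,
    estimated from a list of (map_state, territory_state) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    map_counts = Counter(m for m, _ in pairs)
    territory_counts = Counter(t for _, t in pairs)
    mi = 0.0
    for (m, t), c in joint.items():
        p_joint = c / n
        p_independent = (map_counts[m] / n) * (territory_counts[t] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi

# A "rock" whose surface state simply copies incoming photon states has maximal
# mutual information with that part of the territory, despite understanding nothing.
territory = [0, 1, 1, 0, 1, 0, 0, 1]
rock_surface = list(territory)          # a perfect physical imprint, no model-building
print(mutual_information(list(zip(rock_surface, territory))))   # prints 1.0
```

A video camera's recording of the territory would score just as highly, which is the spirit of the counterexamples to definitions 2 and 3.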
I think that part of the problem is that talking about knowledge requires adopting an interpretative frame. We can only really say whether a collection of particles represents some particular knowledge from within such a frame, although it would be possible to determine the frame of minimum complexity that interprets a system as representing certain facts. In practice, though, whether or not a particular piece of storage contains knowledge will depend on the interpretative frames in the environment, although we need to remember that interpretative frames can emulate other interpretative frames, e.g. a human experimenting with multiple codes in order to decode a message.
Regarding the topic of partial knowledge, it seems that the importance of various facts will vary wildly from context to context, and also depending on the goal. I'm somewhat skeptical that goal-independent knowledge will have a nice definition.
Well yes, I agree that knowledge exists with respect to a goal, but is there really no objective difference between an alien artifact inscribed with deep facts about the structure of the universe, set up in such a way that it can be decoded by any intelligent species that might find it, and an ordinary chunk of rock arriving from outer space?
Well, taking the simpler case of exactly reproducing a certain string, you could find the length of the shortest program that produces the string, as in Kolmogorov complexity, and use that as a measure of complexity.
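Kolmogorov complexity itself is uncomputable, so any concrete version of this would need a proxy; a common stand-in (my choice here, not something proposed above) is compressed length:

```python
import zlib

def complexity_proxy(s):
    """Crude, computable upper bound on the Kolmogorov complexity of s:
    the length in bytes of its zlib-compressed encoding."""
    return len(zlib.compress(s.encode("utf-8"), 9))

print(complexity_proxy("ab" * 500))   # 1000 characters, but very regular, so compresses well
print(complexity_proxy("deep facts about the structure of the universe"))
```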
A slightly more useful way of modelling things might be to have a collection of different strings, each assigned a number of points representing its level of importance. We could then produce a metric combining the Kolmogorov complexity of a decoder with the sum of the points for the desired strings it produces (with its outputs concatenated using a predefined separator); for example, we might take the quotient of the two.
One immediate issue with this is that some of the strings may contain overlapping information, and we'd still need a way of assigning importances to the strings in the first place. Perhaps a simpler case would be one where the strings represent patterns in a stream, encoded as Turing machines that can output sets of symbols (rather than single symbols) representing the possible symbols at each location. The number of points a machine provides would then be equal to how much of the stream it allows you to predict. (This would still require producing a representation of the universe in which the amount of the stream predicted is roughly equivalent to how useful the predictions are.)
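For what it's worth, here is one way the quotient idea above could be sketched out, again using compressed length as a stand-in for Kolmogorov complexity; the separator, the scoring rule, and all names are my own illustrative assumptions rather than a worked-out proposal:

```python
import zlib

SEPARATOR = "|"   # the predefined separator mentioned above (arbitrary choice)

def decoder_complexity(decoder_source):
    """Stand-in for the Kolmogorov complexity of the decoder: compressed length."""
    return len(zlib.compress(decoder_source.encode("utf-8"), 9))

def knowledge_score(decoder_source, decoder_output, importance):
    """Toy version of the proposed quotient: the points earned for the desired
    strings found in the decoder's separator-joined output, divided by the
    complexity of the decoder itself."""
    produced = set(decoder_output.split(SEPARATOR))
    points = sum(p for s, p in importance.items() if s in produced)
    return points / decoder_complexity(decoder_source)

# Hypothetical usage: a decoder (represented here only by placeholder source text)
# that recovers two of the three strings we care about.
importance = {"tide tables": 3.0, "coastline shape": 5.0, "star chart": 2.0}
decoder_source = "def decode(artifact): ..."       # placeholder, not real code
decoder_output = SEPARATOR.join(["coastline shape", "tide tables"])
print(knowledge_score(decoder_source, decoder_output, importance))
```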
Any thoughts on this general approach?