Human priors, features and models, languages, and Solomonoff induction

AI ALIGNMENT FORUM
AF


This post was originally inspired by Robin Hanson's publication "Uncommon Priors Require Origin Disputes", but it took a left turn and brought in Solomonoff induction, Shannon entropy, and model splintering.

Priors, features, models, Shannon entropy, and computer languages

Let's weave a simplified tale about how humans might generate their priors. Humans grow up in the world, getting used to operating within it, long before they learn about Bayesian reasoning. Indeed, some humans never learn explicit Bayesian reasoning at all, though they often seem to use it implicitly.

Let's imagine our Hero or Heroine, living their life and functioning in society. They need a mental apparatus that allows them to cope well with what they encounter every day, especially what's important to them.

They might have a generalised model whose features represent what they care about - and what's important for them to know. If all their friends are music fans, then, to function socially, they may need to distinguish between Ava Max and Masked Wolf. If they work in a kitchen, then it is important for them not to confuse boning knives and filleting knives. In prison, it's lethal to confuse Nuestra Familia with the Aryan Brotherhood.

But if they're neither involved nor interested in those various worlds, they can get away with grouping those pairs together as "modern singers", "kitchen knives", and "prison gangs". So people's mental models will contain features that matter to them - actionable features. It's vitally important for all life on Earth that the sun not go red giant tomorrow; however, since there's very little anyone could do about it if it did, there's no need for most people to know anything about red giants or the life cycle of stars. This leads to the famous New Yorker cover, which not only encodes how many New Yorkers do see the world, but also how it is useful for New Yorkers to see the world.

We can think of this in terms of Shannon entropy and compressed communication. People need to be able to swiftly call to mind any key features about their situation, and rapidly combine this with other knowledge to reach a decision - such as when their hand is reaching into the knife drawer. So, roughly at least, their minds will encode important features about their lives and the relationships between these features, with ease of retrieval being important.

This is an ease-of-retrieval version of optimal coding. In optimal coding, symbols that appear often are represented by short sequences of symbols; symbols that appear rarely are represented by long sequences of symbols. In mental models, features that are often needed are easy to retrieve and have a rich library of mental connections to other related features. That last fact means that features that often appear together are easy to retrieve together. Conversely, features that are rarely needed are less easy to retrieve, and less easy to connect to other features^[1].
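To make the optimal coding analogy concrete, here is a minimal sketch using Huffman coding, one standard way of building such a code. The feature names and usage counts are made up for illustration (a hypothetical kitchen worker), and `huffman_code_lengths` is my own helper, not anything from a library.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be written down.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(weight, next(tiebreak), {sym: 0}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least-used subtrees; every symbol inside them
        # moves one level deeper, so its code grows by one bit.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical counts of how often each feature is needed (assumed numbers).
feature_use = {
    "kitchen knife": 50,
    "boning knife": 30,
    "filleting knife": 15,
    "red giant": 5,
}
lengths = huffman_code_lengths(feature_use)
for feature, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(f"{feature}: {bits} bits")
```

With these assumed counts, the most-used feature gets a 1-bit code and the rarely needed "red giant" gets a 3-bit code. The mental-model version replaces "bits" with "effort of retrieval", which is only an analogy, not a claim about how brains actually store features.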

Computer languages and priors

When someone encounters a new situation, the first thing they do is try to connect it to known ideas or known categories. "So, the second world war was like when the guy from the Aryan Brotherhood insulted the girlfriend of that other guy in Nuestra Familia". It might be a poor analogy, in which case they will reach for more obscure features or add more details and caveats to the initial analogy.

This means that their mental features function as a language to describe or model novel situations (model splintering happens if they encounter these new situations often enough, and adjust their mental features to easily model them). Those who master mathematical features have the advantage of being able to model more general situations with features they understand.

In this way, mental features can be seen as the basis of a language to model and predict general situations. The more exotic these situations are, the more complex the description is, in that it has to involve rare features or multiple caveats.

We can see this as a computer language for Solomonoff induction, with that (rough) measure of complexity being the "length" of the program that models the situation. In Solomonoff induction, this corresponds to a prior, which behaves the way we would want: unusual or unimportant situations are seen as less likely, while common and important situations have much higher prior probability. Model splintering happens when someone has updated on enough unusual sightings that it is worth their while to change their "language".
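Here is a toy sketch of that idea. It is not real Solomonoff induction, which uses programs on a universal Turing machine and is uncomputable; instead it assumes a "language" is just a table of per-feature costs in bits, and gives each situation a prior weight of 2^-length. All feature names, costs, and situations are invented for illustration.

```python
def description_length(description, feature_cost):
    """Total cost, in bits, of describing a situation as a list of features."""
    return sum(feature_cost[feature] for feature in description)


def length_prior(situations, feature_cost):
    """Weight each situation by 2^-length, then normalise to sum to 1."""
    weights = {
        name: 2.0 ** -description_length(description, feature_cost)
        for name, description in situations.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Two hypothetical "languages": the same features, at very different costs.
music_fan_language = {"singer": 1, "concert": 2, "gang": 6, "betrayal": 5}
prisoner_language = {"singer": 6, "concert": 6, "gang": 1, "betrayal": 2}

# The same situations, described with the same features in both languages.
situations = {
    "friend flakes on a concert": ["singer", "concert"],
    "gang betrayal": ["gang", "betrayal"],
}

fan_prior = length_prior(situations, music_fan_language)
prisoner_prior = length_prior(situations, prisoner_language)
print(fan_prior)
print(prisoner_prior)
```

Under these assumed costs, the music fan's prior puts over 0.99 on the concert situation and the prisoner's puts over 0.99 on the gang betrayal, even though both are using the same rule. In this picture, model splintering would correspond to editing the cost table itself, rather than just updating within it.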

To summarise, here is a rough correspondence between the (bounded) Bayesian, mental model, and Solomonoff induction approaches:

Agreeing to disagree over priors

Let's go back to Robin Hanson's point, that rational humans shouldn't disagree on priors unless there is a specific source of disagreement (an origin dispute). I agree with this, though I'll present a version that's less formal and maybe more intuitive.

So, say a fully rational music fan encounters a fully rational prisoner, and they realise they have very different priors/mental models. If they get involved in long discussions, they will realise their key concepts are very different, maybe even their whole worldviews - the music fan might think that people are generically nice but can be catty and flaky, while the prisoner might believe that lying and betrayal are common but that reputation is reliable.

If they are fully rational, they would realise that these differences come from different experiences of the world, and serve different purposes. If the music fan is sent to prison, they would want the worldview of the prisoner, and vice versa if the prisoner is released and wants to learn how to socialise in a group of music fans.

If they are also aware of evolution and human biases, they'd also realise that human mental models come pre-loaded with linguistic and social concepts, and that this means that a) these concepts are likely very useful for interacting with other humans (or else evolution wouldn't have selected for them), but b) these are not reliable priors for dealing with situations outside of social interactions. So they could deduce that the world runs on mathematics, not stories, and Maxwell's equations are simpler than Thor. And then they would use their social knowledge to craft a blog post on that subject that would interest other human beings.

[1] For example, having looked them up, I now have at least vague categories for Ava Max, Masked Wolf, boning knives, filleting knives, Nuestra Familia, and the Aryan Brotherhood. But I know almost nothing about them, and would have difficulty bringing them to mind in contexts other than this post.
