tl;dr: I argue that the symbol grounding problem is not an abstract philosophical problem, but a practical issue that it is useful and necessary for algorithms to solve, empirically, to at least some extent.

Intro: the Linear A encyclopaedia

Consider a detailed encyclopaedia of the world.

The inside pages are written, for some inexplicable reason, in Linear A, an untranslated Minoan script.

We also have access to a sort of Linear A Rosetta Stone, with the same texts in Linear A and in English. Just like the original Rosetta Stone, this is sufficient to translate the language.

The amount of information in the encyclopaedia is much higher than in the very brief Rosetta Stone. And yet the encyclopaedia on its own is pretty useless, while the two together unlock a huge amount of useful information: a full encyclopaedia of the world. What's going on here?

Grounded information

What's going on is that the encyclopaedia contains a large amount of grounded information: information about the actual world. But when the meaning of Linear A was lost, that grounding was lost too: the text became just a series of symbols with a certain pattern to it. Its entropy (in the information-theory sense) did not change; but it became essentially a random sequence with that particular entropy and pattern.

Adding in the Rosetta Stone restores that grounding, and so transforms the encyclopaedia back into a collection of useful information about the world; again, without changing its information content in the entropy sense.
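As a toy illustration of that point, here is a minimal sketch, assuming we measure entropy as the empirical per-character Shannon entropy and stand in for "losing Linear A" with a random relabelling of characters. The names `shannon_entropy` and `make_cipher`, and the sample sentence, are my own hypothetical choices, not part of the argument above.

```python
import math
import random
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Empirical per-symbol Shannon entropy (in bits) of the characters in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def make_cipher(alphabet: str, seed: int = 0) -> dict[str, str]:
    """A random one-to-one relabelling of the alphabet (a stand-in for an unknown script)."""
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


plain = "the rabbit has fur and the meal has none"
alphabet = "".join(sorted(set(plain)))
cipher = make_cipher(alphabet)
inverse = {v: k for k, v in cipher.items()}  # plays the role of the short "Rosetta Stone"

scrambled = "".join(cipher[ch] for ch in plain)       # grounding lost
recovered = "".join(inverse[ch] for ch in scrambled)  # grounding restored by the key

print(shannon_entropy(plain), shannon_entropy(scrambled))  # same symbol statistics
print(recovered == plain)
```

The relabelling leaves the symbol statistics untouched, so the entropy is the same before and after; what changes is only whether a reader can connect the symbols to anything. The key that undoes the relabelling has one entry per distinct symbol, however long the text is.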

This is the symbol grounding problem, seen not as an esoteric philosophical issue, but as a practical learning problem.

Writing the encyclopaedia

To put this more formally, take the world, a set of features in the world, and a probability distribution over these features (see this post for details of the terminology).

An agent A has their own feature set, in Linear A, which corresponds to the features of the world. The correspondence is given by a grounding map. The agent also knows the probability distribution.

A then sits down to write the encyclopaedia; in it, they put a lot of their knowledge of the distribution, using the terms of their own feature set to do so.

Thus any agent that knows the grounding of A's symbols can use the encyclopaedia to figure out a lot about the distribution and hence about the world.

What if an agent B doesn't know Linear A, but does know English? Then they have their own symbol grounding. The encyclopaedia is almost useless to them. It still has the same amount of information, but there's no way of translating that information into the features that B uses.

Now the role of the Rosetta Stone is clear: by translating English and Linear A into each other, it defines an equivalence between the features of A and the features of B (specifically, it defines a map between the two feature sets).

Given this, B now knows the grounding of A's symbols. And, assuming they trust A, they can extract information on the distribution from the encyclopaedia. Hence they can get some of A's real-world information.
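The setup can be made concrete with a small sketch. Everything named here is an assumption for illustration: `g_A` and `g_B` are hypothetical grounding maps from world features to each agent's symbols, the "LA-xx" tokens are made-up stand-ins for Linear A, and `R` is a table built from aligned parallel texts.

```python
# Hypothetical grounding maps: world feature -> each agent's private symbol.
g_A = {"rabbit": "LA-01", "fur": "LA-02", "meal": "LA-03"}  # made-up "Linear A" tokens
g_B = {"rabbit": "rabbit", "fur": "fur", "meal": "meal"}     # English

# The encyclopaedia: A's knowledge, written purely in A's symbols.
encyclopaedia = [("LA-01", "has", "LA-02"), ("LA-03", "lacks", "LA-02")]


def rosetta(parallel_a: list[str], parallel_b: list[str]) -> dict[str, str]:
    """Build an A-symbol -> B-symbol table from aligned parallel texts."""
    return dict(zip(parallel_a, parallel_b))


def translate(entry: tuple[str, ...], table: dict[str, str]) -> tuple[str, ...]:
    """Rewrite one entry using the table; tokens not in the table pass through unchanged."""
    return tuple(table.get(token, token) for token in entry)


R = rosetta(["LA-01", "LA-02", "LA-03"], ["rabbit", "fur", "meal"])
translated = [translate(entry, R) for entry in encyclopaedia]
print(translated)

# R never mentions the world directly: it only relates two sets of grounded symbols.
composed = {g_A[feature]: g_B[feature] for feature in g_A}
print(R == composed)
```

In this toy version the table `R` coincides with what you get by going from A's symbol back to the world feature and forward to B's symbol, which is why such a short text is enough to unlock the whole encyclopaedia.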

Ungrounded information

Now consider a second text, with the same amount of information as the encyclopaedia, but generated by a random process. There is information in this random text, but it's not information about the real world, mediated through A; it's just random information.

Maybe there could be a Rosetta Stone that grounds the symbols of the random text in a way that makes sense for the real world. But, since the text is random, that Rosetta Stone would have to be huge: essentially, it would have to contain all the real-world information itself.

So we can say that both the encyclopaedia and the random text are ungrounded for agent B; but the encyclopaedia is at least potentially grounded, in that there is a short translation that will ground it. This "short translation", the Rosetta Stone, only works because the encyclopaedia was constructed by A, a grounded agent, and B's features are also grounded in the same world. Thus all that the Rosetta Stone does is translate between grounded symbols.

Syntax versus semantics

Note that the probability distribution looks like pure syntax, not semantics. It doesn't define any features; instead it gives a (probabilistic) measure of how various features relate to each other. Thus it only gives syntactic information, it seems.

But, as I noted in previous posts, the boundary between syntax and semantics is ambiguous.

Suppose that an agent has a feature "gavagai" and another feature "snark", and that their syntax contains a term corresponding to the probability of "snark", given "gavagai".

Suppose we are starting to suspect that, semantically[1], "gavagai" means either rabbit or cooked meal, while "snark" means fur.

Then if that probability is close to 1, "gavagai" is likely to be rabbit, since whenever gavagai is present, there is fur. Conversely, if it is close to 0, then "gavagai" is likely to be a meal, since gavagai means no fur around. Re-conversely, if gavagai is rabbit, the probability is likely to be close to 1.
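This back-and-forth can be sketched as a simple Bayesian update. The sketch below is only illustrative and rests on assumptions of mine: the two candidate meanings are the only hypotheses, the agent's syntactic probability summarises `n` past sightings of gavagai, and the fur rates for rabbits and meals (0.95 and 0.05) are invented numbers. The function name `posterior_rabbit` is hypothetical.

```python
import math


def posterior_rabbit(p_snark_given_gavagai: float,
                     prior_rabbit: float = 0.5,
                     fur_if_rabbit: float = 0.95,  # assumed rate of fur around rabbits
                     fur_if_meal: float = 0.05,    # assumed rate of fur around meals
                     n: int = 10) -> float:
    """Posterior probability that "gavagai" means rabbit rather than cooked meal.

    Treats the agent's syntactic term as a summary of n sightings of gavagai,
    a fraction p_snark_given_gavagai of which came with snark (fur).
    """
    k = p_snark_given_gavagai * n  # sightings that came with fur

    def log_likelihood(fur_rate: float) -> float:
        return k * math.log(fur_rate) + (n - k) * math.log(1 - fur_rate)

    log_rabbit = math.log(prior_rabbit) + log_likelihood(fur_if_rabbit)
    log_meal = math.log(1 - prior_rabbit) + log_likelihood(fur_if_meal)
    top = max(log_rabbit, log_meal)  # subtract the max for numerical stability
    weight_rabbit = math.exp(log_rabbit - top)
    weight_meal = math.exp(log_meal - top)
    return weight_rabbit / (weight_rabbit + weight_meal)


for p in (1.0, 0.5, 0.0):
    print(p, posterior_rabbit(p))
```

With these assumed numbers, a syntactic probability near 1 pushes the posterior towards "rabbit", a probability near 0 pushes it towards "meal", and a probability of one half leaves the prior unchanged.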

None of these interpretations is definite. But the probabilities of particular syntactic and semantic interpretations go up and down in relation to each other. So purely syntactical information provides evidence for symbol grounding - because this syntax has evolved via interactions with the real world. In fact the grounding, the syntax, and even the feature set all evolve through interactions with the real world, and can change when new information arrives or the model features splinter[2].

And syntax and semantics can also substitute for each other, to some extent. Suppose the agent has only ever seen white rabbits. Then maybe "gavagai" means "white rabbit", a semantic statement. Or maybe "gavagai" means "rabbit", while the agent also believes the syntactic statement "all rabbits are white". Either option would result in the same behaviour and pretty much the same ways of thinking, at least in circumstances the agent has already encountered.

You could see semantics, in humans, as instinctively recognising certain features of the world, and syntax as logically putting together these features to derive other features (using rules and life experience learnt in the past). Putting it this way, it's clear why the line between syntax and semantics is vague, and can vary with time and experience[3].

We also often only have implicit access to our own grounding and our own syntax, further complicating the distinction.

Doing without "the world"

In the post on model splintering, I argued for doing without underlying "worlds", and just using the imperfect features and models of the agents.

This is needed in this setting, too. Agent A might make use of the feature "human being", but that doesn't mean that the world itself needs to have a well-defined concept of "human being". It's very hard to unambiguously define "human being" in terms of physics, and any definition would be debatable. Moreover, A did not use advanced physics in their own definition.

For symbol translation, though, it suffices that B understands roughly what A means by "human beings" - neither needs a perfect definition.

Understanding the features of another agent

So, let's change the definitions a bit. Agent A has their features and their probability distribution over features (the syntax); the same goes for agent B.

The grounding operators and don't map from the features of the world, but from the private input-output histories of the two agents, and . Write to designate that is the probability that feature is equal to , given the history and the grounding function[4] .

Now, what about ? They will have their assessment of 's internal symbols; let be 's interpretation of 's symbol .

They can compute expressions like , the probability assigns to 's feature being equal to . It will likely reach this probability assignment by using , and : these are 's assessment of 's grounding, syntax, and history. Summing over these gives:

Then will consider feature to be perfectly grounded if there exists , a feature in , such that for all histories ,

Thus, 's assessment of (according to 's estimate) is always perfectly aligned with 's assessment of .

Imperfect grounding: messy, empirical practice

Perfect grounding is far too strong a condition, in general. It means that B believes that A's estimate of a feature is always better than B's own estimate of the corresponding feature.

Suppose that both features referred to the number of people in a given room. Sometimes A is in the room, and sometimes B is.

Obviously it makes sense for B to strongly trust A's estimate when A is in the room and B is not; conversely, it would (generally) be wrong for B to do so when their positions are reversed.

So sometimes B will consider A's estimates to be well grounded and correct; sometimes it won't. Grounding, in practice, involves modelling what the other person knows and doesn't know. We might have some idea as to what would lead A astray - eg a very convincing human-like robot in the room, combined with poor lighting.
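A very crude sketch of this context-dependent trust, under assumptions of my own: each agent reports a head count, B knows who was in the room, and B weights each report by an invented reliability number (0.9 for an observer in the room, 0.1 otherwise). The names `Estimate` and `combined_estimate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    count: float            # reported number of people in the room
    observer_in_room: bool  # whether the estimator was actually there


def reliability(estimate: Estimate,
                in_room: float = 0.9,     # assumed reliability when present
                outside: float = 0.1) -> float:  # assumed reliability when absent
    return in_room if estimate.observer_in_room else outside


def combined_estimate(own: Estimate, other: Estimate) -> float:
    """B's overall estimate: a reliability-weighted average of the two reports."""
    weight_other = reliability(other) / (reliability(own) + reliability(other))
    return weight_other * other.count + (1 - weight_other) * own.count


# A is in the room and B is not: B leans heavily on A's report.
print(combined_estimate(own=Estimate(0, False), other=Estimate(4, True)))
# Positions reversed: B leans on its own report instead.
print(combined_estimate(own=Estimate(4, True), other=Estimate(0, False)))
```

The point of the sketch is only that the weight B places on A's symbol is not fixed: it depends on B's model of what A was in a position to know. A fuller version would also model the ways A can be misled, such as the robot and the poor lighting.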

That sounds both very complicated (if we wanted to formalise that in terms of what agent believes about what agent believes about features...) and very simple (it's exactly what human theory of mind is all about).

What this means is that, given this framework, figuring out the grounding of the symbols of another agent is a messy empirical process. It's not a question of resolving esoteric philosophical questions, but assessing the knowledge, honesty, accuracy, and so on, of another agent.

Note the empiricism here; we've moved from:

  • Are the symbols of A well grounded?

to:

  • What are the symbols of A grounded as, and in what contexts?

Language, and interpretation of the symbols of others

Those happy few who know my research well may be puzzled at this juncture. Knowing the internal features of another agent is the ultimate structured white-box model - we not only know how the other agent thinks, but we assign useful labels to their internal processes. But when I introduced that concept, I argued that you could not get structured white-box models without making major assumptions. So what's going on here?

First of all, note that the definition of the grounding of A's symbols is given entirely in terms of B's estimate of A's features. In the limit, B could be utterly wrong about everything about A's internals, and still find that A's symbols are grounded, even perfectly grounded.

That's because B is not using A's symbols at all, but their own estimate of A's symbols. Maybe A always feels fear and disgust when they get cold. Assume that that is the only time that A feels both those sentiments at once. Then A might have two symbols, "fear" and "disgust", while B might model "fear-plus-disgust" as the single feature "cold". And then B can use that feature, empirically and correctly, to predict the temperature. So B thinks that A's feature "cold" is well grounded, even if the feature doesn't even exist in A's models. This is the sense in which "gavagai" meaning "rabbit" and "gavagai" meaning "undetached rabbit-part" actually mean the same thing.

But for humans, there are other factors at work. Humans communicate, using words to designate useful concepts, and converge on mutually intelligible interpretations of those words, at least in common situations. A phrase like "Run, tiger!" needs to be clearly understood immediately.

Now, we humans:

  1. Partially understand each other thanks to our theory of mind, and
  2. Use the words we communicate with in order to define internal concepts and features.

This means that we will converge on roughly shared understandings of what words mean, at least in typical environments. This rough shared understanding explains both the untranslatability of language and why it's mainly translatable. We might not get all the linguistic nuances of the Tale of Genji, but we do know that it's about Japanese courtiers in the 11th century, and not about time-travelling robots from the future.

A last note on language: we can use it to explore concepts that we've never encountered, even concepts that don't exist. This means that, with language, we might realise that, for someone, "gavagai" means "undetached rabbit part" rather than "rabbit", because we can use linguistic concepts to imagine a distinction between those two ideas. And then we can communicate this distinction to others.

GPT-n, ungrounded

This kind of reasoning causes me to suspect that the GPT-n series of algorithms will not reach super-human levels of capability. They've achieved a lot through syntactic manipulation of texts; but their symbols are almost certainly ungrounded. Consider two hypotheses:

  1. To write like a human, an agent needs a full understanding of physics, biology, and many other sciences.
  2. There are simpler models that output human-like writing, with decent probability, without modelling the hard sciences.

I think there's evidence for the second hypothesis - for example, the successes of the current and past GPT-ns. It does not seem plausible that these machines are currently modelling us from electrons upwards.

But if the second hypothesis is true, then we'll expect that the GPT-ns will reach a plateau at or near the maximal current human ability. Consider two models: a full physics model of humanity and enough of the universe, and a simplified model of human text generation. As long as the GPT-ns are successful with the simplified model, there will be no pressure on them to develop the full physics model. Pressure can mean reinforcement learning, objective functions, or humans ranking outputs or tweaking the code. For the algorithm to converge on the full physics model, the following need to be true:

  1. Using the full physics model is significantly better than using the simplified model, so there is pressure for the algorithm to develop the better model.
  2. Moving towards the full physics model, from its current model, is a sufficient improvement over the simplified model that it will find a path towards that model.

It seems to me that 1. might be true, but 2. seems very unlikely to be true. Therefore, I don't think that the GPT-ns will need or be able to ground their symbols, and hence they will be restricted to human-comparable levels of ability.

We could empirically test this, in fact. Feed GPT-3 all the physics papers we have until 1904[5]. Could GPT-3 or any of its successors generate special and general relativity from that data? I would be extremely surprised, and mildly terrified, if it did. Because it could only do so if it really understood, in a grounded way, what physics was.

Thanks to Rebecca Gorman for help with this research.

  1. Using the grounding to define semantics. ↩︎

  2. Like learning that "dog" and "wolf" are meaningfully different and can't be treated the same way -- or that different breeds of dogs are also different in relevant ways. In that case, the categorisation and/or modelling will shift to become more discriminatory and precise. This tends to be in areas of relevance to the human: for a dog breeder, the individual breeds of dogs are very precise and detailed categories, while everything that lives in the sea might be cheerfully dumped into the single category "fish". ↩︎

  3. See how carefully learnt deductions can become instinctive, or how we can use reason to retrain our instincts. ↩︎

  4. Notice that this formulation means that we don't need to distinguish the contribution of semantics (the grounding) from that of syntax (the probability distribution): both are folded into the same expression. ↩︎

  5. Maybe filter out Lorentz's papers. ↩︎


I may write more on this later, but for now I just want to express exuberance at someone in the x-risk space thinking and writing about this :)

Express, express away _