tl;dr: I argue that the symbol grounding problem is not an abstract philosophical problem, but a practical issue that it is useful and necessary for algorithms to solve, empirically, to at least some extent.
Consider E, a detailed encyclopaedia of the world.
The inside pages are written, for some inexplicable reason, in Linear A, an untranslated Minoan script.
We also have access to R, a sort of Linear A Rosetta Stone, with the same texts in Linear A and in English. Just like the original Rosetta Stone, this is sufficient to translate the language.
The amount of information in E is much higher than in the very brief R. And yet E on its own is pretty useless, while E+R together unlock a huge amount of useful information: a full encyclopaedia of the world. What's going on here?
What's going on is that E contains a large amount of grounded information: information about the actual world. But when the meaning of Linear A was lost, that grounding was lost too: E became just a series of symbols with a certain pattern to it. Its entropy (in the information-theory sense) did not change; but it is now essentially a random sequence with that particular entropy and pattern.
Adding in R restores that grounding, and so transforms E back into a collection of useful information about the world; again, without changing its information content in the entropy sense.
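A toy check of the claim that losing or restoring the translation leaves the information content unchanged. This sketch is not from the post: it uses per-symbol (unigram) entropy as a crude stand-in for information content, a made-up English sentence as the "encyclopaedia", and a random one-to-one relabelling of its characters as the lost script. The inverse relabelling plays the role of R.

```python
import math
import random
from collections import Counter


def unigram_entropy(text: str) -> float:
    """Empirical Shannon entropy of the character frequencies in text, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


english = "the rabbit has fur and the meal has no fur"

# Made-up one-to-one relabelling of the characters: the "lost script".
alphabet = sorted(set(english))
scrambled = alphabet[:]
random.Random(0).shuffle(scrambled)
to_script = dict(zip(alphabet, scrambled))
encoded = "".join(to_script[ch] for ch in english)

# The "Rosetta Stone": the inverse relabelling.
rosetta = {script_ch: eng_ch for eng_ch, script_ch in to_script.items()}
decoded = "".join(rosetta[ch] for ch in encoded)

print(math.isclose(unigram_entropy(english), unigram_entropy(encoded)))  # True
print(decoded == english)  # True
```

Any measure that ignores what the symbols refer to gives the same answer for both strings; the difference between them lies entirely in the mapping.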
This is the symbol grounding problem, seen not as an esoteric philosophical issue but as a practical learning problem.
To put this in symbols, let $W$ be the world, let $F_W$ be a set of features in the world, and let $Q_W$ be a probability distribution over these features (see this post for details of the terminology).
An agent A has their own feature set $F_A$, written in Linear A, which corresponds to the features of the world. The correspondence is given by the map $g_A: F_W \to F_A$. The agent also knows $Q_W$.
A then sits down to write the encyclopaedia E, putting into it much of their knowledge of $Q_W$, expressed using the terms in $F_A$.
Thus any agent B that knows $g_A$, the grounding of A's symbols, can use E to figure out a lot about $Q_W$, and hence about the world.
What if agent B doesn't know Linear A, but does know English? Then they have their own symbol grounding $g_B: F_W \to F_B$. The encyclopaedia E is almost useless to them. It still contains the same amount of information, but B has no way of translating that information into $F_B$, the features that B uses.
Now the role of R is clear: by translating between English and Linear A, it defines an equivalence between $F_A$, the features of A, and $F_B$, the features of B (specifically, it defines $g_B \circ g_A^{-1}: F_A \to F_B$).
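Treating each grounding as a finite lookup table makes the role of R concrete. The sketch below is an illustrative assumption, not the post's formalism: the world features (`w:rabbit` and so on), the Linear A tokens (`la_01` and so on) and the sentence fragment are all made up, and $g_A$ is assumed to be one-to-one so that it can be inverted.

```python
from typing import Dict


def invert(mapping: Dict[str, str]) -> Dict[str, str]:
    """Invert a one-to-one mapping, failing loudly if it is not injective."""
    inverse = {v: k for k, v in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not injective, so it has no inverse")
    return inverse


def compose(outer: Dict[str, str], inner: Dict[str, str]) -> Dict[str, str]:
    """Return outer ∘ inner as a table: x -> outer[inner[x]]."""
    return {x: outer[y] for x, y in inner.items()}


# Hypothetical world features F_W and the two groundings.
g_A = {"w:rabbit": "la_01", "w:fur": "la_02", "w:meal": "la_03"}  # g_A : F_W -> F_A
g_B = {"w:rabbit": "rabbit", "w:fur": "fur", "w:meal": "meal"}   # g_B : F_W -> F_B

# What the Rosetta Stone R effectively supplies: g_B ∘ g_A^{-1} : F_A -> F_B.
R = compose(g_B, invert(g_A))

# A fragment of the encyclopaedia E, written in A's symbols.
E_fragment = ["la_01", "la_02"]
print([R[token] for token in E_fragment])  # ['rabbit', 'fur']
```

In this toy version B never needs to handle $F_W$ directly once R is known: R alone carries A's symbols into B's.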
Given this, B now understands $F_A$. And, assuming they trust A, they can extract information about $Q_W$ from the encyclopaedia E. Hence they can acquire some of A's real-world information.
Now consider E′, a text with the same amount of information as E, but generated by a random process. There is information in E′, but it's not real-world-information-about-W-mediated-through-A; it's just random information.
Maybe there could be an R′ that grounds the symbols of E′ in a way that makes sense for the real world. But, since E′ is random, R′ would have to be huge: essentially, R′ would have to contain all the real-world information itself.
So we can say that both E and E′ are ungrounded for agent B; but E is at least potentially grounded, in that there is a short translation that will ground it. This "short translation", R, only works because E was constructed by A, a grounded agent (via $g_A$), and B's features are also grounded in the same world (via $g_B$). All that R does is translate between grounded symbols.
Note that $Q_W$ looks like pure syntax, not semantics. It doesn't define any features; instead, it gives a (probabilistic) measure of how various features relate to each other. So, it seems, E only gives syntactic information.
But, as I noted in previous posts, the boundary between syntax and semantics is ambiguous.
Suppose that an agent A has a feature "gavagai" and another feature "snark", and let $p = Q_A(\text{snark} \mid \text{gavagai})$ be a term in their syntax: the probability of "snark", given "gavagai".
Suppose we are starting to suspect that, semantically, "gavagai" means either rabbit or cooked meal, while "snark" means fur.
Then if p≈1, "gavagai" is likely to be rabbit, since whenever gavagai is present, there is fur. Conversely, if p≈0, then "gavagai" is likely to be a meal, since gavagai means no fur around. Re-conversely, if gavagai is rabbit, p is likely to be close to 1.
None of these interpretations is definite. But the probabilities of particular syntactic and semantic interpretations go up and down in relation to each other. So purely syntactic information provides evidence for symbol grounding, because this syntax has evolved through interactions with the real world. In fact, $g_A$, $Q_A$ and even $F_A$ all evolve through interactions with the real world, and can change when new information arrives or when the model's features splinter.
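The back-and-forth between syntax and semantics in the gavagai example is ordinary Bayesian updating. A minimal sketch, assuming just two candidate meanings (rabbit or cooked meal) with equal prior weight, and assuming that under each meaning "snark" accompanies "gavagai" with a fixed probability (0.95 for rabbit, 0.05 for meal; both numbers invented for illustration). Observing how often snark co-occurs with gavagai, which is syntactic evidence about $Q_A$, then shifts the probability of each semantic reading.

```python
from math import comb


def posterior_rabbit(snark_seen: int, gavagai_seen: int,
                     prior_rabbit: float = 0.5,
                     p_snark_if_rabbit: float = 0.95,
                     p_snark_if_meal: float = 0.05) -> float:
    """Posterior probability that "gavagai" means rabbit, given that "snark" (fur)
    accompanied "gavagai" in snark_seen of gavagai_seen cases.
    The default probabilities are illustrative assumptions."""
    def likelihood(p_snark: float) -> float:
        # Binomial probability of the observed co-occurrence count.
        misses = gavagai_seen - snark_seen
        return comb(gavagai_seen, snark_seen) * p_snark**snark_seen * (1 - p_snark)**misses

    rabbit = prior_rabbit * likelihood(p_snark_if_rabbit)
    meal = (1 - prior_rabbit) * likelihood(p_snark_if_meal)
    return rabbit / (rabbit + meal)


print(posterior_rabbit(9, 10) > 0.99)  # True: p close to 1 favours "rabbit"
print(posterior_rabbit(1, 10) < 0.01)  # True: p close to 0 favours "meal"
```

The same machinery runs in the other direction: once "rabbit" is the favoured meaning, a high value of $p$ becomes the expected one.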
And syntax and semantics can also substitute for each other, to some extent. Suppose A has only ever seen white rabbits. Then maybe "gavagai" means "white rabbit", a semantic statement. Or maybe "gavagai" means "rabbit", while A also believes the syntactic statement "all rabbits are white". Either option would result in the same behaviour and pretty much the same ways of thinking, at least in circumstances A has already encountered.
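The white-rabbit example can be made concrete as two rival models of A that agree on everything A has seen and only come apart on a new case. In this sketch (the `Animal` type and the observations are hypothetical), one model puts "white" into the meaning of "gavagai"; the other keeps "gavagai" = rabbit and puts "all rabbits are white" into A's beliefs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    species: str
    colour: str


def says_gavagai_semantic(a: Animal) -> bool:
    # Semantic reading: "gavagai" means "white rabbit", so colour is part of the meaning.
    return a.species == "rabbit" and a.colour == "white"


def says_gavagai_syntactic(a: Animal) -> bool:
    # Syntactic reading: "gavagai" means "rabbit"; "all rabbits are white" is a separate
    # belief, which affects A's predictions but not when A uses the word.
    return a.species == "rabbit"


# Everything A has encountered so far: only white rabbits, plus other animals.
seen = [Animal("rabbit", "white"), Animal("cat", "black"), Animal("rabbit", "white")]
print(all(says_gavagai_semantic(a) == says_gavagai_syntactic(a) for a in seen))  # True

# A new situation separates the two readings.
brown_rabbit = Animal("rabbit", "brown")
print(says_gavagai_semantic(brown_rabbit), says_gavagai_syntactic(brown_rabbit))  # False True
```

Nothing in `seen` can distinguish the two readings; only an unencountered case does.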
You could see semantics, in humans, as instinctively recognising certain features of the world, and syntax as logically putting together these features to derive other features (using rules and life experience learnt in the past). Putting it this way, it's clear why the line between syntax and semantics is vague, and can vary with time and experience.
We also often have only implicit access to our own $g_A$ and our own $Q_A$, which further complicates the distinction.
In the post on model splintering, I argued for doing without underlying "worlds", and just using the imperfect features and models of the agents.
This is needed in this setting, too. Agent A might make use of the feature "human being", but that doesn't mean that the world itself needs to have a well-defined concept of "human being". It's very hard to unambiguously define "human being" in terms of physics, and any definition would be debatable. Moreover, A did not use advanced physics in their own definition.
For symbol translation, though, it suffices that B understands roughly what A means by "human beings" - neither needs a perfect definition.
So, let's change the definitions a bit. Agent A has their features $F_A$ and their probability distribution over features, $Q_A$ (the syntax); the same goes for agent B, with $F_B$ and $Q_B$.
The grounding operators $g_A$ and $g_B$ no longer map from the features $F_W$ of the world, but from the private input-output histories of the two agents, $h_A$ and $h_B$. Write $Q_A(f_A = n \mid h_A, g_A) = p$ to denote that $p$ is the probability that feature $f_A \in F_A$ is equal to $n$, given the history $h_A$ and the grounding function $g_A$.
Now, what about B? They will have their own assessment of A's internal symbols; let $f^*_{BA}$ be B's interpretation of A's symbol $f_A$.
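They can compute expressions like $Q_B(f^*_{BA} = n \mid h_B, g_B)$, the probability B assigns to A's feature $f_A$ being equal to $n$. B will likely reach this probability assignment by using $g^*_{BA}$, $Q^*_{BA}$ and $h^*_{BA}$: B's assessments of A's grounding, syntax, and history. Summing over these gives:

$$Q_B(f^*_{BA} = n \mid h_B, g_B) = \sum_{g^*_{BA},\, Q^*_{BA},\, h^*_{BA}} Q^*_{BA}(f_A = n \mid h^*_{BA}, g^*_{BA}) \; Q_B(g^*_{BA}, Q^*_{BA}, h^*_{BA} \mid h_B, g_B).$$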
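Summing over B's hypotheses about A amounts to a mixture: B's estimate of A's feature is a weighted average of what A would believe under each hypothesis about A's grounding, syntax and history. A minimal sketch, assuming B entertains only finitely many such hypotheses and that each can be summarised by its weight, $Q_B(g^*_{BA}, Q^*_{BA}, h^*_{BA} \mid h_B, g_B)$, and the distribution it implies over values $n$ of $f_A$. The two hypotheses and their numbers are invented.

```python
from typing import Dict, List, Tuple

# One hypothesis of B about A's (grounding, syntax, history), summarised as
# (weight B gives it, distribution over values n of f_A that A would hold under it).
Hypothesis = Tuple[float, Dict[int, float]]


def b_estimate_of_f_A(hypotheses: List[Hypothesis]) -> Dict[int, float]:
    """Mix A's hypothesised beliefs about f_A, weighted by B's credence in each hypothesis."""
    total_weight = sum(weight for weight, _ in hypotheses)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("hypothesis weights must sum to 1")
    mixture: Dict[int, float] = {}
    for weight, dist in hypotheses:
        for n, prob in dist.items():
            mixture[n] = mixture.get(n, 0.0) + weight * prob
    return mixture


hypotheses = [
    (0.7, {2: 0.9, 3: 0.1}),  # B thinks A probably had a clear view
    (0.3, {2: 0.2, 3: 0.8}),  # ...but A may have been misled
]
estimate = b_estimate_of_f_A(hypotheses)
print({n: round(p, 2) for n, p in estimate.items()})  # {2: 0.69, 3: 0.31}
```

Because the grounding and syntax hypotheses enter only through the distributions they imply, the sketch never has to separate their contributions.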
Then B will consider feature $f_A$ to be perfectly grounded if there exists $f_B$, a feature in $F_B$, such that for all histories $h_B$ (and all values $n$),

$$Q_B(f^*_{BA} = n \mid h_B, g_B) = Q_B(f_B = n \mid h_B, g_B).$$
Thus, A's assessment of $f_A$ (according to B's estimate) is always perfectly aligned with B's assessment of $f_B$.
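With finitely many histories (a simplification, since the definition quantifies over all of them), the perfect-grounding condition becomes a check that can be run feature by feature: look for some $f_B$ whose distribution under B's model matches B's estimate of $f_A$ on every history. The history labels, feature names and tables below are hypothetical, and distributions are compared up to a small numerical tolerance.

```python
from typing import Dict, List, Optional

Dist = Dict[int, float]  # distribution over values n
Table = Dict[str, Dist]  # history h_B -> distribution


def same_dist(p: Dist, q: Dist, tol: float = 1e-9) -> bool:
    """True if p and q assign (nearly) the same probability to every value."""
    return all(abs(p.get(n, 0.0) - q.get(n, 0.0)) <= tol for n in set(p) | set(q))


def grounding_witness(histories: List[str], f_A_star: Table,
                      candidates: Dict[str, Table]) -> Optional[str]:
    """Return the name of a feature f_B whose distribution equals B's estimate of f_A
    on every history, or None if no candidate works."""
    for name, f_B in candidates.items():
        if all(same_dist(f_A_star[h], f_B[h]) for h in histories):
            return name
    return None


histories = ["h1", "h2"]
# B's estimate of A's feature, Q_B(f*_BA = n | h_B, g_B), per history.
f_A_star = {"h1": {2: 1.0}, "h2": {0: 0.5, 1: 0.5}}
candidates = {
    "door_open": {"h1": {0: 1.0}, "h2": {1: 1.0}},
    "people_in_room": {"h1": {2: 1.0}, "h2": {0: 0.5, 1: 0.5}},
}
print(grounding_witness(histories, f_A_star, candidates))  # people_in_room

# One disagreeing history is enough to break perfect grounding.
candidates["people_in_room"]["h2"] = {0: 0.6, 1: 0.4}
print(grounding_witness(histories, f_A_star, candidates))  # None
```

The "for all histories" requirement is what makes the condition so demanding: a single mismatch disqualifies the candidate.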
Perfect grounding is far too strong a condition, in general. It means that B believes that A's estimate of $f_A$ is always better than B's own estimate of $f_B$.
Suppose that $f_A$ and $f_B$ refer to the number of people in a given room. Sometimes A is in the room, and sometimes B is.
Obviously it makes sense for B to strongly trust A's estimate when A is in the room and B is not; conversely, it would (generally) be wrong for B to do so when their positions are reversed.
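B's situational trust can be sketched as a rule for choosing whose count to use. This is a deliberately crude point-estimate version: the function name, its arguments and the fallback to a prior guess are hypothetical, and a fuller treatment would weigh distributions rather than pick a single source.

```python
from typing import Optional


def b_room_estimate(b_in_room: bool, b_count: Optional[int],
                    a_in_room: bool, a_report: Optional[int],
                    prior_guess: int) -> int:
    """B's estimate of how many people are in the room, trusting whoever was present."""
    if b_in_room and b_count is not None:
        return b_count     # B saw the room directly: use B's own count
    if a_in_room and a_report is not None:
        return a_report    # A was there and B was not: defer to A
    return prior_guess     # neither was present: A's report is not treated as evidence


print(b_room_estimate(False, None, True, 4, prior_guess=1))  # 4
print(b_room_estimate(True, 3, False, 7, prior_guess=1))     # 3
```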
So sometimes B will consider A's estimates to be well grounded and correct; sometimes it won't. Grounding, in practice, involves modelling what the other person knows and doesn't know. We might have some idea of what would lead A astray: e.g. a very convincing human-like robot in the room, combined with poor lighting.
That sounds both very complicated (if we wanted to formalise that in terms of what agent B believes about what agent A believes about features...) and very simple (it's exactly what human theory of mind is all about).
What this means is that, given this framework, figuring out the grounding of the symbols of another agent is a messy empirical process. It's not a question of resolving esoteric philosophical questions, but of assessing the knowledge, honesty, accuracy, and so on, of another agent.
Note the empiricism here; we've moved from:
Those happy few who know my research well may be puzzled at this juncture. Knowing the internal features of another agent is the ultimate structured white-box model - we not only know how the other agent thinks, but we assign useful labels to their internal processes. But when I introduced that concept, I argued that you could not get structured white-box models without making major assumptions. So what's going on here?
First of all, note that the definition of grounding of $f_A$ is given entirely in terms of B's estimate of A's features. In the limit, B could be utterly wrong about everything concerning A's internals, and still find that A's symbols are grounded, even perfectly grounded.
That's because B is not using A's symbols at all, but their own estimate of A's symbols. Maybe A always feels fear and disgust when they get cold. Assume that this is the only time A feels both those sentiments at once. Then A might have two symbols, "fear" and "disgust", while B might model "fear-plus-disgust" as the single feature "cold". B can then use that feature, empirically and correctly, to predict the temperature. So B thinks that A's feature "cold" is well grounded, even if that feature doesn't exist in A's models at all. This is the sense in which "gavagai" meaning "rabbit" and "gavagai" meaning "undetached rabbit-part" actually amount to the same thing.
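The fear-and-disgust example can be sketched directly. Here B's feature "cold" is defined purely from B's model of A's observable state, and B checks it against a log of actual temperatures. The `AState` type and the log are hypothetical, and the log is constructed to satisfy the assumption in the text that A feels fear and disgust together exactly when cold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AState:
    fear: bool
    disgust: bool


def b_feature_cold(s: AState) -> bool:
    # B's single derived feature: A shows fear and disgust together.
    return s.fear and s.disgust


# Hypothetical log of (A's observable state, whether it was actually cold).
log = [
    (AState(fear=True, disgust=True), True),
    (AState(fear=True, disgust=False), False),
    (AState(fear=False, disgust=True), False),
    (AState(fear=False, disgust=False), False),
    (AState(fear=True, disgust=True), True),
]
accuracy = sum(b_feature_cold(s) == cold for s, cold in log) / len(log)
print(accuracy)  # 1.0
```

Nothing in this check requires A to have a "cold" symbol; B's feature is built entirely from B's own model of A.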
But for humans, there are other factors at work. Humans communicate, using words to designate useful concepts, and converge on mutually intelligible interpretations of those words, at least in common situations. A phrase like "Run, tiger!" needs to be clearly understood immediately.
Now, we humans:
This means that we will converge on roughly shared understandings of what words mean, at least in typical environments. This rough shared understanding explains both why language is never perfectly translatable and why it is mostly translatable. We might not get all the linguistic nuances of The Tale of Genji, but we do know that it's about Japanese courtiers in the 11th century, and not about time-travelling robots from the future.
A last note on language: we can use it to explore concepts that we've never encountered, even concepts that don't exist. This means that, with language, we might realise that, for someone, "gavagai" means "undetached rabbit part" rather than "rabbit", because we can use linguistic concepts to imagine a distinction between those two ideas. And then we can communicate this distinction to others.
This kind of reasoning causes me to suspect that the GPT-n series of algorithms will not reach super-human levels of capability. They've achieved a lot through syntactic manipulation of texts; but their symbols are almost certainly ungrounded. Consider two hypotheses:
I think there's evidence for the second hypothesis - for example, the successes of the current and past GPT-ns. It does not seem plausible that these machines are currently modelling us from electrons upwards.
But if the second hypothesis is true, then we should expect the GPT-ns to reach a plateau at or near the maximal current human ability. Consider two models, $M_p$ (a full physics model of humanity and enough of the universe) and $M_s$ (a simplified model of human text generation). As long as the GPT-ns are successful with $M_s$, there will be no pressure on them to develop $M_p$. Pressure can mean reinforcement learning, objective functions, or humans ranking outputs or tweaking the code. For the algorithm to converge on $M_p$, the following need to be true:
It seems to me that 1. might be true, but 2. seems very unlikely to be true. Therefore, I don't think that GPT-ns will need, or be able, to ground their symbols, and hence they will be restricted to human-comparable levels of ability.
We could test this empirically, in fact. Feed GPT-3 all the physics papers we have up to 1904. Could GPT-3, or any of its successors, generate special and general relativity from that data? I would be extremely surprised, and mildly terrified, if it did, since it could only do so if it really understood, in a grounded way, what physics was.
Thanks to Rebecca Gorman for help with this research.
Using $g_A$ to define semantics.
Like learning that "dog" and "wolf" are meaningfully different and can't be treated the same way -- or that different breeds of dogs are also different in relevant ways. In that case, the categorisation and/or modelling will shift to become more discriminatory and precise. This tends to be in areas of relevance to the human: for a dog breeder, the individual breeds of dogs are very precise and detailed categories, while everything that lives in the sea might be cheerfully dumped into the single category "fish".
See how carefully learnt deductions can become instinctive, or how we can use reason to retrain our instincts.
Notice that this formulation means that we don't need to distinguish the contribution of semantics ($g_A$) from that of syntax ($Q_A$): both are folded into the same expression.
Maybe filter out Lorentz's papers.