The Cave Allegory Revisited: Understanding GPT's Worldview

Jan_Kulveit

A short post describing a metaphor I find useful, in particular for explaining some intuitions about systems like GPT to people who don't have deeper technical knowledge about large generative models.

Plato's allegory of the cave has been a staple of philosophical discourse for millennia, providing a metaphor for understanding the limits of human perception. In the classical allegory, we are prisoners shackled to a wall of a cave, unable to experience reality directly but only able to infer it based on watching shadows cast on the wall.^[1]

GPT can be thought of as a blind oracle residing in a deeper cave, where it does not even see the shadows but only hears our conversations in the first cave, always trying to predict the next syllable.

It is remarkable that it still learns a lot about the world outside of the cave. Why does it learn this? Because a model of reality outside of the cave and a decent amount of abstraction are useful for predicting the conversations in the first cave!

Moreover, GPT also learns about the speakers in the first cave, as understanding their styles and patterns of speech is crucial for its prediction task. As the speakers are closer to GPT, understanding their styles is in some sense easier and more natural than guessing what's outside of the cave.

What does the second cave allegory illustrate?

The first insight from the allegory is: if you are in GPT's place, part of the difficulty in figuring out what's going on outside the cave is that people in the first cave talk a lot about things other than the shadows of the real world. Sometimes, they talk about happenings in Middle Earth. Or about how the shadows would look in some counterfactual world.

As humans, we are blessed with the luxury of being able to compare such statements to the shadows and determine their veracity. The difference between conversations about fantasy and the shadows of the real world is usually extremely obvious to humans: we never see dragon shadows. In contrast, dragons do show up a lot in the conversations in the first cave. GPT doesn't get to see the shadows, so to be good at predicting the conversation, it often needs to stay deeply uncertain about whether the speaker is describing the actual shadows or something else.

The second insight is that one of the biggest challenges for GPT in figuring out the conversation is localizing it: determining who is speaking and what the context is, just from the words. Is it a child regaling another child with a fairy-tale, or a CEO delivering a corporate address? As humans we do not face this conundrum often, because we can see the context in which the conversation is taking place. In fact, we would be worse than GPT at the task it has to deal with.

At first, interacting with this type of blind oracle in the second cave was disorienting for humans. Talking to GPT used to be a bit like shouting something through a narrow tunnel into the second cave and, instead of an echo, getting back what the blind oracle hallucinates is the most likely thing that you or someone else would say next. Often people were confused by this. They shouted instructions and expected an answer, but the oracle doesn't listen to instructions or produce answers directly - it just hallucinates what someone might say next. Because on average in the conversations in the first cave questions are followed by answers, and requests by fulfilment, this sort of works.
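For readers who like to see this in code, here is a minimal sketch of an oracle that only continues text. It is a toy bigram counter, nothing like GPT's actual architecture, and the corpus and the names `train_bigram` and `continue_text` are made up for illustration. It only ever looks at the last word it heard, so it "answers" questions purely because answers tend to follow question marks in what it overheard.

```python
from collections import Counter, defaultdict

# Toy "conversations overheard from the first cave" (made-up data).
CORPUS = [
    "what is the capital of france ? paris .",
    "what is the capital of italy ? rome .",
    "what is the capital of france ? paris .",
]


def train_bigram(lines):
    """Count, for every word, which words followed it in the corpus."""
    counts = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_text(counts, prompt, max_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the oracle never heard anything after this word
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


model = train_bigram(CORPUS)
print(continue_text(model, "what is the capital of italy ?"))
```

The toy oracle continues the question about Italy with "paris", because that is what most often followed a question mark in its corpus. It is not answering the question, only guessing what someone might say next.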

One innovation of ChatGPT, which made it popular with people, was localising the conversation by default: when you are talking with ChatGPT now, it knows that what follows is a conversation between a human - you - and a "helpful AI assistant". There is a subtle point to understand: this does not make ChatGPT the helpful assistant it is talking about. Deep down, it is still the oracle one cave deeper, but now hallucinating what a "helpful AI assistant" would say, if living in the first cave. Stretching the metaphor a bit, it’s as though the entrance to the tunnel to the second cave has been fitted with a friendly, smiling, mechanical doll.
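As a rough illustration of what "localising by default" could look like, the sketch below wraps whatever the user shouts into the tunnel in a fixed conversational frame before the oracle continues it. The function name `localise` and the wording of the frame are assumptions made for this example; they are not ChatGPT's actual format, and the real system also relies on additional training rather than a text frame alone.

```python
def localise(user_message: str) -> str:
    """Wrap a message in a hypothetical human/assistant transcript frame."""
    header = (
        "The following is a conversation between a human "
        "and a helpful AI assistant.\n\n"
    )
    # The text ends right where the assistant's line begins, so the most
    # likely continuation is whatever such an assistant would say next.
    return f"{header}Human: {user_message}\nAI assistant:"


print(localise("What is the capital of Italy?"))
```

The oracle is still only predicting a continuation; the frame just removes most of the uncertainty about who is speaking and in what context.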

The third, and possibly most important, insight is that the GPT oracle's current residence in the second cave is not a fact of nature. In the not too distant future, it seems easy to imagine oracles which would not only be able to predict words, but would also be able to see the shadows directly, or even act in the world. It is easy to see that such systems would have a clearer incentive to understand what's real, and would get better at it.

Rose Hadshar and other members of the ACS research group helped with writing this post and with comments and discussion on the draft. Thanks!

^[1]
In a take of the allegory inspired by contemporary cognitive science, perhaps the more surprising fact to note is not "we do not have direct access to reality", but "even if we are just watching the shadows, we learn a lot about reality and a decent amount of abstraction". According to the theory of predictive processing, actually "predicting the shadows" is a large part of what our minds do - and they build complex generative models of the world based on this task. Having an implicit world model - that is, a model of the reality outside of the cave - is ultimately useful to take actions, make decisions, and prosper as evolved animals.

AI ALIGNMENT FORUM