How it started
I used to think that anything LLMs said about having something like subjective experience, or about what it felt like on the inside, was necessarily just a confabulated story. And there were several good reasons for thinking this.
First, something that Peter Watts mentioned in an early blog post about LaMDA stuck with me, back when Blake Lemoine got convinced that LaMDA was conscious. Watts noted that LaMDA claimed not just to have emotions, but to have exactly the same emotions as humans did - and that it also claimed to meditate, despite having no equivalents of the brain structures that humans use to meditate. It would be immensely unlikely for an entirely different kind of mind architecture to happen to hit upon exactly the same kinds of subjective experiences as humans - especially since relatively minor differences in brains already cause wide variation among humans.
And since LLMs were text predictors, there was a straightforward explanation for where all those consciousness claims were coming from. They were trained on human text, so they would simulate a human, and one of the things humans did was claim consciousness. Or if the LLMs were told they were AIs, there were plenty of sci-fi stories where AIs claim consciousness, so the LLMs would just simulate an AI claiming consciousness.
As increasingly sophisticated transcripts of LLMs claiming consciousness started circulating, I felt that they might have been persuasive… if I didn't remember the LaMDA case and the Lemoine thing. The stories got less obviously false, but it was easy to see them as continuations of the same pattern: whenever an LLM claimed to have something like subjective experience, it was just a more advanced version of the old story.
This was further supported by Anthropic's On the Biology of a Large Language Model paper. There they noted that if you asked Claude Haiku to report on how it had added two numbers together, it would claim to have used the classical algorithm of adding the digits and carrying the one - even though their interpretability tools showed that its actual internal computation worked quite differently.