Nice work. But I wonder why people are so surprised that these models and GPT would learn a model of the world. Of course they do. Even the skip-gram and CBOW word vectors people trained ages ago modelled the world, in the sense that, for example, the positions of named entities such as places in vector space would be highly correlated with their actual geographic locations. It should be taken as a given that these models, which have many orders of magnitude more parameters, are learning much more sophisticated models of the world. What that tells us about their "intelligence" is an entirely different question. They are still statistical next-token predictors; it's just that the statistics are so complicated that they essentially become a world model. The divide between these concepts is artificial.