Two problems with ‘Simulators’ as a frame

evhub 3y 63

I basically agree with this, and a lot of these are the sorts of reasons we went with "predictor" over "simulator" in "Conditioning Predictive Models."

Raemon 3y 20

I was a bit unsure whether to tag your posts with Simulator Theory. Do you endorse that or not?

evhub 3y 51

Yeah, I endorse that. I think we are very much trying to talk about the same thing, it's more just a terminological disagreement. Perhaps I would advocate for the tag itself being changed to "Predictor Theory" or "Predictive Models" or something instead.

LawrenceC 3y 11

I broadly agree with the points being made here, but allow me to nitpick the use of the word "predictive" and argue for the key advantage of the simulators framing over the prediction one:

Pretrained models don’t ‘simulate a character speaking’; they predict what comes next, which implicitly involves making predictions about the distribution of characters and what they would say next.

The simulators frame does make it very clear that there's a distinction between the simulator/GPT-3 and the simulacra/characters or situations it's making predictions about! On the other hand, using "prediction" can obscure that distinction and lead to confused questions like "is GPT just an agent that just wants to minimize predictive loss?"

Charlie Steiner 3y 11

I think the biggest pitfall of the "simulator" framing is that it's made people (including Beth Barnes?) think it's all about simulating our physical reality. But precisely because of the constraints you mention (text not actually pinpointing the state of the universe, etc.), the abstractions developed by a predictor are usually better understood in terms of treating the text itself as the state, and learning time-evolution rules for that state.

ryan_greenblatt 3y 40

Thinking about the state and time evolution rules for the state seems fine, but there isn't any interesting structure with the naive formulation imo. The state is the entire text, so we don't get any interesting Markov chain structure. (You can turn any random process into a Markov chain by including the entire history in the state! The interesting property of a Markov chain is supposed to be that the past doesn't matter!)
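
To make the parenthetical concrete, here is a minimal sketch (not from the thread; the rule `next_symbol` and all other names are made up for illustration, and the process is deterministic only to keep it short). The next symbol depends on the symbol two steps back, so the last symbol alone is not a Markov state, while the full history trivially is one.

```python
from typing import Dict, Iterable, Set, Tuple

History = Tuple[int, ...]


def next_symbol(history: History) -> int:
    """Hypothetical non-Markov rule: flip the symbol from two steps back.

    Histories shorter than two symbols default to emitting 1.
    """
    return 1 - history[-2] if len(history) >= 2 else 1


def step(history: History) -> History:
    """Transition on the 'entire history' state: always well defined."""
    return history + (next_symbol(history),)


def rollout(start: History, n: int) -> History:
    """Apply the history-state transition n times."""
    h = start
    for _ in range(n):
        h = step(h)
    return h


def successors_by_last_symbol(histories: Iterable[History]) -> Dict[int, Set[int]]:
    """Group next symbols by the last symbol only.

    If any group holds more than one value, the last symbol by itself
    does not determine what comes next, so it is not a Markov state.
    """
    out: Dict[int, Set[int]] = {}
    for h in histories:
        out.setdefault(h[-1], set()).add(next_symbol(h))
    return out
```

Both `(0, 0)` and `(1, 0)` end in `0` but continue differently, so the Markov property only comes back once the state absorbs the whole past, which is exactly why it buys nothing.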

Charlie Steiner 3y 10

Hm, I mostly agree. There isn't any interesting structure by default, you have to get it by trying to mimic a training distribution that has interesting structure.

And I think this relates to another way that I was too reductive, which is that if I want to talk about "simulacra" as a thing, then they don't exist purely in the text, so I must be sneaking in another ontology somewhere - an ontology that consists of features inferred from text (but still not actually the state of our real universe).

LawrenceC 3y 10

Nitpick: I mean, technically, the state is only the last 4k tokens or however long your context length is. Though I agree this is still very uninteresting.
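
A rough sketch of the nitpick, assuming a toy stand-in for the model (`toy_next_token_probs` is hypothetical, and `CONTEXT_LEN = 4` stands in for the 4k-token window): once the state is truncated to the last `CONTEXT_LEN` tokens, any two histories sharing that suffix get the same next-token distribution.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

CONTEXT_LEN = 4  # illustrative stand-in for a real context length


def window(tokens: Sequence[str], k: int = CONTEXT_LEN) -> Tuple[str, ...]:
    """The state: only the last k tokens survive."""
    return tuple(tokens[-k:])


def toy_next_token_probs(state: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical 'model': add-one-smoothed token frequencies in the window.

    It sees nothing outside `state`, just as a fixed-context model
    sees nothing outside its context window.
    """
    vocab = ("a", "b")
    counts = Counter(state)
    total = len(state) + len(vocab)
    return {tok: (counts[tok] + 1) / total for tok in vocab}


def next_token_probs(tokens: Sequence[str]) -> Dict[str, float]:
    """Next-token distribution as a function of the truncated state only."""
    return toy_next_token_probs(window(tokens))
```

This makes the process Markov of order `CONTEXT_LEN`, but the state is still just "a big chunk of recent text", so it is not much more interesting than the full-history version.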

LawrenceC 3y 12

The time-evolution rules of the state are simply the probabilities of the autoregressive model -- there's some amount of high level structure but not a lot. (As Ryan says, you don't get the normal property you want from a state (the Markov property) except in a very weak sense.)
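
Spelled out, in notation I'm assuming rather than taking from the post ($x_t$ for the $t$-th token, $p_\theta$ for the autoregressive model, $\oplus$ for appending a token), the "state plus evolution rule" picture is just the usual autoregressive factorization:

```latex
% Assumed notation: x_t is the t-th token, p_\theta the model, \oplus appends a token.
s_t = x_{1:t}, \qquad
P\left(s_{t+1} = s_t \oplus x \mid s_t\right) = p_\theta\left(x \mid s_t\right), \qquad
p_\theta\left(x_{1:T}\right) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_{1:t-1}\right).
```

The transition kernel is Markov in $s_t$ only because $s_t$ already contains everything that came before.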

I also disagree that purely thinking about the text as state + GPT-3 as evolution rules is the intention of the original simulators post; there's a lot of discussion about the content of the simulations themselves as simulated realities or alternative universes (though the post does clarify that it's not literally physical reality), e.g.:

I can’t convey all that experiential data here, so here are some rationalizations of why I’m partial to the term, inspired by the context of this post:
The word “simulator” evokes a model of real processes which can be used to run virtual processes in virtual reality.
It suggests an ontological distinction between the simulator and things that are simulated, and avoids the fallacy of attributing contingent properties of the latter to the former.
It’s not confusing that multiple simulacra can be instantiated at once, or an agent embedded in a tragedy, etc.
[...]
The next post will be all about the physics analogy, so here I’ll only tie what I said earlier to the simulation objective.
the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum.
To know the conditional structure of the universe^[27] is to know its laws of physics, which describe what is expected to happen under what conditions.

I think insofar as people end up thinking the simulation is an exact match for physical reality, the problem was not in the simulators frame itself, but instead the fact that the word "physics" was used 47 times in the post, while only the first few instances make it clear that literal physics is intended only as a metaphor.

See the appendix. ↩︎
Insofar as it’s useful to try to reason about what exact actions the pre-training objective incentivizes in particular cases. I’m not sold on this being considerably useful in most cases. ↩︎
Note that I disagree with quite a bit of the framing and emphasis of Conditioning Predictive Models. Don’t take this link as an endorsement! ↩︎
I’m about 90% sure based on doing some quick samples. ↩︎
This is also discussed in the Conditioning Predictive Models sequence. ↩︎

AI ALIGNMENT FORUM