AI ALIGNMENT FORUM

Circuitrinos

Comments

Actually, Othello-GPT Has A Linear Emergent World Representation
Circuitrinos · 2y · 30

Regarding the quote "we see that the model trained to be good at Othello seems to have a much worse world model":

What if, for LLMs trained to play games like Othello, chess, Go, etc., instead of directly training them to play the best moves, we first trained them to play legal moves, as in this paper, so that they construct a good world model?

Then, once the model has a world model, we "freeze" those weights, add additional layers on top, and train just those new layers to play the game well.
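
A minimal PyTorch sketch of what this two-stage setup could look like, assuming a pretrained world model that exposes a final hidden state; the class name, dimensions, and interface here are illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class PlayWellModel(nn.Module):
    """Hypothetical wrapper: frozen legal-move world model + trainable play-well head."""

    def __init__(self, world_model: nn.Module, d_model: int, n_moves: int):
        super().__init__()
        self.world_model = world_model
        # Freeze the pretrained world model so its (probe-able) representation is preserved.
        for p in self.world_model.parameters():
            p.requires_grad = False
        # New layers, trained to choose strong moves rather than merely legal ones.
        self.play_well_head = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, n_moves),
        )

    def forward(self, move_tokens: torch.Tensor) -> torch.Tensor:
        # Assumed interface: the world model maps a move sequence to a hidden state
        # of shape [batch, d_model] for the current position.
        with torch.no_grad():
            reps = self.world_model(move_tokens)
        return self.play_well_head(reps)

# Only the new head's parameters are optimized; the world model stays fixed, e.g.:
# optimizer = torch.optim.Adam(model.play_well_head.parameters(), lr=1e-4)
```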

Wouldn't this force the play-well model to include the good world model (one we can probe and understand)?

Wouldn't that also force the play-well layers of the model to learn something much easier to probe and understand?

From there, we could potentially probe the play-well layers to learn something about what the optimal strategy of the game actually is.
