## AI ALIGNMENT FORUM

Yeah, I think the fact that Elo only models the macrostate makes this an imperfect analogy. I think a better analogy would involve a hybrid model, which assigns a probability to a chess game based on whether each move is plausible (using a policy network), and whether the higher-rated player won.
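One possible reading of such a hybrid model, as a minimal sketch: score a game by combining per-move plausibilities with the probability of the observed result. The function name `hybrid_log_score` and both inputs are hypothetical; the move probabilities are assumed to come from some policy network and the outcome probability from some rating-based model. The result is an unnormalized log-score, not a properly normalized distribution over games.

```python
import math
from typing import Sequence


def hybrid_log_score(move_probs: Sequence[float], p_outcome: float) -> float:
    """Unnormalized log-score of a game (hypothetical helper).

    move_probs: plausibility of each move actually played, e.g. from a policy network.
    p_outcome:  probability of the observed result, e.g. from a rating-based model.
    """
    micro = sum(math.log(p) for p in move_probs)  # move-by-move (microstate) term
    macro = math.log(p_outcome)                   # outcome (macrostate) term
    return micro + macro
```

A game with plausible moves but a surprising result, or the reverse, is penalized by one term or the other.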

I don't think the distinction between near-exact and nonexact models is essential here. I bet we could introduce extra entropy into the short-term gas model and the rollout would still predict the microstate better than the Boltzmann distribution.


Why is a chess game the opposite of an ideal gas? On short timescales an ideal gas is described by elastic collisions. And a single move in chess can be modeled by a policy network.

The difference is in long timescales: If we simulated elastic collisions for a long time, we'd end up with a complicated distribution over the microstates of the gas. But we can't run simulations for a long time, so we have to make do with the Boltzmann distribution, which is a lot less accurate.

Similarly, if we rolled out our policy network to get a distribution over chess game outcomes (win/loss/draw), we'd get the distribution of outcomes under self-play. But if we're observing a game between two players who are better players than us, we have access to a more accurate model based on their Elo ratings.
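For concreteness, here is a minimal sketch of the standard Elo expected-score formula on the conventional 400-point scale. The function name is hypothetical. Note that Elo predicts an expected score (win = 1, draw = 0.5, loss = 0) rather than separate win/loss/draw probabilities, and real rating systems add details not shown here.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
```

The point of the comparison is that this one-line outcome model needs no understanding of the moves at all, yet it can beat a rollout of a weak policy network when the players are stronger than the network.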

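Can we formalize this? Suppose we're observing a chess game. Our beliefs about the next move are conditional probabilities of the form $P_1(x_{k+1} \mid x_1, \dots, x_k)$, and our beliefs about the next $n$ moves are conditional probabilities of the form $P_n(x_{k+1}, \dots, x_{k+n} \mid x_1, \dots, x_k)$. We can transform beliefs of one type into the other using the operators

$$(\Pi P_1)(x_{k+1}, \dots, x_{k+n} \mid x_1, \dots, x_k) = \prod_{i=1}^{n} P_1(x_{k+i} \mid x_1, \dots, x_{k+i-1}),$$

$$(\Sigma P_n)(x_{k+1} \mid x_1, \dots, x_k) = \sum_{x_{k+2}, \dots, x_{k+n}} P_n(x_{k+1}, \dots, x_{k+n} \mid x_1, \dots, x_k).$$

If we're logically omniscient, we'll have $P_n = \Pi P_1$ and $P_1 = \Sigma P_n$. But in general we will not. A chess game is short enough that $\Pi$ is easy to compute, but $\Sigma$ is too hard because it has exponentially many terms. So we can have a long-term model $P_n$ that is more accurate than the rollout $\Pi P_1$, and a short-term model $P_1$ that is less accurate than $\Sigma P_n$. This is a sign that we're dealing with an intelligence: We can predict outcomes better than actions.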
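The two directions can be sketched on a toy two-move "game". This is only an illustration under stated assumptions: `toy_policy`, `rollout`, and `marginalize` are hypothetical names, and the policy is made up. `rollout` chains a one-step model into a distribution over the next `n` moves, and `marginalize` sums an `n`-move distribution back down to a next-move distribution.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Move = str
History = Tuple[Move, ...]
OneStep = Callable[[History], Dict[Move, float]]  # history -> P(next move)

MOVES = ("a", "b")  # toy move set; a real game has a state-dependent one


def toy_policy(history: History) -> Dict[Move, float]:
    """Hypothetical short-term model: tends to repeat the previous move."""
    if not history:
        return {m: 0.5 for m in MOVES}
    last = history[-1]
    return {m: (0.75 if m == last else 0.25) for m in MOVES}


def rollout(p1: OneStep, history: History, n: int) -> Dict[History, float]:
    """Chain rule: P(next n moves | history) as a product of one-step terms."""
    dist: Dict[History, float] = {}
    for seq in product(MOVES, repeat=n):  # enumerates len(MOVES) ** n sequences
        prob, h = 1.0, history
        for m in seq:
            prob *= p1(h)[m]
            h = h + (m,)
        dist[seq] = prob
    return dist


def marginalize(pn: Dict[History, float]) -> Dict[Move, float]:
    """Sum an n-move distribution over every move except the first."""
    out = {m: 0.0 for m in MOVES}
    for seq, prob in pn.items():
        out[seq[0]] += prob
    return out
```

Scoring a single sequence under the rollout takes only `n` policy calls, but the marginalization has to touch every one of the `len(MOVES) ** n` sequences, which is the asymmetry the comment relies on. When the long-term model really is the rollout of the short-term one, marginalizing recovers the short-term model exactly.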

If instead of a chess game we're predicting an ideal gas, the relevant timescales are so long that we can't compute either operator. Our long-term thermodynamic model is less accurate than a simulation would be. This is often a feature of reductionism: Complicated things can be reduced to simple things that can be modeled more accurately, although more slowly.

In general, we can have several models at different timescales, with operators in both directions connecting all the levels. For example, we might have a short-term model describing the physics of fundamental particles; a medium-term model describing a person's motor actions; and a long-term model describing what that person accomplishes over the course of a year. The medium-term model will be less accurate than a rollout of the short-term model, and the long-term model may be more accurate than a rollout of the medium-term model if the person is smarter than us.
