porby — AI Alignment Forum

Implied "utilities" of simulators are broad, dense, and shallow

This is a quick attempt at deconfusion similar to instrumentality. Same ideas, different angle. Extremely broad, dense reward functions constrain training-compatible goal sets. Predictors/simulators are typically trained against a ground truth for every output. There is no gap between the output and its evaluation; an episode need not be completed...

Mar 1, 2023 · 45

Instrumentality makes agents agenty

You could describe the behavior of an untuned GPT-like model[1] using a (peculiar) utility function. The fact that the loss function and training didn't explicitly involve a reward function doesn't mean a utility function can't represent what's learned, after all. Coming from the opposite direction, you could also train a predictor...

Feb 21, 2023 · 21

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

This is a part of a maybe-series where I braindump safety notes while waiting on training runs to complete. It's mostly talking to myself, but talking to myself in public seems somewhat more productive. The content of this post is not guaranteed to be novel, interesting, or correct, though I...

Nov 23, 2022 · 40