AI ALIGNMENT FORUM

I'd maybe point the finger more at the simplicity of the training task than at the size of the network? I'm not sure there's a strong reason to believe the network is underparameterized for the training task. But I agree that drawing lessons from small-ish networks trained on simple tasks requires caution.

22 The Problem With the Word ‘Alignment’

38 Paper: Understanding and Controlling a Maze-Solving Policy Network

22 Behavioural statistics for a maze-solving agent

37 Maze-solving agents: Add a top-right vector, make the agent go to the top-right

140 Understanding and controlling a maze-solving policy network

47 Predictions for shard theory mechanistic interpretability results

11 [Simulators seminar sequence] #2 Semiotic physics - revamped

19 [Simulators seminar sequence] #1 Background & shared assumptions

27 A Short Dialogue on the Meaning of Reward Functions