Top posts
Max H
Most of my posts and comments are about AI and alignment. Posts I'm most proud of, which also provide a good introduction to my worldview:
In Reward is not the optimization target, @TurnTrout writes:

> Therefore, reward is not the optimization target in two senses:
>
> 1. Deep reinforcement learning agents will not come to intrinsically and primarily value their reward signal; reward is not the trained agent’s optimization target.
> 2. Utility functions...
In my recent post on steering systems, I sketched an AI system that chooses which actions to execute via tree search over world states. In the original post, the system is meant to illustrate the possibility that component subsystems which are human-level and (possibly) non-dangerous when used individually can be...
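A minimal sketch of the kind of system that preview describes: two individually simple component subsystems (a world model and a state evaluator) composed into an action chooser via depth-limited tree search. All names, the toy dynamics, and the scoring function below are illustrative assumptions, not the post's actual design.

```python
# Hypothetical sketch: tree search over world states, assembled from
# two component subsystems. Everything here is a toy stand-in.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    value: int  # stand-in for a real world-state representation


ACTIONS = [-1, 0, 1]  # toy action space


def predict(state: WorldState, action: int) -> WorldState:
    # Component subsystem 1: a world model predicting the next state.
    return WorldState(state.value + action)


def score(state: WorldState) -> float:
    # Component subsystem 2: an evaluator scoring how good a state is.
    return -abs(state.value - 10)  # prefers states near value == 10


def search(state: WorldState, depth: int) -> float:
    # Best achievable score within `depth` further actions.
    if depth == 0:
        return score(state)
    return max(search(predict(state, a), depth - 1) for a in ACTIONS)


def choose_action(state: WorldState, depth: int = 3) -> int:
    # The composed system: pick the action whose subtree scores highest.
    return max(ACTIONS, key=lambda a: search(predict(state, a), depth - 1))


print(choose_action(WorldState(0)))  # -> 1: steps toward the high-scoring region
```

Note that neither `predict` nor `score` selects actions on its own; only the composed search does, which is the point the preview gestures at about individually non-dangerous components.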
[The basic idea in this post is probably not original to me, since it's somewhat obvious when stated directly. But it seems particularly relevant and worth distilling in light of recent developments with LLM-based systems, and because I keep seeing arguments which seem confused about it.] Alignment is a property...