This approach overlooks an important point: if advanced LLMs are what we use to make the new-paradigm advances that produce extremely effective RL sociopaths, then by that point we will also have relatively harmless but still very powerful LLMs available to do safety work on those RL agents, which is a major help in mitigating autonomy risks! There is, of course, always the risk that new RL architecture discoveries create economic incentives to scale the scary RL agents before sufficient safety work gets done, but the prospect of using HHH AI to align scary AI is weirdly under-explored in discussions of exactly that advanced-LLM-plus-advanced-RL-learner world.