Always love to see some well-made AI optimism arguments, great work!
The current generation of easily aligned LLMs should definitely update one towards alignment being a bit easier than expected, if only because they might be used as tools to solve some parts of alignment for us. This wouldn't be possible if they were already openly scheming against us.
It's not impossible that we are in an alignment-by-default world. But I claim that our current insight isn't enough to distinguish such a world from the gradual-disempowerment/going-out-with-a-whimper world.
In particular, your argument only holds
* if the current architecture continues to scale smoothly to AGI and beyond, and
* if the current...