what exactly is it about human brains[1] that allows them to not always act like power-seeking ruthless consequentialists?
By asking this question, you've already lost me. The question tells me that "ruthless consequentialist" is your default model of how rational, thinking beings operate, absent wiring / training / reward systems that limit the default outcome. And if that worldview is representative of the "technical-alignment-is-hard" camp, then of course the only plausible outcome of AI advance is "AIs eventually break free of those limite...
Thanks for the thoughtful reply. It took me a lot of squinting, but IIUC you're saying:
- Different kinds of minds, produced by different kinds of architectures, should likely exhibit very different levels of scary traits such as monomaniacal sociopathy.
- Stop focusing on LLMs so much; they're not the main threat. Yes, they seem to exhibit more value-roundedness because they're trained to imitate humans, but they aren't likely to reach AGI anytime soon.
- Focus more on RL agents and "brain-like" architectures; those are built very differently and plausibly would ha