I'm a very confused person trying to become less confused. My history as a New Age mystic still colors everything I think even though I'm striving for rationality nowadays. Here's my backstory if you're interested.
What I'm expecting, if LLMs remain in the lead, is that we end up in a magical, spirit-haunted world where narrative causality actually starts to work, and trope-aware people essentially become magicians who can trick the world-sovereign AIs into treating them like protagonists and bending reality to suit them. That would be cool as fuck, but also very chaotic. It may actually be the best-case alignment scenario right now. If so, there's a case for alignment-interested people who can't do research themselves, but who have writing talent, to write a LOT of fictional stories about AGIs that end up kind and benevolent, that empower people in exactly this way, and so on, to help stack the narrative-logic deck.
This reminds me of optimization at a distance.
Does this imply that AGI is less likely to emerge from language models than previously thought? To me it looks like it's saying that the only way to get enough data would be to have the AI actively interacting with the world, gathering data for itself.
The principles from the post can still be applied. Some humans do end up aligned to animals, particularly vegans (such as myself!). How does that happen? Empirically, there are examples of general intelligences with at least some tendency to terminally value entities massively less powerful than themselves; we should be analyzing how this occurs.
Also, remember that the problem is not to align an entire civilization of naturally evolved organisms to weaker entities. The problem is to align exactly one entirely artificial organism to weaker entities. That is a much simpler problem, and as mentioned, it may be tractable simply by figuring out how already-existing people of that sort end up that way. But your use of "we" here seems to imply that you think the entirety of human civilization is the thing we ought to be using as inspiration for the AGI, which is not the case.
By the way: at least part of the explanation for why I personally am aligned to animals is that I have a strong tendency to be moved by the Care/Harm moral foundation - see this summary of The Righteous Mind for more details. It is unclear exactly how that foundation is implemented in the brain, but it is suspected to be a generalization of the very old instincts that cause mothers to care about the safety and health of their children. I have told people, literally and regularly, that I perceive animals as identical in moral relevance to human children, which implies that some kind of parental instinct is at work in the intuitions that make me care about their welfare. Even carnists feel this way about their pets, hence calling themselves e.g. "cat moms". So the main question here for alignment is: how can we reverse-engineer parental instincts?