All of MSRayne's Comments + Replies

What I'm expecting, if LLMs remain in the lead, is that we end up in a magical, spirit-haunted world where narrative causality starts to actually work, and trope-aware people essentially become magicians who can trick the world-sovereign AIs into treating them like protagonists and bending reality to suit them. Which would be cool as fuck, but also very chaotic. That may actually be the best-case alignment scenario right now, and I think there's a case for alignment-interested people who can't do research themselves but who have writing talent to write a LOT of fictional stories about AGIs that end up kind and benevolent, empower people in exactly this way, etc., to help stack the narrative-logic deck.

I've written (scryed) a science fiction/takeoff story about this. Excerpt:
Ramana Kumar (2y):
I agree it is related! I hope we as a community can triangulate in on whatever is going on between theories of mental representation and theories of optimisation or intelligence.

Does this imply that AGI is not as likely to emerge from language models as might have been thought? To me it looks like it's saying that the only way to get enough data would be to have the AI actively interacting in the world - getting data itself.

I definitely think it makes LM --> AGI less likely, although I didn't think it was very likely to begin with.

I'm not sure that the AI interacting with the world would help, at least with the narrow issue described here.

If we're talking about data produced by humans (perhaps solicited from them by an AI), then we're limited by the timescales of human behavior. The data sources described in this post were produced by millions of humans writing text over the course of decades (in rough order-of-magnitude terms).

All that text was already there in the...

The principles from the post can still be applied. Some humans do end up aligned to animals - particularly vegans (such as myself!). How does that happen? There empirically are examples of general intelligences with at least some tendency to terminally value entities massively less powerful than themselves; we should be analyzing how this occurs.

Also, remember that the problem is not to align an entire civilization of naturally evolved organisms to weaker entities. The problem is to align exactly one entirely artificial organism to weaker entities. This is...

Human beings and other animals have parental instincts (and empathy in general) because these were evolutionarily advantageous for the populations that developed them. AGI won't be subjected to the same evolutionary pressures, so every alignment strategy relying on empathy or social reward functions is, in my opinion, hopelessly naive.

Sure, if you've got some example of a mechanism for this that's likely to scale, it may be worthwhile. I'm just pointing out that a lot of people have already thought about mechanisms and concluded that the mechanisms they could come up with would be unlikely to scale. I'm not a big fan of moral foundations theory for explaining individual differences in moral views. I think it lacks evidence.