In the wake of FLI’s AI 6-month pause letter and Eliezer Yudkowsky’s doom article in Time, there seems to be more high-level attention on the arguments for low and high estimates of AI existential risk. My estimate is currently ~30%, which is higher than most ML researchers but seems lower...
I think a potentially promising and undertheorized approach to AI safety, especially in short timelines, is natural language alignment (NLA), a form of AI-assisted alignment in which we leverage the model’s rich understanding of human language to help it develop and pursue a safe[1] notion of social value by bootstrapping...
I read several shard theory posts and found the details interesting, but I couldn't quite see the big picture. I'm used to hearing "theory" refer to a falsifiable generalization of data. It is typically stated as a sentence, paragraph, list, mathematical expression, or diagram early in a scientific paper. Theories...