I'm writing a book about epistemology. It's about The Problem of the Criterion, why it's important, and what it has to tell us about how we approach knowing the truth.
I've also written a lot about AI safety. Some of the more interesting stuff can be found at the site of my currently-dormant AI safety org, PAISRI.
This post still stands out to me as making an important and straightforward point about the observer dependence of knowledge, one that is still, in my view, underappreciated (enough so that I wrote a book about it and related epistemological ideas!). I continue to think this is quite important for understanding AI, and in particular for addressing interpretability concerns as they relate to safety: lacking a general theory of why and how generalization happens, we risk mistakes in building aligned AIs if they categorize the world in unusual ways that we don't anticipate or understand.
I don't disagree that those whom we called nobles frequently acted badly. But I do see idealized noble values as worth looking at. Think less of real kings and lords and more of valorized archetypes like Robin Hood, King Richard the Lionhearted, and of course King Arthur and his Knights. I think this fiction captures a picture of the expectations we set for good leaders who hold power over others, and that's the version I'm suggesting is worth using as a starting point for what we want "good" AI to look like.
I'm also not very concerned with the economic realities that made idealized nobility norms necessary in feudal societies; I don't see that as a key part of what I'm pointing at. Nobility is a larger and longer tradition than its expression in Medieval Europe, though that expression is the one I, and most folks on Less Wrong, are probably most familiar with.
We've spent years talking about "aligned" AI, and "Friendly" AI before that, but maybe we should have spent that time talking about "noble" AI.
To be noble is, in part, to act in the best interests of one's domain of responsibility. Historically this meant the peasants living and working on an estate. Today this might mean being a responsible leader of a business who prioritizes people as well as profits and owns the fallout of hard decisions, or being a responsible political leader, though those seem few and far between these days.
We've lost touch with the idea of nobility, but a noble AI might exhibit these traits, which we think of as positive for alignment:
and many more. I'm just starting to think about this idea as an alternative framing for what we've been calling alignment, so I'm curious to hear folks' thoughts.
We used to think a lot about the potential issues caused by an AI experiencing an ontological crisis. But your post seems to suggest we should perhaps be more concerned with the issues created by imposing an ontology on AIs and having them run away with it. Is that how you're thinking about this?
I'd really like to see more follow-up on the ideas presented in this post. Our drive to care is arguably why we're willing to cooperate, and making AI that cares the same way we do is a potentially viable path to AI aligned with human values, but I've not seen anyone take it up. Regardless, I think this is an important idea, and folks should look at it more closely.
I think this post is important because it brings old insights from cybernetics into a modern frame that relates to how folks are thinking about AI safety today. I strongly suspect that the big idea in this post, that ontology is shaped by usefulness, matters greatly to addressing fundamental problems in AI alignment.
Seems reasonable. I do still worry quite a bit about Goodharting, but perhaps this could be reasonably addressed with careful oversight by some wise humans to do the wisdom equivalent of red teaming.
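To make the Goodharting worry concrete, here is a minimal toy sketch (in Python, with entirely made-up functions `true_value` and `proxy_score`) of the failure mode I have in mind: an optimizer that ranks actions by a proxy keeps pushing well past the point where the thing we actually care about starts getting worse. The "wisdom red teaming" I'm imagining is oversight aimed at noticing this kind of divergence before it runs away.

```python
import numpy as np

# Made-up toy example: the thing we actually care about peaks and then
# declines, while the measurable proxy just keeps rewarding "more".
def true_value(x):
    return x - 0.1 * x**2   # peaks at x = 5, declines after that

def proxy_score(x):
    return x                # monotone proxy: more always looks better

candidates = np.linspace(0, 20, 201)
best_by_proxy = candidates[np.argmax(proxy_score(candidates))]
best_by_truth = candidates[np.argmax(true_value(candidates))]

print(f"action chosen by proxy:  x = {best_by_proxy:.1f}, "
      f"true value = {true_value(best_by_proxy):.1f}")   # x = 20.0, value = -20.0
print(f"action we actually want: x = {best_by_truth:.1f}, "
      f"true value = {true_value(best_by_truth):.1f}")   # x = 5.0, value = 2.5
```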
According to METR, the organization that audited OpenAI, a dozen tasks indicate ARA capabilities.
Small comment, but @Beth Barnes of METR posted on Less Wrong just yesterday to say "We should not be considered to have ‘audited’ GPT-4 or Claude".
This doesn't appear to be a load-bearing point in your post, but it would still be good to update the language to be more precise.
Ah, I see. I have to admit, I write a lot of my comments between other things, and I missed that the context of the post could cause my words to be interpreted this way. These days I'm often in executive mode rather than scholar mode, and I miss nuance if it's not clearly highlighted; hence my misunderstanding, but it also reflects where I'm coming from with this answer!
I continue to be excited about this class of approaches. Explaining why amounts to giving an argument for why I think self-other overlap is relevant to normative reasoning, so I will sketch that argument here:
But this sketch is easier to explain than to realize. We don't know exactly how humans come to care about others, so we don't know how to implement this in AI. We also know that human care for others is imperfect, because evil exists (in that humans sometimes intentionally violate norms with the intent to harm others), so just getting AI that cares is not clearly a full solution to alignment. But, to the extent that humans are aligned, it seems to be because they care about what others care about, and this research is an important step toward building AI that cares about other agents, like humans.
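To be concrete about the kind of mechanism I have in mind (this is not the researchers' actual method, just a rough Python sketch under assumed names like `model.encode` and `task_loss_fn`): one way to operationalize "caring about what others care about" is an auxiliary training signal that pulls a model's internal representation of a situation described from its own perspective toward its representation of the same situation described from another agent's perspective.

```python
import torch
import torch.nn.functional as F

# Hedged sketch, not the actual self-other overlap method. We assume a model
# exposing a hypothetical encode(prompt) -> hidden activations API, and add an
# auxiliary loss pulling self-referential and other-referential activations
# toward each other.

def overlap_loss(model, self_prompt: str, other_prompt: str) -> torch.Tensor:
    h_self = model.encode(self_prompt)    # hypothetical API
    h_other = model.encode(other_prompt)  # hypothetical API
    # Smaller distance = more "self-other overlap" in the representation.
    return F.mse_loss(h_self, h_other)

def training_step(model, batch, optimizer, task_loss_fn, overlap_weight=0.1):
    task_loss = task_loss_fn(model, batch)          # ordinary training objective
    aux = overlap_loss(
        model,
        self_prompt="You want to reach the goal.",
        other_prompt="The other agent wants to reach the goal.",
    )
    loss = task_loss + overlap_weight * aux         # blend task and overlap
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Whether anything this crude captures what we mean by "care" is exactly the open question, but it at least gives a concrete handle on where the idea could fail in practice.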