I'm writing a book about epistemology. It's about The Problem of the Criterion, why it's important, and what it has to tell us about how we approach knowing the truth.
I've also written a lot about AI safety. Some of the more interesting stuff can be found at the site of my currently-dormant AI safety org, PAISRI.
We've spent years talking about "aligned" AI, and "Friendly" AI before that, but maybe we should have spent that time talking about "noble" AI.
To be noble is, in part, to act in the best interests of one's domain of responsibility. Historically this meant the peasants living and working on an estate. Today this might mean being a responsible leader of a business who prioritizes people as well as profits and owns the fallout of hard decisions, or being a responsible political leader, though those seem few and far between these days.
We've lost touch with the idea of nobility, but a noble AI might exhibit these traits we think of as positive for alignment:
and many more. I'm just starting to think about this idea as an alternative framing for what we've been calling alignment, so I'm curious to hear folks' thoughts.
We used to think a lot about the potential for issues caused by an AI experiencing an ontological crisis. But your post seems to suggest we should perhaps be more concerned by the issues created by imposing ontology on AIs and them running away with that ontology. Is that how you're thinking about this?
I'd really like to see more follow-up on the ideas presented in this post. Our drive to care is arguably why we're willing to cooperate, and making AI that cares the way we do is a potentially viable path to AI aligned with human values, but I've not seen anyone take it up. Regardless, I think this is an important idea and that folks should look at it more closely.
I think this post is important because it brings old insights from cybernetics into a modern frame that relates to how folks are thinking about AI safety today. I strongly suspect that the big idea in this post, that ontology is shaped by usefulness, matters greatly to addressing fundamental problems in AI alignment.
Seems reasonable. I do still worry quite a bit about Goodharting, but perhaps this could be reasonably addressed with careful oversight by wise humans doing the wisdom equivalent of red teaming.
According to METR, the organization that audited OpenAI, a dozen tasks indicate ARA capabilities.
Small comment, but @Beth Barnes of METR posted on Less Wrong just yesterday to say "We should not be considered to have ‘audited’ GPT-4 or Claude".
This doesn't appear to be a load-bearing point in your post, but it would still be good to update the language to be more precise.
Ah, I see. I have to admit, I write a lot of my comments between things, and I missed that the context of the post could cause my words to be interpreted this way. These days I'm often in executive mode rather than scholar mode and miss nuance if it's not clearly highlighted, hence my misunderstanding, but that also reflects where I'm coming from with this answer!
I left a comment over in the other thread, but I think Joachim misunderstands my position.
In the above comment I've taken for granted that there's a non-trivial possibility that AGI is near. I'm not arguing we should say "AGI is near" regardless of whether it is or not; we don't know whether it is, we only have our guesses. But so long as there's a non-trivial chance that AGI is near, I think that's the more important message to communicate.
Overall it would be better if we could communicate something like "AGI is probably near", but "probably" and similar qualifiers tend to get rounded off. Even if you do literally say "AGI is probably near" or similar, that's not what people will hear. So if you're going to say "probably", my argument is that it's better if they round the "probably" off to "near" rather than "not near".
From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.
If we communicate the simple idea that AGI is near, it pushes people to work on safety projects that would be good to work on even if AGI is not near, while paying some costs in terms of reputation, mental health, and personal wealth.
If we communicate the simple idea that AGI is not near, people will feel less need to work on safety soon. That would let them avoid missing out on opportunities that would be good to take before they actually need to focus on AI safety.
We can only really communicate one thing at a time to people. Also, we should worry more about the tail risks of false positives (thinking we can build AGI safely when we cannot) than of false negatives (thinking we can't build AGI safely when we can). Taking these two facts into consideration, I think the policy implication is clear: unless there is extremely strong evidence that AGI is not near, we must act and communicate as if AGI is near.
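To make the asymmetry concrete, here's a minimal expected-cost sketch of the argument. The probabilities and cost figures are purely illustrative assumptions of mine, not estimates from the post:

```python
# A minimal sketch of the asymmetric-loss argument.
# All numbers below are hypothetical placeholders, chosen only to
# illustrate that a large cost asymmetry dominates the decision.

def expected_cost(p_agi_near: float, act_as_if_near: bool) -> float:
    """Expected cost of a communication policy under made-up costs.

    - Acting as if AGI is near when it isn't wastes effort (small cost).
    - Acting as if AGI is not near when it is risks being unprepared
      for a tail event (very large cost).
    """
    COST_WASTED_EFFORT = 1      # false alarm: reputation, opportunity costs
    COST_UNPREPARED = 1_000     # tail risk: unprepared when AGI really is near

    if act_as_if_near:
        return (1 - p_agi_near) * COST_WASTED_EFFORT
    else:
        return p_agi_near * COST_UNPREPARED

# Even at a modest chance that AGI is near, acting as if it is
# has lower expected cost under these assumed numbers:
for p in (0.05, 0.10, 0.50):
    print(p, expected_cost(p, act_as_if_near=True),
             expected_cost(p, act_as_if_near=False))
```

Under these assumptions the "act as if near" policy wins even at a 5% chance of near AGI; the conclusion only flips if you think the cost asymmetry is far smaller than sketched here.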
I don't disagree that those we called nobles frequently acted badly. But I do see idealized noble values as worth looking at. Think less of real kings and lords and more of valorized archetypes like Robin Hood, King Richard the Lionheart, and of course King Arthur and his Knights. I think this fiction captures the expectations we set for good leaders who hold power over others, and that's the version I'm suggesting is worth using as a starting point for what we want "good" AI to look like.
I'm also not very concerned with the economic realities that created the need for idealized nobility norms in feudal societies; I don't see that as a key part of what I'm pointing at. Nobility is a larger and longer tradition than its Medieval European expression, though that expression is the one I and most folks on Less Wrong are probably most familiar with.