I hope to find time to give a more thorough reply later; what I say below is hasty and may contain errors.
(1)
Define general competence factor as intelligence*coherence.
Take all the classic arguments about AI risk and ctrl-f "intelligence" and then replace it with "general competence factor."
The arguments now survive your objection, I think.
When we select for powerful AGIs, when we train them to do stuff for us, we are generally speaking also training them to be coherent. It's more accurate to say we are training them to have a high general competence factor than to say we are training them to be intelligent-but-not-necessarily-coherent. The ones that aren't so coherent will struggle to take over the world, yes, but they will also struggle to compete in the marketplace (and possibly even the training environment) with the ones that are.
(2) I'm a bit annoyed by the various bits of this post/paper that straw man the AI safety community, e.g. saying that X is commonly assumed when in fact there are pages and pages of argumentation supporting X, which you can easily find with a google, and lots more pages of argumentation on both sides of the issue as to whether X.
Relatedly... I just flat-out reject the premise that most work on AI risk assumes that AI will be less of a hot mess than humans. I for one am planning for a world where AIs are about as much of a hot mess as humans, at least at first. I think it'll be a great achievement (relative to our current trajectory) if we can successfully leverage hot-mess AIs to end the acute risk period.
(3) That said, I'm intellectually curious/excited to discuss these results and arguments with you, and grateful that you did this research & posted it here. :) Onwards to solving these problems collaboratively!
Seems like the concept of "coherence" used here is inclined to treat simple stimulus-response behavior as highly coherent. E.g., the author puts a thermostat in the supercoherent unintelligent corner of one of his graphs.
But stimulus-response behavior, like a blue-minimizing robot, only looks like coherent goal pursuit in a narrow set of contexts. The relationship between its behavioral patterns and its progress towards goals is context-dependent, and will go off the rails if you take it out of the narrow set of contexts where it fits. That's not "a hot mess of self-undermining behavior", so it's not the lack-of-coherence that this question was designed to get at.
Here's a hypothesis about the inverse correlation arising from your observation: When we evaluate a thing's coherence, we sample behaviours in environments we expect to find the thing in. More intelligent things operate in a wider variety of environments, and the environmental diversity leads to behavioural diversity that we attribute to a lack of coherence.
Without thinking about it too much, this fits my intuitive sense. An amoeba can't possibly demonstrate a high level of incoherence because it simply can't do a lot of things, and whatever it does would have to be very much in line with its goal (?) of survival and reproduction.
A hypothesis for the negative correlation:
More intelligent agents have a larger set of possible courses of action that they're potentially capable of evaluating and carrying out. But picking an option from a larger set is harder than picking an option from a smaller set. So max performance grows faster than typical performance as intelligence increases, and errors look more like 'disarray' than like 'just not being capable of that'. E.g., compare a human who left the window open while running the heater on a cold day with a thermostat that left the window open while running the heater.
A Second Hypothesis: Higher intelligence often involves increasing generality - having a larger set of goals, operating across a wider range of environments. But that increased generality makes the agent less predictable by an observer who is modeling the agent as using means-ends reasoning, because the agent is not just relying on a small number of means-ends paths in the way that a narrower agent would. This makes the agent seem less coherent in a sense, but that is not because the agent is less goal-directed (indeed, it might be more goal-directed and less of a stimulus-response machine).
These seem very relevant for comparing very different agents: comparisons across classes, or of different species, or perhaps for comparing different AI models. Less clear that they would apply for comparing different humans, or different organizations.
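The first hypothesis above (choosing from a larger option set is harder, so the best available outcome pulls away from the typical one) can be illustrated with a toy simulation. This is only a sketch under assumptions that are mine, not the commenter's: each option has a true value drawn from a standard normal, the agent sees each value plus independent Gaussian noise, and it picks the option with the highest noisy estimate. The function name `mean_best_and_chosen` and all parameters are hypothetical.

```python
import random


def mean_best_and_chosen(n_options, noise_sd=1.0, trials=5000, seed=0):
    """Average true value of the best option and of the option actually chosen.

    Assumed toy model: true values ~ N(0, 1); the agent ranks options by
    true value plus N(0, noise_sd) evaluation noise and takes the top one.
    """
    rng = random.Random(seed)
    total_best = 0.0
    total_chosen = 0.0
    for _ in range(trials):
        values = [rng.gauss(0, 1) for _ in range(n_options)]
        estimates = [v + rng.gauss(0, noise_sd) for v in values]
        chosen = max(range(n_options), key=lambda k: estimates[k])
        total_best += max(values)
        total_chosen += values[chosen]
    return total_best / trials, total_chosen / trials


if __name__ == "__main__":
    # Print option count, mean best value, mean chosen value, and the gap.
    for n in (2, 10, 50):
        best, chosen = mean_best_and_chosen(n)
        print(n, round(best, 2), round(chosen, 2), round(best - chosen, 2))
```

In this toy model both numbers rise with the number of options, but the gap between them also widens, which is one way an agent with more options could look more "messy" to an observer while still doing better in absolute terms.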
(Crossposting some of my Twitter comments.)
I liked this criticism of alignment approaches: it makes a concrete claim that addresses the crux of the matter, and provides supporting evidence! I also disagree with it, and will say some things about why.
I think that instead of thinking in terms of "coherence" vs. "hot mess", it is more fruitful to think about "how much influence is this system exerting on its environment?". Too much influence will kill humans, if directed at an outcome we're not able to choose. (The rest of my comments are all variations on this basic theme).
We humans may be a hot mess, but we're far better at influencing (optimizing) our environment than any other animal or ML system. Example: we build helicopters and roads, which are very unlikely to arise by accident in a world without people trying to build helicopters or roads. If a system is good enough at achieving outcomes, it is dangerous whether or not it is a "hot mess".
It's much easier for us to describe simple behaviors as utility maximization; for example a ball rolling down a hill is well-described as minimizing its potential energy. So it's natural that people will rate a dumb / simple system as being more easily described by a utility function than a smart system with complex behaviors. This does not make the smart system any less dangerous.
Misalignment risk is not about expecting a system to "inflexibly" or "monomaniacally" pursue a simple objective. It's about expecting systems to pursue objectives at all. The objectives don't need to be simple or easy to understand.
Intelligence isn't the right measure to have on the X-axis - it evokes a math professor in an ivory tower, removed from the goings-on in the real world. A better word might be capability: "how good is this entity at going out into the world and getting more of what it wants?"
In practice, AI labs are working on improving capability, rather than intelligence defined abstractly in a way that does not connect to capability. And capability is about achieving objectives.
If we build something more capable than humans in a certain domain, we should expect it to be "coherent" in the sense that it will not make any mistakes that a smart human wouldn't have made. Caveat: it might make more of a particular kind of mistake, and make up for it by being better at other things. This happens with current systems, and IMO plausibly we'll see something similar even in the kind of system I'd call AGI. But at some point the capabilities of AI systems will be general enough that they will stop making mistakes that are exploitable by humans. This includes mistakes like "fail to notice that your programmer could shut you down, and that would stop you from achieving any of your objectives".
The organizations one seems like an obvious collider - you got the list by selecting for something like "notability," which is contributed to by both intelligence and coherence, and so on the sample it makes sense they're anticorrelated.
But I think the rankings for animals/plants aren't like that. Instead, it really seems to trade on what people mean by "coherence" - here I agree with Unnamed, it seems like "coherence" is getting interpreted as "simplicity of models that work pretty well to describe the thing," even if those models don't look like utility maximizers. Put an oak tree in a box with a lever that dispenses water, and it won't pull the lever when it's thirsty, but because the overall model that describes an oak tree is simpler than the model that describes a rat, it feels "coherent." This is a fine way to use the word, but it's not quite what's relevant to arguments about AI.
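The collider point about the organizations list can be sketched numerically. The setup below is an assumption for illustration, not the survey's data: "intelligence" and "coherence" are independent standard normal scores, "notability" is simply their sum, and only entities above a hypothetical notability threshold make it onto the list. The names `pearson` and `simulate_collider` are made up for this example.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_collider(n=20000, threshold=1.5, seed=0):
    """Correlation of two independent traits before and after selecting on their sum."""
    rng = random.Random(seed)
    # Assumed model: traits are independent in the full population.
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Assumed selection rule: "notable" means intelligence + coherence > threshold.
    selected = [(i, c) for i, c in population if i + c > threshold]
    r_all = pearson(*zip(*population))
    r_selected = pearson(*zip(*selected))
    return r_all, r_selected, len(selected)


if __name__ == "__main__":
    r_all, r_selected, n_selected = simulate_collider()
    print("whole population:", round(r_all, 3))
    print("selected sample: ", round(r_selected, 3), "from", n_selected, "entities")
```

Under these assumptions the traits are uncorrelated in the full population but clearly negatively correlated among the selected entities, which is the pattern a selection-on-notability story would predict.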
Put an oak tree in a box with a lever that dispenses water, and it won't pull the lever when it's thirsty
I actually thought this was a super interesting question, just for general world modelling. The tree won't pull a lever because it barely has the capability to do so and no prior that it might work, but it could, like, control a water dispenser via sap distribution to a particular branch. In that case will the tree learn to use it?
Ended up finding an article on attempts to show learned behavioural responses to stimuli in plants, "On the Conditioning of Plants: A Review of Experimental Evidence". It turns out there have been some positive results, but they seem not to have replicated, as well as lots of negative results, so my guess is that no, even if it is given direct control, the tree won't control its own water supply. More generally, this is consistent with the view that plants lack the information processing systems to coherently use their tools.
Experiments are mostly done with M. pudica because it shows (fairly) rapid movement to close up its leaves when shaken.
Not sure I would agree about a single ant being coherent. Aren't ants super dependent on their colonies for reasonable behavior? Like they use pheromone trails to find food and so on.
Also AFAIK a misplaced pheromone trail can lead an ant to walk in circles, which is the archetypal example of incoherence. But I don't know how much that happens in practice.
This is a cool result - I think it's really not obvious why intelligence and "coherence" seem inversely correlated, but it's interesting that you replicated it across three different classes of things (ML models, animals, organisations).
I think it's misleading to describe this as finding that intelligence and coherence are actually inversely correlated. Rather, survey respondents' ratings of intelligence and ratings of coherence were inversely correlated.
Epistemic status: clumsy
An AI could also be misaligned because it acts in ways that don't pursue any consistent goal (incoherence).
It’s worth noting that this definition of incoherence seems inconsistent with VNM. E.g., a rock might satisfy the folk definition of “pursuing a consistent goal,” but fail to satisfy VNM due to lacking completeness (and, as a corollary, due to not performing expected utility optimization over the outcome space).
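For reference, here is the completeness axiom and the VNM representation result in standard textbook notation. The symbols are not from the post: $\mathcal{L}$ is the set of lotteries over outcomes, $\succeq$ is the agent's weak preference relation, and $u$ is a utility function over outcomes.

```latex
% Completeness: any two lotteries can be compared.
\forall A, B \in \mathcal{L}: \quad A \succeq B \ \text{ or } \ B \succeq A

% VNM theorem: if \succeq satisfies completeness, transitivity,
% continuity, and independence, then there exists u such that
A \succeq B \iff \mathbb{E}_{A}[u] \ge \mathbb{E}_{B}[u]
```

On this reading, the claim about the rock is that there are pairs of lotteries for which it has no defined preference in either direction, so the first condition already fails.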
Strong upvoted.
The result is surprising and raises interesting questions about the nature of coherence. Even if this turns out to be a fluke, I predict that it’d be an informative one.
I don't think that a measurement of "coherence" which implies that an ant is more coherent than AlphaGo is valuable in this context.
However, I think that pointing out the assumption about the relationship between intelligence and coherence is.
Very interesting.
In favor:
1) The currently leading models (LLMs) are ultimate hot messes;
2) The whole point of G in AGI is that it can do many things; focusing on a single goal is possible, but is not a "natural mode" for general intelligence.
Against:
A superintelligent system will probably have enough capacity overhang to create multiple threads which would look to us like supercoherent superintelligent threads, so even a single system is likely to lead to multiple "virtual supercoherent superintelligent AIs" among other less coherent and more exploratory behaviors it would also perform.
But it's a good argument against a supercoherent superintelligent singleton (even a single system which does have supercoherent superintelligent subthreads is likely to have a variety of those).
I think this is taking aim at Yudkowskian arguments that are not cruxy for AI takeover risk as I see it. The second species doesn't need to be supercoherent in order to kill us or put us in a box; human levels of coherence will do fine for that.