Roughly speaking, this is because when you grow minds, they don’t care about what you ask them to care about and they don’t care about what you train them to care about; instead, I expect them to care about a bunch of correlates of the training signal in weird and specific ways.
(Similar to how the human genome was naturally selected for inclusive genetic fitness, but the resultant humans didn’t end up with a preference for “whatever food they model as useful for inclusive genetic fitness”. Instead, humans wound up internalizing a huge and complex set of preferences for "tasty" foods, laden with complications like “ice cream is good when it’s frozen but not when it’s melted”.)
I simply do not understand why people keep using this example.
I think it is wrong -- evolution does not grow minds, it grows hyperparameters for minds. When you look at the actual process by which we come to like ice cream -- namely, we eat it, and then we get a reward, and that's why we like it -- then the world looks a lot less hostile, and misalignment a lot less likely.
But given that this example is so controversial, even if it were right, why would you use it -- at least, why would you use it if you had any other example at all to turn to?
Why push so hard for "natural selection" and "stochastic gradient descent" to fall under the same tag of "optimization", such that you can infer things about one from the other by analogy? Have we completely forgotten that the glory of words is not to be expansive, including lots of things in them, but to be precise and narrow?
Does evolution ~= AI have predictive power apart from doom? I have yet to see how natural selection helps me predict how any SGD algorithm works. It does not distinguish between Adam and AdamW. As far as I know it is irrelevant to Singular Learning Theory, NTK, or anything else. It doesn't seem to come up when you try to look at NN inductive biases. If it isn't an illuminating analogy anywhere else, why do we expect it to be right in the one place where it predicts doom?
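(To make the Adam/AdamW point concrete, here's a minimal sketch -- my own illustration, with made-up hyperparameters -- of the one difference between them: whether weight decay is folded into the gradient or applied to the weights directly. Nothing in the natural-selection picture tells you which of these generalizes better; that came from studying the optimizer itself.)

```python
# A minimal sketch (illustrative hyperparameters) of the Adam vs. AdamW
# distinction: Adam-with-L2 folds weight decay into the gradient before the
# moment estimates; AdamW applies the decay to the weights directly.
import numpy as np

def adam_style_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, weight_decay=0.01, decoupled=False):
    """One update step. decoupled=False ~ Adam + L2; decoupled=True ~ AdamW."""
    if not decoupled:
        grad = grad + weight_decay * w              # decay leaks into the moments
    m = beta1 * m + (1 - beta1) * grad              # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2         # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                    # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)     # adaptive step
    if decoupled:
        w = w - lr * weight_decay * w               # decay applied outside the moments
    return w, m, v
```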
I agree that if you knew nothing about DL, you'd be better off using evolution as an analogy to guide your predictions about DL than using an analogy to a car or a rock.
I do think a relatively small quantity of knowledge about DL screens off the usefulness of this analogy; that you'd be better off deferring to local knowledge about DL than to the analogy.
Or, what's more to the point -- I think you'd be better off deferring to an analogy to brains than to evolution, because brains are more like DL than evolution is.
Combining some of your and Habryka's comments, which seem similar.
It's true that the structure of the solution is discovered and complex -- but the ontology of the solution for DL (at least in currently used architectures) is quite opinionated towards shallow circuits with relatively few serial ops. This is different from the bias of evolution, which is fine with a mutation that leads to 10^7 serial ops if its metabolic costs are low. So the resemblance seems shallow other than "solutions can be complex." I think to the degree that you defer to this belief rather than to more specific beliefs about the inductive biases of DL, you're probably just wrong.
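(A toy illustration of what I mean by the serial-op constraint -- a hypothetical sketch, not anyone's actual model: in a feedforward stack, the number of sequential operations is just the depth, however wide you make each layer. Evolution faces no comparable architectural budget.)

```python
# Hypothetical sketch: the serial-op budget of a feedforward net is its depth.
import numpy as np

def forward(x, layers):
    # Each matmul + ReLU must wait for the previous one, so a 12-layer stack
    # gets exactly 12 serial steps -- no matter how wide each layer is.
    for W in layers:
        x = np.maximum(W @ x, 0.0)
    return x

layers = [0.05 * np.random.randn(256, 256) for _ in range(12)]  # depth 12
y = forward(np.random.randn(256), layers)
```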
As far as I know, the optimal learning rate for most architectures is scheduled and decreases over time, which is not a feature of evolution so far as I am aware. Again, the local knowledge is what you should defer to.
Is this a prediction that a cyclic learning rate -- that goes up and down -- will work out better than a decreasing one? If so, that seems false, as far as I know.
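(For concreteness, a minimal sketch -- my own, with made-up step counts and learning rates -- of the two schedule shapes at issue: a monotonically decaying cosine schedule versus a triangular cyclic one that repeatedly rises and falls.)

```python
# Hypothetical sketch of the two learning-rate schedule shapes discussed above.
import math

def cosine_decay_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5):
    """Monotonically decreasing schedule: starts high, decays to min_lr."""
    progress = step / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

def cyclic_lr(step, cycle_len=1000, min_lr=3e-5, max_lr=3e-4):
    """Cyclic schedule: rises and falls over each cycle, indefinitely."""
    pos = (step % cycle_len) / cycle_len     # position within the current cycle
    tri = 1 - abs(2 * pos - 1)               # 0 -> 1 -> 0 triangle wave
    return min_lr + (max_lr - min_lr) * tri
```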
As far as I know, grokking is a non-central example of how DL works, and in evolution punctuated equilibrium is a result of the non-i.i.d. nature of the task, which is a different underlying mechanism from DL's. If you apply DL to non-i.i.d. problems, you don't get grokking, you just get a broken solution. This seems to round off to "sometimes things change faster than others," which is certainly true but not predictively useful -- or in any event not a prediction that you couldn't get from other places.
Like, leaving these to the side -- I think the ability to post-hoc fit something is questionable evidence that it has useful predictive power. I think the ability to actually predict something else means that it has useful predictive power.
Again, let's take "the brain" as an example of something to which you could analogize DL.
There are multiple times that people have cited the brain as an inspiration for a feature in current neural nets or RL: CNNs, obviously; the hippocampus and experience replay; randomization for adversarial robustness. You can match up interventions that cause learning deficiencies in brains to similar deficiencies in neural networks. These are verifiable, non-post-hoc examples of brains being useful for understanding DL.
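(To make the experience-replay example concrete, here's a minimal, hypothetical buffer of the DQN-style kind, loosely analogous to hippocampal replay -- class and parameter names are my own illustration.)

```python
# Minimal sketch of an experience-replay buffer; names and sizes are illustrative.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)      # oldest transitions fall off

    def push(self, state, action, reward, next_state, done):
        """Store one transition as it is experienced."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Replay a random minibatch later, breaking temporal correlations."""
        return random.sample(self.buffer, batch_size)
```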
As far as I know -- you can tell me if there are contrary examples -- there are obviously more cases where inspiration from the brain advanced DL or contributed to DL understanding than cases where inspiration from evolution did. (I'm aware of zero of the latter, but there could be some.) Therefore it seems much more reasonable to analogize from the brain to DL, and to defer to it as your model.
I think in many cases it's a bad idea to analogize from the brain to DL! They're quite different systems.
But they're more similar than evolution and DL are, and if you wouldn't trust the brain to guide your analogical, a-theoretic, low-confidence inferences about DL, then it makes even more sense not to trust evolution for the same.