Some minor feedback points: Just from reading the abstract and intro, this could be read as a non-sequitur: "It limits our ability to mitigate short-term harms from NLP deployments". Also, calling something a "short-term" problem doesn't seem necessary and it may sound like you think the problem is not very important.
OpenAI's work speeds up progress, but in a way that likely makes later progress smoother. If you spend as much compute as possible now, you reduce potential surprises in the future.
On 2): Being overparameterized doesn't mean you fit all your training data. It just means that you could fit it with enough optimization. Perhaps the existence of some savants shows that the brain could memorize far more than it does.
On 3): The number of our synaptic weights is stupendous too - about 30000 for every second in our life.
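As a rough sanity check on that figure, here is a back-of-the-envelope sketch. The synapse count (about 10^14, where published estimates vary by roughly an order of magnitude) and the 80-year lifespan are my assumptions, not numbers stated above.

```python
# Assumed inputs (not from the comment above): ~1e14 synapses, 80-year life.
SYNAPSES = 1e14
LIFESPAN_YEARS = 80
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synapses_per_second_of_life(synapses=SYNAPSES, years=LIFESPAN_YEARS):
    """Return the number of synapses divided by the seconds in a lifetime."""
    return synapses / (years * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"{synapses_per_second_of_life():,.0f} synapses per second of life")
```

Under these assumptions the result lands in the tens of thousands, the same order of magnitude as the 30000 quoted above. A longer assumed lifespan or a lower synapse estimate moves it closer to that figure.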
On 4): You can underfit at the evolution level and still overparameterize at the individual level.
Overall, though, you convinced me that underparameterization is less likely, especially under your definition of overparameterization, which is the one relevant for double descent.
Why do you think that humans are, and powerful AI systems will be, severely underparameterized?
Also interesting to see that all of these groups were able to coordinate to the disadvantage of less coordinated groups, but not able to reach peace among themselves.
One explanation might be that the more coordinated groups also have harder coordination problems to solve because their world is bigger and more complicated. Might be the same with AI?
If X is "number of paperclips" and Y is something arbitrary that nobody optimizes, such as the ratio of number of bicycles on the moon to flying horses, optimizing X should be equally likely to increase or decrease Y in expectation. Otherwise "1-Y" would go in the opposite direction which can't be true by symmetry. But if Y is something like "number of happy people", Y will probably decrease because the world is already set up to keep Y up and a misaligned agent could disturb that state.
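The symmetry argument for an arbitrary Y can be illustrated with a toy simulation. This is only a sketch under strong assumptions of mine: the world state is a vector, X and Y are linear functions of it with independent random Gaussian weights, and "optimizing X" means taking a small step along X's gradient. It says nothing about the second case, where the world is already arranged to keep Y high.

```python
import random


def fraction_y_increases(dim=10, trials=10_000, seed=0):
    """Estimate how often a step that increases X also increases a random Y.

    Toy model (an assumption, not a claim about real objectives):
    X(s) = w_x . s and Y(s) = w_y . s with independent Gaussian weights.
    A small step along w_x changes Y in proportion to w_x . w_y.
    """
    rng = random.Random(seed)
    increases = 0
    for _ in range(trials):
        w_x = [rng.gauss(0, 1) for _ in range(dim)]
        w_y = [rng.gauss(0, 1) for _ in range(dim)]
        delta_y = sum(a * b for a, b in zip(w_x, w_y))
        if delta_y > 0:
            increases += 1
    return increases / trials


if __name__ == "__main__":
    print(f"Y increased in {fraction_y_increases():.1%} of trials")
```

Because flipping the sign of `w_y` gives an equally likely draw, the fraction should come out close to one half, which is the "1-Y" symmetry in numerical form.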
I should've specified that the strong version is "Y decreases relative to a world where neither X nor Y is being optimized". Am I right that this version is not true?
Thanks for writing this! It always felt like a blind spot to me that we only have Goodhart's law, which says "if X is a proxy for Y and you optimize X, the correlation breaks", when we really mean a stronger version: "if you optimize X, Y will actively decrease". Your paper clarifies that what we actually mean is an intermediate version: "if you optimize X, it becomes harder to optimize Y". My conclusion would be that the intermediate version is true but the strong version is false. Would you say that's an accurate summary?
Costs don't really grow linearly with model size because utilization goes down as you spread a model across many GPUs. I.e., aggregate memory requirements grow superlinearly. Relatedly, model sizes increased <100x while compute increased 300000x in OpenAI's dataset. That's been updating my views a bit recently.
People are trying to solve this with things like GPipe, but I don't know yet if there can be an approach that scales to many more TPUs than what they tried (8). Communication would be the next bottleneck.
(also x-posted from https://arbital.com/p/goodharts_curse/#subpage-8s5)
Another, speculative point: If V and U were my utility function and my friend's, my intuition is that an agent that optimizes the wrong function would act more robustly. If true, this may support the theory that Goodhart's curse for AI alignment is to a large extent a problem of defending against adversarial examples by learning robust features similar to human ones. Namely, the robust response may arise because my friend and I have learned similar robust, high-level features; we just give them different importance.