Drake Thomas

Interested in math puzzles, fermi estimation, strange facts about the world, toy models of weird scenarios, unusual social technologies, and deep dives into the details of random phenomena.

Currently doing independent alignment research of assorted flavors.

I'm not sure what you mean formally by these assumptions, but I don't think we're making all of them. Certainly we aren't assuming things are normally distributed - the post is in large part about how things change when we *stop* assuming normality! I also don't think we're making any assumptions with respect to additivity; is more of a notational or definitional choice, though as we've noted in the post it's a framing that one could think doesn't carve reality at the joints. (Perhaps you meant something different by additivity, though - feel free to clarify if I've misunderstood.)

Independence is absolutely a strong assumption here, and I'm interested in further explorations of how things play out in different non-independent regimes - in particular we'd be excited about theorems that could classify these dynamics under a moderately large space of non-independent distributions. But I do expect that there are pretty similar-looking results where the independence assumption is substantially relaxed. If that's false, that would be interesting!

An example of the sort of strengthening I wouldn't be surprised to see is something like "If V is not too badly behaved in the following ways, and for all v∈R we have [some light-tailedness condition] on the conditional distribution (X|V=v), then catastrophic Goodhart doesn't happen." This seems relaxed enough that you could actually encounter it in practice.