This is a more technical follow-up to the last post, making precise when regressional Goodhart leads to failure and when it doesn't. We'll first show conditions under which optimization for a proxy fails, and then some conditions under which it succeeds. (The second proof will be substantially easier.)

Related work

In addition to the related work sections of the previous post, this post makes reference to the textbook An Introduction to Heavy-Tailed and Subexponential Distributions, by Foss et al. Many similar results about random variables appear in that textbook, though we haven't seen this post's results elsewhere in the literature. We mostly adopt their notation here, and cite a few helpful lemmas.

Main result: Conditions for catastrophic Goodhart

Suppose that X and V are independent real-valued random variables. We're going to show, roughly, that if

1. X is subexponential (a slightly stronger property than being heavy-tailed), and
2. V has lighter tails than X by more than a linear factor, meaning that the ratio of the tails of X to the tails of V grows superlinearly,^{[1]}

then $\lim_{t\to\infty} E[V \mid X+V \ge t] = E[V]$.

Less formally, we're saying something like "if it requires relatively little selection pressure on X to get more of X and asymptotically more selection pressure on V to get more of V, then applying very strong optimization towards X+V will not get you even a little bit of optimization towards V - all the optimization power will go towards X."
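This claim is easy to probe numerically. Below is a minimal Monte Carlo sketch (our own illustration, not from the post: the Pareto tail index 1.5, the threshold t = 30, and V ~ N(1, 1) are all arbitrary choices) of a heavy-tailed X plus a light-tailed V under strong selection on X + V:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Heavy-tailed (subexponential) component: Pareto with Pr(X > x) = x**-1.5 for x >= 1.
# numpy's Generator.pareto draws Lomax samples, so shift by 1 for the classical Pareto.
X = rng.pareto(1.5, n) + 1.0
# Light-tailed component: Gaussian with mean 1.
V = rng.normal(1.0, 1.0, n)

t = 30.0
cond = X + V >= t                 # strong selection on the proxy X + V
cond_mean = V[cond].mean()

print(f"unconditional E[V]  = {V.mean():.3f}")
print(f"E[V | X + V >= 30]  = {cond_mean:.3f}  (from {cond.sum()} samples)")
```

With these choices the conditional mean of V stays close to its unconditional mean of 1: essentially all of the selection pressure is absorbed by the heavy Pareto tail of X.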

Proof sketch and intuitions

The conditional expectation E[V|X+V>t] is given by $$\frac{\int_{-\infty}^{\infty} v\, f_V(v)\, \Pr(X>t-v)\,dv}{\int_{-\infty}^{\infty} f_V(v)\, \Pr(X>t-v)\,dv},$$^{[2]} and we divide the integral in the numerator into 4 regions, showing that each region's effect on the conditional expectation of V is similar to that of the corresponding region in the unconditional expectation E[V].

The regions are defined in terms of a slow-growing function h(t):R→R≥0 such that the fiddly bounds on different pieces of the proof work out. Roughly, we want it to go to infinity so that |V| is likely to be less than h(t) in the limit, but grow slowly enough that the shape of V's distribution within the interval [−h(t),h(t)] doesn't change much after conditioning.

In the table below, we abbreviate the condition X+V>t as c.

| Region | Why its effect on E[V\|c] is small | Explanation |
| --- | --- | --- |
| r1 = (−∞, −h(t)] | P[V ∈ r1 \| c] is too low | In this region, \|V\| > h(t) and X > t + h(t), both of which are unlikely. |
| r2 = (−h(t), h(t)) | E[V \| V ∈ r2, c] ≈ E[V \| V ∈ r2] | The tail distribution of X is too flat to change the shape of V's distribution within this region. |
| r3 = [h(t), t − h(t)] | P[V ∈ r3 \| c] is low, and V < t | There are increasing returns to each bit of optimization for X, so it's unlikely that both X and V have moderate values.^{[3]} |
| r4 = (t − h(t), ∞) | P[V ∈ r4 \| c] is too low | X is heavier-tailed than V, so the condition that V > t − h(t) is much less likely than X > t − h(t) in r2. |

Here's a diagram showing the region boundaries at −h(t), h(t), and t−h(t) in an example where t=25 and h(t)=4, along with a negative log plot of the relevant distribution:

Note that, up to a constant vertical shift coming from normalization, the green curve is the pointwise sum of the blue and orange curves.

Full proof

To be more precise, we're going to make the following definitions and assumptions:

Let fV(v) be the PDF of V at the value v. We assume for convenience that fV exists, is integrable, etc., though we suspect that this isn't necessary, and that one could work through a similar proof referring only to the tails of V. We won't make this assumption for X.

Let $F_X(x) = \Pr(X \le x)$ and $\bar F_X(x) = \Pr(X > x)$, and similarly for $F_V$ and $\bar F_V$.

Assume V has a finite mean: $\int_{-\infty}^{\infty} v f_V(v)\,dv$ converges absolutely.

Assume X is subexponential. Formally, this means that $\lim_{x\to\infty} \frac{\Pr(X_1+X_2>x)}{\Pr(X>x)} = 2$, where $X_1$ and $X_2$ are independent copies of X.

This happens roughly whenever X has tails that are heavier than $e^{-cx}$ for every $c > 0$ and is reasonably well-behaved; counterexamples to the claim "long-tailed implies subexponential" exist, but they're nontrivial to exhibit.

Examples of subexponential distributions include log-normal distributions, anything that decays like a power law (such as the Pareto distribution), and distributions with tails asymptotic to $e^{-x^a}$ for any $0 < a < 1$.

We require for V that its tail function is substantially lighter than X's, namely that $\lim_{t\to\infty} \frac{t^p\,\bar F_V(t)}{\bar F_X(t)} = 0$ for some $p > 1$.^{[1]}

This implies that $\bar F_V(t) = O(\bar F_X(t)/t)$: writing $\frac{t\,\bar F_V(t)}{\bar F_X(t)} = t^{1-p} \cdot \frac{t^p\,\bar F_V(t)}{\bar F_X(t)}$, both factors go to 0, so the product does too.
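The subexponentiality assumption above can also be checked numerically. As a sketch (our own illustration: a Pareto distribution with $\bar F_X(x) = x^{-1.5}$, chosen because its tail has a closed form), we can estimate the defining ratio Pr(X1 + X2 > x)/Pr(X > x) and see it settle near 2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4_000_000
alpha = 1.5

# Two independent Pareto(alpha) samples with Pr(X > x) = x**-alpha for x >= 1.
X1 = rng.pareto(alpha, n) + 1.0
X2 = rng.pareto(alpha, n) + 1.0

x = 50.0
# Exact denominator Pr(X > x) = x**-alpha; only the numerator is Monte Carlo.
ratio = np.mean(X1 + X2 > x) / x**-alpha
print(f"Pr(X1 + X2 > 50) / Pr(X > 50) = {ratio:.2f}")
```

At a finite threshold like x = 50 the ratio overshoots 2 slightly (the sum also exceeds x when one variable is large and the other is merely typical), and it tightens toward 2 as x grows.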

With all that out of the way, we can move on to the proof.

The unnormalized PDF of V conditioned on X+V ≥ t is given by $f_V(v)\,\bar F_X(t-v)$. Its expectation is given by $$\frac{\int_{-\infty}^{\infty} v\, f_V(v)\, \bar F_X(t-v)\,dv}{\int_{-\infty}^{\infty} f_V(v)\, \bar F_X(t-v)\,dv}.$$

Meanwhile, the unconditional expectation of V is given by $\int_{-\infty}^{\infty} v f_V(v)\,dv$.

We'd like to show that these two expectations are equal in the limit for large t. To do this, we'll introduce $Q(v) = \frac{\bar F_X(t-v)}{\bar F_X(t)}$. (More pedantically, this should really be $Q_t(v)$, which we'll occasionally use where it's helpful to remember that this is a function of t.)

For a given value of t, Q(v) is just a scaled version of $\bar F_X(t-v)$, so the conditional expectation of V is given by $$\frac{\int_{-\infty}^{\infty} v\, f_V(v)\, Q(v)\,dv}{\int_{-\infty}^{\infty} f_V(v)\, Q(v)\,dv}.$$ But because Q(0)=1, the numerator and denominator of this fraction are (for small v) close to the unconditional expectation and 1, respectively.
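For intuition about why Q(v) stays near 1 for moderate v, here is the ratio in closed form for a Pareto tail (our own example, with $\bar F_X(x) = x^{-1.5}$): shifting a heavy tail by a fixed v changes it by a factor that tends to 1 as t grows.

```python
ALPHA = 1.5  # Pareto tail index, so Fbar_X(x) = x**-ALPHA

def Q(v: float, t: float) -> float:
    # Q_t(v) = Fbar_X(t - v) / Fbar_X(t) for the Pareto tail above
    return ((t - v) / t) ** (-ALPHA)

v = 3.0
for t in [10.0, 100.0, 1_000.0, 10_000.0]:
    print(f"t = {t:>7.0f}:  Q_t(3) = {Q(v, t):.4f}")
```

This flatness is exactly the long-tailedness used for region r2; a light tail behaves very differently, e.g. for a Gaussian the analogous ratio exp(−(t−v)²/2)/exp(−t²/2) = exp(vt − v²/2) blows up as t grows.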

We'll aim to show that for all ϵ>0, we have for sufficiently large t that $$\left|\int_{-\infty}^{\infty} v\, f_V(v)\, Q_t(v)\,dv - \int_{-\infty}^{\infty} v\, f_V(v)\,dv\right| < \epsilon \quad\text{and}\quad \int_{-\infty}^{\infty} f_V(v)\, Q_t(v)\,dv \in [1-\epsilon, 1+\epsilon],$$ which implies (exercise) that the two expectations have limiting difference zero. But first we need some lemmas.
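For the exercise, writing $N_t$ and $D_t$ for the numerator and denominator integrals, one short bound (our own filling-in of the step) suffices:

```latex
\left|\frac{N_t}{D_t} - E[V]\right|
  = \frac{\bigl|N_t - E[V]\,D_t\bigr|}{D_t}
  \le \frac{\bigl|N_t - E[V]\bigr| + |E[V]|\,\bigl|1 - D_t\bigr|}{D_t}
  \le \frac{\epsilon + |E[V]|\,\epsilon}{1 - \epsilon},
```

which can be made arbitrarily small by taking ϵ → 0, so the conditional and unconditional expectations have limiting difference zero.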


Lemmas

Lemma 1: There is $h(t)$ depending on $F_X$ such that: