The SIA population update can be surprisingly small

Stuart_Armstrong

With many thanks to Damon Binder, and the spirited conversations that led to this post, and to Anders Sandberg.

People often think that the self-indication assumption (SIA) implies a huge number of alien species, millions of times more than otherwise. Thought experiments like the presumptuous philosopher seem to suggest this.

But here I'll show that, in many cases, updating on SIA doesn't change the expected number of alien species much. It all depends on the prior, and there are many reasonable priors for which the SIA update does nothing more than double the probability of life in the universe^[1].

This can be the case even if the prior says that life is very unlikely! We can have a situation where we are astounded, flabbergasted, and disbelieving about our own existence - "how could we exist, how can this beeeeee?!?!?!?" - and still not update much - "well, life is still pretty unlikely elsewhere, I suppose".

In the one situation where we have an empirical distribution, the "Dissolving the Fermi Paradox" paper, the effect of the SIA anthropics update is to multiply the expected number of civilizations per planet by seven. Not seven orders of magnitude - just seven.

The formula

Let $ρ$ be the probability of advanced space-faring life evolving on a given planet; for the moment, ignore issues of life expanding to other planets from their one point of origin. Let $f$ be the prior distribution of $ρ$ , with mean $μ$ and variance $σ^{2}$ . This means that, if we visit another planet, our probability of finding life is $μ$ .

On this planet, we exist^[2]. Then if we update on our existence we get a new distribution $f^{'}$ ; this distribution will have mean $μ^{'}$ :

$μ^{'} = μ (1 + \frac{σ^{2}}{μ^{2}}) .$

To see a proof of this result, look at this footnote^[3].

Define $M_{μ, σ^{2}} = 1 + σ^{2} / μ^{2}$ to be this multiplicative factor between $μ$ and $μ^{'}$ ; we'll show that there are many reasonable situations where $M_{μ, σ^{2}}$ is surprisingly low: think $2$ to $100$ , rather than in the millions or billions.
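As a quick sanity check of the formula, here is a minimal Python sketch. The function names (`moments`, `sia_mean`, `multiplier`) and the example triangular prior $2 (1 - ρ)$ are assumptions made for illustration, not part of the argument. It approximates the integrals with a midpoint rule on $[0, 1]$ and compares the directly computed posterior mean with $μ (1 + σ^{2} / μ^{2})$ .

```python
def moments(f, steps=100_000):
    """Midpoint-rule estimates of the mean and variance of a density f on [0, 1]."""
    h = 1.0 / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    norm = sum(f(x) for x in xs) * h            # ~1 for a proper density
    mean = sum(x * f(x) for x in xs) * h / norm
    second = sum(x * x * f(x) for x in xs) * h / norm
    return mean, second - mean ** 2


def sia_mean(f, steps=100_000):
    """Mean of the SIA-updated density, which is proportional to rho * f(rho)."""
    return moments(lambda x: x * f(x), steps)[0]


def multiplier(mean, var):
    """M = 1 + sigma^2 / mu^2."""
    return 1.0 + var / mean ** 2


# Illustrative prior (an assumption): the triangular density 2 * (1 - rho).
prior = lambda x: 2.0 * (1.0 - x)
mu, var = moments(prior)
print(mu * multiplier(mu, var), sia_mean(prior))  # the two values should agree closely
```

Any other density on $[0, 1]$ can be passed in the same way; the two printed numbers should match up to integration error.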

Beta distributions I

Let's start with the most uninformative prior of all: a uniform prior over $[0, 1]$ . The expectation of $ρ$ is $\int_{0}^{1} ρ d ρ = 1 / 2$ , so, without any other information, we expect a planet to have life with $50 %$ probability. The variance is $σ^{2} = 1 / 12$ .

Thus if we update on our existence on Earth, we get the posterior $f^{'} (ρ) = 2 ρ$ ; the mean of this is $2 / 3$ (either direct calculation or using $M_{1 / 2, 1 / 12} = 1 + 4 / 12 = 4 / 3$ ).

Even though this change in expectation is multiplicatively small, it does seem that the uniform prior and the $f^{'} (ρ)$ are very different, with $f^{'} (ρ)$ heavily skewed to the right. But now consider what happens if we look at Mars and notice that it hasn't got life. The probability of no life, given $ρ$ , is $1 - ρ$ . Updating on this and renormalising gives a posterior $6 ρ (1 - ρ)$ :

The expectation of $6 ρ (1 - ρ)$ , symmetric around $1 / 2$ , is of course $1 / 2$ . Thus one extra observation (that Mars is dead) has undone, in expectation, all the anthropic impact of our own existence.

This is an example of a beta distribution for $α = 2$ and $β = 2$ (yes, beta distributions have a parameter called $β$ and another one that's $α$ ; just deal with it). Indeed, the uniform prior is also a beta distribution (with $α = β = 1$ ) as is the anthropic updated version $2 ρ$ (which has $α = 2$ , $β = 1$ ).

The update rule for beta distributions is that a positive observation (ie life) increases $α$ by $1$ , and a negative observation (a dead planet) increases $β$ by $1$ . The mean of an updated beta distribution is a generalised version of Laplace's law of succession: if our prior is a beta distribution with parameters $α$ and $β$ , and we've had $m$ positive observations and $n$ negative ones, then the mean of the posterior is:

$\frac{α + m}{α + β + m + n} .$

Suppose now that we have observed $n$ dead planets, but no life, and that we haven't done an anthropic update yet, then we have a probability of life of $α / (α + β + n)$ . Upon adding the anthropic update, this shifts to $(α + 1) / (α + β + n + 1)$ , meaning that the multiplicative factor is at most $(α + 1) / α$ . If we started with the uniform prior with its $α = 1$ , this multiplies the probability of life by at most $2$ . In a later section, we'll look at $α < 1$ .
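The bound above can be checked with exact arithmetic. This is a small sketch, with hypothetical helper names `beta_mean` and `anthropic_factor`, that applies the generalised law of succession before and after counting our own planet as one positive observation.

```python
from fractions import Fraction


def beta_mean(alpha, beta, m=0, n=0):
    """Posterior mean of a Beta(alpha, beta) prior after m live and n dead planets."""
    return Fraction(alpha + m, alpha + beta + m + n)


def anthropic_factor(alpha, beta, n_dead):
    """Ratio of the means with and without our own planet as one live observation."""
    before = beta_mean(alpha, beta, 0, n_dead)
    after = beta_mean(alpha, beta, 1, n_dead)
    return after / before


# Uniform prior (alpha = beta = 1): the factor approaches, but never reaches, 2.
for n_dead in (0, 1, 10, 1000):
    print(n_dead, anthropic_factor(1, 1, n_dead))
```

The factor grows with the number of dead planets observed, but stays below $(α + 1) / α$ .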

High prior probability is not required for weak anthropic update

The uniform prior has $α = β = 1$ and starts at expectation $1 / 2$ . But we can set $α = 1$ and a much higher $β$ , which skews the distribution to the left; for example, for $β = 2$ , $3$ , and $10$ :

Even though these priors are skewed to the left, and have lower prior probabilities of life ( $1 / 3$ , $1 / 4$ , and $1 / 11$ ), the anthropic update has a factor $M_{μ, σ^{2}}$ that is less than $2$ .

Also note that if we scale the prior $f$ by a small $ϵ$ , so replace $f (ρ)$ on the range $[0, 1]$ with $f (ρ / ϵ) / ϵ$ on the range $[0, ϵ]$ , then $μ$ is multiplied by $ϵ$ and $σ^{2}$ is multiplied by $ϵ^{2}$ . Thus $M_{μ, σ^{2}}$ is unchanged. Here, for example, is the uniform distribution, scaled down by $ϵ = 1$ , $ϵ = 1 / 3$ , and $ϵ = 1 / 20$ :

All of these will have the same $M_{μ, σ^{2}}$ (which is $4 / 3$ , just as for the uniform distribution). And, of course, doing the same scaling with the various beta distributions we've seen up until now will also keep $M_{μ, σ^{2}}$ constant.
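The scale invariance is easy to confirm exactly. The sketch below uses the standard mean and variance of a beta distribution, rescaled to $[0, ϵ]$ ; the helper name `scaled_beta_multiplier` is an assumption for illustration.

```python
from fractions import Fraction


def scaled_beta_multiplier(a, b, eps):
    """M for a Beta(a, b) prior rescaled from [0, 1] to [0, eps] (integer a, b)."""
    mean = eps * Fraction(a, a + b)
    var = eps ** 2 * Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    return 1 + var / mean ** 2


# The uniform prior (a = b = 1) at the three scales mentioned above.
for eps in (Fraction(1), Fraction(1, 3), Fraction(1, 20)):
    print(eps, scaled_beta_multiplier(1, 1, eps))
```

The $ϵ$ cancels in $σ^{2} / μ^{2}$ , so each line prints the same multiplier.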

Thus there are a lot of distributions with very low $μ$ (ie very low prior probability of life) but an $M_{μ, σ^{2}}$ that's less than $2$ (ie the anthropic update is less than a doubling of the probability of life).

Beta distributions II and log-normals

The best-case scenario for $M_{μ, σ^{2}}$ is if $f$ assigns probability $1$ to $ρ = μ$ . In that case, $σ^{2} = 0$ and $M = 1$ : the anthropic update changes nothing.

Conversely, the worst-case scenario for $M_{μ, σ^{2}}$ is if $f$ only allows $ρ = 0$ and $ρ = 1$ . In that case, $f$ assigns probability $μ$ to $1$ and $1 - μ$ to $0$ , for a mean of $μ$ and a variance of $σ^{2} = μ - μ^{2}$ , and a multiplicative factor of $M_{μ, σ^{2}} = 1 / μ$ . In this case, after anthropic update, $f^{'}$ assigns certainty to $ρ = 1$ (since any life at all, given this $f$ , means life on all planets).

But there are also more reasonable priors with large $M_{μ, σ^{2}}$ . We've already seen some, implicitly, above: the beta distributions with $α < 1$ . In that case, $M_{μ, σ^{2}}$ is bounded by $(α + 1) / α$ . If $α = 3 / 4$ and $β = 1$ , for instance, this corresponds to the (unbounded) distribution $f (ρ) = (3 / 4) ρ^{- 1 / 4}$ ; the multiplicative factor is below $7 / 3$ , which is slightly above $2$ . But as $α$ declines, the multiplicative factor can go up surprisingly fast; at $α = 1 / 2$ it is $3$ , at $α = 1 / 4$ it is $5$ :

In general, for $α = 1 / n$ , the multiplicative factor is bounded by $n + 1$ . This gets arbitrarily large as $α \to 0$ , though $α = 0$ itself corresponds to the improper prior $f (ρ) = 1 / ρ$ , whose integral diverges. On a log scale, this corresponds to the log-uniform distribution, which is roughly what you get if you assume "we need $N$ steps, each of probability $p$ , to get life; let's put a uniform prior over the possible $N$ s".

It's not clear why one might want to choose $α = 1 / 10^{20}$ for a prior, but there is a class of prior that is much more natural: the log-normal distributions. These are random variables $X$ such that $log (X)$ is normally distributed.

If we choose $log (X)$ to have a mean that is highly negative (and a variance that isn't too large), then we can mostly ignore the fact that $X$ takes values above $1$ , and treat it as a prior distribution for $ρ$ . The mean and variance of the log-normal distributions can be explicitly defined, thus giving the multiplicative factor as:

$M_{μ, σ^{2}} = \exp (\bar{σ}^{2}) .$

Here, $\bar{σ}^{2}$ is the variance of the normal distribution $log (X)$ . This $\bar{σ}^{2}$ might be large, as it denotes (roughly) "we need $N$ steps, each of probability $p$ , to get life; let's put a uniform-ish prior over a range of possible $N$ s". Unlike $1 / ρ$ , this is a proper prior, and a plausible one; therefore there are plausible priors with very large $M_{μ, σ^{2}}$ . The log-normal is quite likely to appear, as it is the approximate limit of multiplying together a host of different independent parameters.
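To see where the exponential comes from, the sketch below plugs the standard closed-form mean and variance of a log-normal into $1 + σ^{2} / μ^{2}$ . The parameter names `m` and `s2` (the mean and variance of the underlying normal) and the example values are assumptions for illustration; like the text, it ignores the small mass above $1$ .

```python
import math


def lognormal_multiplier(m, s2):
    """M for X = exp(N(m, s2)), from the closed-form log-normal mean and variance."""
    mean = math.exp(m + s2 / 2)
    var = (math.exp(s2) - 1) * math.exp(2 * m + s2)
    return 1 + var / mean ** 2


# The location m drops out; only the log-scale variance s2 matters.
for s2 in (0.5, 1.0, 4.0, 10.0):
    print(s2, lognormal_multiplier(-30.0, s2), math.exp(s2))
```

A log-scale variance of $10$ already gives a multiplier in the tens of thousands, which is why wide log-normal priors can carry large anthropic updates.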

Multiplication law

Do you know what's more likely to be useful than "the approximate limit of multiplying together a host of different independent parameters"? Actually multiplying together independent parameters.

The famous Drake equation is:

$R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L .$

Here $R_{*}$ is the rate of star formation in our galaxy, $f_{p}$ the fraction of those stars with planets, $n_{e}$ the number of planets that can support life per star that has planets, $f_{l}$ the fraction of those that develop life, $f_{i}$ the fraction of those that develop intelligent life, $f_{c}$ the fraction of those that release detectable signs of their existence, and $L$ is the length of time those civilizations endure as detectable.

Then the proportion of advanced civilizations per planet is $q f_{l} f_{i}$ , where $q$ is the proportion of life-supporting planets among all planets. To compute the $M$ of this distribution, we have the highly useful result (the proof is in this footnote^[4]):

Let $X_{i}$ be independent random variables with multiplicative factors $M_{i}$ , and let $M$ be the multiplicative factor of $X = X_{1} \cdot X_{2} \cdot \dots \cdot X_{n}$ . Then $M = \prod_{i} M_{i}$ - the total $M$ is the product of the individual $M_{i}$ .
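A small exact check of this multiplication law, using toy discrete distributions. The three factors `a`, `b`, `c` are hypothetical and chosen only to illustrate the identity; the helper names are likewise assumptions.

```python
from fractions import Fraction
from itertools import product


def multiplier(dist):
    """M = E[X^2] / E[X]^2 for a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    second = sum(x * x * p for x, p in dist.items())
    return second / mean ** 2


def product_dist(dists):
    """Distribution of the product of independent discrete random variables."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        value, prob = Fraction(1), Fraction(1)
        for x, p in combo:
            value *= x
            prob *= p
        out[value] = out.get(value, 0) + prob
    return out


# Hypothetical toy factors.
a = {Fraction(1, 10): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
b = {Fraction(1, 100): Fraction(3, 4), Fraction(1, 2): Fraction(1, 4)}
c = {Fraction(1, 3): Fraction(1)}  # a constant, so its multiplier is 1

print(multiplier(product_dist([a, b, c])), multiplier(a) * multiplier(b) * multiplier(c))
```

Because the arithmetic is exact, the two printed fractions are identical rather than merely close.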

The paper "dissolving the Fermi paradox" gives estimated distributions for all the terms in the Drake equation. The $q$ , which doesn't appear in that paper, is a constant, so has $M_{q} = 1$ . The $f_{i}$ has a log-uniform distribution from $0.001$ to $1$ ; the $M$ can be computed from the mean and variance of such distributions, so $M_{f_{i}} = log (1 / 0.001) \frac{1 - {0.001}^{2}}{2 (1 - 0.001)^{2}} \approx 3.5$ .

The $f_{l}$ term is more complicated; it is distributed like $g (X) = 1 - e^{- e^{X \cdot 50 log (10)}}$ where $X$ is a standard normal distribution. Fortunately, we can estimate its mean and variance without having to figure out its distribution, by numerical integration of $g (x)$ and $g (x)^{2}$ against the normal distribution. This gives $μ \approx 0.5$ , $σ^{2} \approx 0.25$ and $M \approx 2$ . The overall multiplicative effect of anthropic update is:

$M_{planet} \approx 7.$
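The numerical integration for the $f_{l}$ term can be sketched as follows. This is one possible way to do it, not necessarily the method used for the figures above: a midpoint rule on $[- 8, 8]$ , with an overflow guard that is an implementation detail added here. The helper names are assumptions.

```python
import math

SCALE = 50 * math.log(10)  # the factor multiplying X in the exponent


def g(x):
    """g(x) = 1 - exp(-exp(SCALE * x)), guarded against floating-point overflow."""
    t = SCALE * x
    if t > 700:            # exp(t) would overflow; g equals 1 to machine precision
        return 1.0
    return -math.expm1(-math.exp(t))


def normal_expectation(h, lo=-8.0, hi=8.0, steps=160_000):
    """Midpoint-rule estimate of E[h(X)] for a standard normal X."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += h(x) * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2 * math.pi)


mean = normal_expectation(g)
second = normal_expectation(lambda x: g(x) ** 2)
print(mean, second - mean ** 2, second / mean ** 2)  # mean, variance, M
```

Since $g$ is almost a step function at $0$ , the distribution is close to a fair coin flip between "life nearly never" and "life nearly always", which is what gives a multiplier near $2$ .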

What if we considered the proportion of advanced civilization per star, rather than per planet? Then we can drop the $q$ term and add in $f_{p}$ and $n_{e}$ . Those are both estimated to be distributed as log-uniform on $[0.1, 1]$ ; for a total $M$ of

$M_{star} \approx 14.$

Why is the $M$ higher for civilizations per star than civilizations per planet? That's because when we update on our existence, we increase the proportion of civilizations per planet, but we also update the proportion of planets per star - both of these can make life more likely. The $M_{star}$ incorporates both effects, so is strictly higher than $M_{planet}$ .

We can do the same by considering the number of civilizations per galaxy; then we have to incorporate $R_{*}$ as well. This is log-uniform on $[1, 100]$ , giving:

$M_{galaxy} \approx 32.$
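These three multipliers can be reproduced from the closed-form moments of a log-uniform distribution, combined with the multiplication law. In this sketch the function and variable names are assumptions, and the value $2$ for $f_{l}$ is simply taken from the numerical estimate quoted above.

```python
import math


def log_uniform_multiplier(a, b):
    """M = E[X^2] / E[X]^2 for X log-uniform on [a, b], with 0 < a < b."""
    width = math.log(b / a)
    mean = (b - a) / width
    second = (b ** 2 - a ** 2) / (2 * width)
    return second / mean ** 2


m_fi = log_uniform_multiplier(0.001, 1)
m_fl = 2.0                                    # taken from the estimate in the text
m_fp = m_ne = log_uniform_multiplier(0.1, 1)
m_rstar = log_uniform_multiplier(1, 100)

m_planet = m_fi * m_fl                        # q is constant, so its factor is 1
m_star = m_planet * m_fp * m_ne
m_galaxy = m_star * m_rstar
print(round(m_planet, 1), round(m_star, 1), round(m_galaxy, 1))
```

Each extra uncertain factor multiplies the total by a number only modestly above $1$ , which is why the multiplier climbs slowly from planet to star to galaxy.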

What about if we include the Fermi observation (the fact that we don't see anything in our galaxy)? The "dissolving the Fermi paradox" paper shows there are multiple different ways of including this update, depending on how we parse out "not seeing anything" and how easy it is for civilizations to expand.

I did a crude estimate here by taking the Fermi observation to mean "the proportion of civilizations per galaxy must be less than one". Then I did a Monte-Carlo simulation, ignoring all results above $0$ on the log scale:

From this, I got an estimated mean of $0.027$ , variance of $0.014$ , and a total multiplier of:

$M_{galaxy, Fermi} \approx 21.$

With the Fermi observation and the anthropic update combined, we expect $0.56$ civilizations per galaxy.

Limitations of the multiplier

Low multiplier, strong effects

It's important to note that the anthropic update can be very strong, without changing the expected population much. So a low $M_{μ, σ^{2}}$ doesn't necessarily mean a low impact.

Consider for instance the presumptuous philosopher, slightly modified to use planetary population densities. Thus theory $T_{1}$ predicts $ρ = 1 / 10^{12}$ (one in a trillion) and $T_{2}$ predicts $ρ = 1$ ; we put initial probabilities $1 / 2$ on both theories.

As Nick Bostrom noted, the SIA update pushes $T_{2}$ to being a trillion times more probable than $T_{1}$ ; a posteriori, $T_{2}$ is roughly a certainty (the actual probability is $10^{12} / (10^{12} + 1)$ ).

However, the expected population goes from roughly $1 / 2$ (the average of $1 / 10^{12}$ and $1$ ) to roughly $1$ (since a posteriori $T_{2}$ is almost certain). This gives an $M_{μ, σ^{2}}$ of roughly $2$ . So, despite the strong update towards $T_{2}$ , the actual population update is small - and, conversely, despite the actual population update being small, we have a strong update towards $T_{2}$ .

Combining multiple theories

In the previous section, note that both $T_{1}$ and $T_{2}$ were point estimates: they posit a constant $ρ$ . So they have a variance of zero, and hence a $M_{μ, σ^{2}}$ of $1$ . But $T_{2}$ has a much stronger anthropic update. Thus we can't use their $M_{μ, σ^{2}}$ to compare the anthropic effects on different theories.

We also can't relate the individual $M$ s to that of a combined theory. As we've seen, $T_{1}$ and $T_{2}$ have $M$ s of $1$ , but the combined theory $(1 / 2) T_{1} + (1 / 2) T_{2}$ has an $M$ of roughly $2$ . But we can play around with the relative initial weight of $T_{1}$ and $T_{2}$ to get other $M$ s.

If we started with odds $10^{12} : 1$ on $T_{1}$ vs $T_{2}$ , then this has a mean $ρ$ of roughly $2 \times 10^{- 12}$ (each theory contributes about $10^{- 12}$ ); the anthropic update sends it to $1 : 1$ odds, with a mean of roughly $1 / 2$ . So this combined theory has an $M$ of roughly $10^{12} / 4$ , a quarter of a trillion.

But, conversely, if we started with odds $1 : 10^{12}$ on $T_{1}$ vs $T_{2}$ , then we have an initial mean of $ρ$ of roughly one; its anthropic update is odds of $1 : 10^{24}$ , also with a mean of roughly one. So this combined theory has an $M$ of roughly $1$ .
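The three prior weightings of $T_{1}$ and $T_{2}$ discussed in this section can be worked through exactly. The sketch below, with the hypothetical helper `point_mixture`, treats each theory as a point mass on its $ρ$ and applies the SIA update by reweighting each theory in proportion to $ρ$ .

```python
from fractions import Fraction


def point_mixture(weights, rhos):
    """Prior mean, SIA posterior weights, and M for a mixture of point theories."""
    total = sum(weights)
    probs = [Fraction(w) / total for w in weights]
    mean = sum(p * r for p, r in zip(probs, rhos))
    posterior = [p * r / mean for p, r in zip(probs, rhos)]  # SIA: reweight by rho
    post_mean = sum(p * r for p, r in zip(posterior, rhos))
    return mean, posterior, post_mean / mean


rhos = [Fraction(1, 10 ** 12), Fraction(1)]  # T1 and T2
for weights in ([1, 1], [10 ** 12, 1], [1, 10 ** 12]):
    mean, posterior, m = point_mixture(weights, rhos)
    # prior odds, prior mean, posterior probability of T2, multiplier M
    print(weights, float(mean), float(posterior[1]), float(m))
```

The same two theories give very different multipliers purely because of the prior weights, which is the sense in which $M$ describes the combined prior rather than the individual theories.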

There is a weak relation between $M$ and the $M_{i}$ of the various $T_{i}$ . Let $M_{i}$ be the multiplier of $T_{i}$ ; we can reorder the $T_{i}$ so that $M_{i} \leq M_{j}$ for $i \leq j$ . Let $T$ be a combined theory that assigns probability $p_{i}$ to $T_{i}$ , with multiplier $M$ .

For all $\{p_{i}\}$ , $M \geq \min_{i} (M_{i})$ .
For all $ϵ > 0$ , there exists $\{p_{i}\}$ with all $p_{i} > 0$ , so that $M < \min_{i} (M_{i}) + ϵ$ .

So, the minimum value of the $M_{i}$ is a lower bound on $M$ , and we can get arbitrarily close to that bound. See the proof in this footnote^[5].
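A randomised spot check of the lower bound, as a sketch rather than a proof. Each component theory is represented only by its mean and second moment, drawn at random subject to the constraints any distribution on $[0, 1]$ must satisfy; the helper name, the seed and the sampling ranges are arbitrary assumptions.

```python
import random


def mixture_multiplier(probs, means, seconds):
    """M of the mixture sum_i p_i f_i, from each component's mean and second moment."""
    mean = sum(p * m for p, m in zip(probs, means))
    second = sum(p * s for p, s in zip(probs, seconds))
    return second / mean ** 2


rng = random.Random(0)  # fixed seed, for reproducibility only
worst_gap = float("inf")
for _ in range(10_000):
    n = rng.randint(2, 5)
    means = [rng.uniform(0.001, 1.0) for _ in range(n)]
    # A valid second moment lies between E[rho]^2 and E[rho] when rho is in [0, 1].
    seconds = [rng.uniform(m * m, m) for m in means]
    raw = [rng.random() + 1e-9 for _ in range(n)]
    probs = [r / sum(raw) for r in raw]
    m_min = min(s / (m * m) for m, s in zip(means, seconds))
    worst_gap = min(worst_gap, mixture_multiplier(probs, means, seconds) - m_min)

print(worst_gap)  # non-negative, up to rounding, if the bound holds
```

The printed value is the smallest amount by which the mixture's $M$ exceeded the smallest component $M_{i}$ across all trials.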

As we'll see, the population update is small even in the presumptuous philosopher experiment itself. ↩︎
Citation partially needed: I'm ignoring Boltzmann brains and simulations and similar ideas. ↩︎
Given a fixed $ρ$ , the probability of observing life on our own planet is exactly $ρ$ . So Bayes's theorem implies that $f^{'} (ρ) \propto ρ f (ρ)$ . With the full normalisation, this is

$f^{'} (ρ) = \frac{ρ f (ρ)}{\int_{0}^{1} ρ f (ρ) d ρ} .$

If we want to get the mean $μ^{'}$ of this distribution, we further multiply by $ρ$ and integrate:

$μ^{'} = E_{f^{'}} (ρ) = \int_{0}^{1} \frac{ρ^{2} f (ρ)}{\int_{0}^{1} ρ f (ρ) d ρ} d ρ = \frac{\int_{0}^{1} ρ^{2} f (ρ) d ρ}{\int_{0}^{1} ρ f (ρ) d ρ} .$

Let's multiply this by $1 = 1 / 1 = (\int_{0}^{1} f (ρ) d ρ) / (\int_{0}^{1} f (ρ) d ρ)$ and regroup the terms:

$μ^{'} = \frac{\int_{0}^{1} ρ^{2} f (ρ) d ρ}{\int_{0}^{1} f (ρ) d ρ} \cdot \frac{\int_{0}^{1} f (ρ) d ρ}{\int_{0}^{1} ρ f (ρ) d ρ} .$

Thus $μ^{'} =$ $E_{f} (ρ^{2}) / E_{f} (ρ) =$ $(σ^{2} + μ^{2}) / μ =$ $μ (1 + σ^{2} / μ^{2})$ , using the fact that the variance is the expectation of $ρ^{2}$ minus the square of the expectation of $ρ$ . ↩︎
I adapted the proof in this post.

So, let $X_{i}$ be independent random variables with means $μ_{i}$ and variances $σ_{i}^{2}$ . Let $X = \prod_{i} X_{i}$ , which has mean $μ$ and variance $σ^{2}$ . Due to the independence of the $X_{i}$ , the expectations of their products are the product of their expectations. Note that $X_{i}^{2}$ and $X_{j}^{2}$ are also independent if $i \neq j$ . Then we have:

$\begin{matrix} \prod_{i} M_{μ_{i}, σ_{i}^{2}} & = \prod_{i} (1 + \frac{σ_{i}^{2}}{μ_{i}^{2}}) = \prod_{i} (\frac{μ_{i}^{2} + σ_{i}^{2}}{μ_{i}^{2}}) = \prod_{i} (\frac{E (X_{i}^{2})}{μ_{i}^{2}}) = \frac{\prod_{i} (E (X_{i}^{2}))}{\prod_{i} E (X_{i})^{2}} = \frac{E (X^{2})}{E (X)^{2}} = \frac{μ^{2} + σ^{2}}{μ^{2}} = 1 + \frac{σ^{2}}{μ^{2}} = M_{μ, σ^{2}} . \end{matrix}$ ↩︎
Let $\{f_{i}\}_{1 \leq i \leq n}$ be probability distributions on $ρ$ , with mean $μ_{i}$ , variance $σ_{i}^{2}$ , second moment $s_{i} = E_{f_{i}} (ρ^{2}) = σ_{i}^{2} + μ_{i}^{2}$ , and $M_{i} = s_{i} / μ_{i}^{2}$ . Without loss of generality, reorder the $f_{i}$ so that $M_{i} \leq M_{j}$ for $i < j$ .

Let $f$ be the probability distribution $f = p_{1} f_{1} + \dots + p_{n} f_{n}$ , with associated multiplier $M$ . Then we'll show that $M \geq M_{1}$ .

We'll first show this in the special case where $n = 2$ and $M_{1} = M_{2}$ , then generalise to the general case, as is appropriate for a generalisation. If $s_{1} / μ_{1}^{2} = M_{1} = M_{2} = s_{2} / μ_{2}^{2}$ , then, since all terms are non-negative, there exists an $α$ such that $s_{2} = α^{2} s_{1}$ while $μ_{2} = α μ_{1}$ . Then for any given $p = p_{1}$ , the $M$ of $f$ is:

$M (p) = \frac{p s_{1} + (1 - p) s_{2}}{(p μ_{1} + (1 - p) μ_{2})^{2}} = \frac{p s_{1} + (1 - p) α^{2} s_{1}}{(p μ_{1} + (1 - p) α μ_{1})^{2}} = M_{1} \frac{1 (p) + α^{2} (1 - p)}{(1 (p) + α (1 - p))^{2}} .$

The function $x \to x^{2}$ is convex, so, interpolating between the values $x = 1$ and $x = α$ , we know that for all $0 \leq p \leq 1$ , the term $(1 (p) + α (1 - p))^{2}$ is at most $1^{2} (p) + α^{2} (1 - p)$ . Therefore $(1 (p) + α^{2} (1 - p)) / (1 (p) + α (1 - p))^{2}$ is at least $1$ , and $M (p) \geq M_{1}$ . This shows the result for $n = 2$ if $M_{1} = M_{2}$ .

Now assume that $M_{2} > M_{1}$ , so that $s_{1} / μ_{1}^{2} < s_{2} / μ_{2}^{2}$ . Then replace $s_{2}$ with $s_{2}^{'}$ , which is lower than $s_{2}$ , so that $s_{1} / μ_{1}^{2} = s_{2}^{'} / μ_{2}^{2}$ . If we define $M^{'} (p)$ as the expression for $M (p)$ with $s_{2}^{'}$ substituting for $s_{2}$ , we know that $M^{'} (p) \leq M (p)$ , since $s_{2}^{'} < s_{2}$ . Then the previous result shows that $M^{'} (p) \geq M_{1}$ , thus $M (p) \geq M_{1}$ too.

To show the result for larger $n$ , we'll induct on $n$ . For $n = 1$ the result is a tautology, $M_{1} \leq M_{1}$ , and we've shown the result for $n = 2$ . Assume the result is true for $n - 1$ , and then notice that $f = p_{1} f_{1} + \dots + p_{n} f_{n}$ can be re-written as $f = p_{1} f_{1} + (1 - p_{1}) f^{'}$ , where $f^{'} = (p_{2}^{'} f_{2} + \dots + p_{n}^{'} f_{n})$ for $p_{i}^{'} = p_{i} / (1 - p_{1})$ . Then, by the induction hypothesis, if $M^{'}$ is the $M$ of $f^{'}$ , then $M^{'} \geq M_{2}$ . Then applying the result for $n = 2$ between $f_{1}$ and $f^{'}$ , gives $M \geq min (M_{1}, M^{'})$ . However, since $M_{1} \leq M_{2}$ and $M^{'} \geq M_{2}$ , we know that $min (M_{1}, M^{'}) = M_{1}$ , proving the general result.

To show $M$ can get arbitrarily close to $M_{1}$ , simply note that $M$ is continuous in the $\{p_{i}\}$ , define $p_{1} = 1 - ϵ$ , $p_{i} = ϵ / (n - 1)$ for $i > 1$ , and let $ϵ$ tend to $0$ . ↩︎
