I talk here about how a mathematician mindset can be useful for AI alignment. But first, a puzzle:

Given $k$, what is the least number $n \geq 2$ such that for $2 \leq b \leq k$, the base $b$ representation of $n$ consists entirely of 0s and 1s?

If you want to think about it yourself, stop reading.

For $k=2$, $n=2$.

For $k=3$, $n=3$.

For $k=4$, $n=4$.

For $k=5$, $n=82{,}000$.

Indeed, 82,000 is 10100000001010000 in binary, 11011111001 in ternary, 110001100 in base 4, and 10111000 in base 5.
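The small cases can be checked by brute force. The following is a minimal sketch, not a serious search: the function names and the `limit` cutoff are illustrative choices, and the puzzle is read as asking for the least number that is at least 2 (so that 0 and 1 are excluded).

```python
from typing import List, Optional


def digits(n: int, base: int) -> List[int]:
    """Return the digits of n in the given base, least significant first."""
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(r)
    return out


def is_zero_one(n: int, base: int) -> bool:
    """True if every digit of n in the given base is 0 or 1."""
    return all(d <= 1 for d in digits(n, base))


def least_zero_one_number(k: int, limit: int = 10**6) -> Optional[int]:
    """Least n >= 2 whose representations in bases 2..k use only 0s and 1s.

    Returns None if no such n exists below `limit` (an arbitrary cutoff).
    """
    for n in range(2, limit):
        # Base 2 always passes, so only bases 3..k need checking.
        if all(is_zero_one(n, b) for b in range(3, k + 1)):
            return n
    return None


if __name__ == "__main__":
    for k in range(2, 6):
        print(k, least_zero_one_number(k))
```

Calling `least_zero_one_number(6)` with any limit this loop can realistically reach returns `None`, which is exactly the situation where the mathematician and the scientist start to disagree about what has been learned.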

What about when $k=6$?

So, a mathematician might tell you that this is an open problem. It is not known if there is any $n \geq 2$ which consists of 0s and 1s in bases 2 through 6.

A scientist, on the other hand, might just tell you that clearly no such number exists. There are $2^{d-1}$ numbers that consist of exactly $d$ digits, all 0s and 1s, in base 6. Each of these has roughly $d \log_5 6$ digits in base 5, and assuming things are roughly evenly distributed, each of these digits is a 0 or a 1 with "probability" $2/5$. The "probability" that there is any number of length $d$ that has the property is thus less than $2^d \cdot (2/5)^d = (4/5)^d$. This means that as you increase $d$, the "probability" that you find a number with the property drops off exponentially, and this is not even considering bases 3 and 4. Also, we have checked all numbers up to 2000 digits. No number with this property exists.
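To get a feel for how quickly the scientist's estimate shrinks, here is a small sketch of the arithmetic. It assumes the heuristic exactly as stated (at most 2^d candidate strings of length d, each passing the base-5 test with "probability" at most (2/5)^d); the function names are made up for illustration, and nothing here is a proof.

```python
def union_bound(d: int) -> float:
    """Heuristic bound for length d: 2**d * (2/5)**d, written as (4/5)**d."""
    return (4 / 5) ** d


def tail_bound(d_min: int) -> float:
    """Sum of union_bound(d) over all lengths d > d_min.

    This is a geometric series: (4/5)**(d_min + 1) / (1 - 4/5),
    which simplifies to 4 * (4/5)**d_min.
    """
    return 4 * (4 / 5) ** d_min


if __name__ == "__main__":
    for d in (10, 50, 100, 500):
        print(d, union_bound(d), tail_bound(d))
```

The tail sum is the relevant quantity once all shorter numbers have been checked by hand: it bounds the heuristic chance that any longer number works.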

Who is right?

Well, they are both right. If you want to have fun playing games with proofs, you can consider it an open problem and try to prove it. If you want to get the right answer, just listen to the scientist. If you have to choose between destroying the world with a 1% probability and destroying the world if a number greater than 2 which consists of 0s and 1s in bases 2 through 6 exists, go with the latter.

It is tempting to say that we might be in a situation similar to this. We need to figure out how to make safe AI, and we maybe don't have that much time. Maybe we need to run experiments, and figure out what is true about what we should do and not waste our time with math. Then why are the folks at MIRI doing all this pure math stuff, and why does CHAI talk about "proofs" of desired AI properties? It would seem that if the end of the world is at stake, we need scientists, not mathematicians.

I would agree with the above sentiment if we were averting an asteroid, or a plague, or global warming, but I think it fails to apply to AI alignment. This is because optimization amplifies things.

As a simple example of optimization, let $X_i$ for $1 \leq i \leq N$, with $N$ large (say, $N = 10^6$), be i.i.d. random numbers which are normally distributed with mean 0 and standard deviation 1. If I choose an $X_i$ at random, the probability that $X_i$ is greater than 4 is about 0.003%. However, if I optimize, and choose the greatest $X_i$, the probability that it is greater than 4 is very close to 100%. This is the kind of thing that optimization does. It searches through a bunch of options, and takes extreme ones. This has the effect of making things that would be very small probabilities much larger.
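The amplification can be computed exactly rather than simulated. The sketch below uses the standard library's `statistics.NormalDist`; the sample size of one million is an assumption chosen for illustration, and the function names are hypothetical. For reference, the one-sided tail beyond 4 is about 0.003%, and the two-sided tail (absolute value beyond 4) is about 0.006%.

```python
from statistics import NormalDist


def tail_probability(threshold: float) -> float:
    """P(X > threshold) for a single standard normal X."""
    return 1.0 - NormalDist(mu=0.0, sigma=1.0).cdf(threshold)


def prob_max_exceeds(threshold: float, n: int) -> float:
    """P(max of n i.i.d. standard normals > threshold).

    The maximum stays below the threshold only if all n draws do.
    """
    p = tail_probability(threshold)
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    print("single draw:", tail_probability(4.0))
    for n in (1, 1_000, 1_000_000):  # illustrative sample sizes
        print(n, prob_max_exceeds(4.0, n))
```

Selecting the maximum turns an event with per-draw probability p into one with probability 1 - (1 - p)^n, which approaches 1 once n is much larger than 1/p.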

Optimization also leads to very steep phase shifts, because it can send something on one side of a threshold to one extreme, and send things on the other side of a threshold to another extreme. Let $X_i$ for $1 \leq i \leq N$, with $N$ large, be i.i.d. random numbers that are uniform in the unit interval. If you look at the first 10 numbers and take the one that is furthest away from .499, the distribution over numbers will be bimodal, with peaks near 0 and 1. If you take the one that is furthest away from .501, you will get a very similar distribution. Now instead consider what happens if you look at all $N$ numbers and take the one that is furthest from .499. You will get a number that is almost certainly very close to 1. On the other hand, the one that is furthest from .501 will almost certainly be very close to 0. As you slightly change the optimization target, the result of a weak optimization might not change much, but the result of a strong one can change things drastically.
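The phase shift is easy to see in a quick Monte Carlo sketch. Everything specific here is an assumption made for illustration: the "strong" optimizer looks at 10,000 samples rather than a larger number (to keep the run fast), the trial counts and seed are arbitrary, and the function names are hypothetical.

```python
import random
from typing import List


def furthest_from(target: float, samples: List[float]) -> float:
    """Return the sample with the greatest distance from target."""
    return max(samples, key=lambda x: abs(x - target))


def fraction_near_one(target: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the furthest-from-target sample is above 0.5.

    Each trial draws n uniform samples from the unit interval.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        samples = [rng.random() for _ in range(n)]
        if furthest_from(target, samples) > 0.5:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n, trials in ((10, 200), (10_000, 50)):
        for target in (0.499, 0.501):
            print(n, target, fraction_near_one(target, n, trials))
```

With 10 samples the two targets give nearly the same split between low and high winners. With 10,000 samples the winner is almost always the largest sample for target .499 and almost always the smallest for target .501, because the extremes sit so close to 0 and 1 that the 0.002 difference in target decides the outcome.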

As a very rough approximation, a scientist is good at telling the difference between probability 0.01% and probability 99.99%, while the mathematician is good at telling the difference between 99.99% and 100%. Similarly, the scientist is good at telling if $a \approx b$, while the mathematician is good at telling if $a = b$ when you already know that $a \approx b$.

If you only want to get an approximately correct answer almost surely, the absence of strong optimization pressure makes the mathematician's skills much less useful. However, strong optimization pressure amplifies small probabilities and errors and creates discontinuities, which makes a mathematician's level of precision necessary even to achieve approximate correctness in practice.

Notes:

1) I am not just saying that adversarial optimization makes small probabilities of failure large. I am saying that in general any optimization at all messes with small probabilities and errors drastically.

2) I am not saying that we don't need scientists. I am saying that we don't just need scientists, and I am saying that scientists should pay some attention to the mathematician mindset. There is a lot to be gained from getting your hands dirty in experiments.

3) I am not saying that we should only be satisfied if we achieve certainty that an AI system will be safe. That's an impossibly high standard. I am saying that we should aim for a deep formal understanding of what is going on, more like the "fully reduced" understanding we have of steam engines or rockets.

One of the main explanations of the AI alignment problem I link people to.

I think the simple mathematical models here are very helpful in pointing to some intuitions about being confident systems will work even with major optimisation pressure applied, and why optimisation power makes things weird. I would like to see other researchers in alignment review this post, because I don't fully trust my taste on posts like these.

I don't like the intro to the post. I feel like the example Scott gives makes the opposite of the point he wants it to make. Either a number with the given property exists or not. If such a number doesn't exist, creating a superintelligence won't change that fact. Given talk I've heard around the near certainty of AI doom, betting the human race on the nonexistence of a number like this looks pretty attractive by comparison -- and it's plausible there are AI alignment bets we could make that are analogous to this bet.

Yeah, Edge instantiation makes a similar point.

I think this is similar to Security Mindset, so you might want to think about this post in relation to that.

I don't think so, or if it is, then it is similar to Eliezer Yudkowsky's version of "security mindset", not Bruce Schneier's.

Very roughly speaking, security mindset is about differences between probabilities 99.99% and 1-10^(-16). From a mathematical perspective, the difference between 1-10^(-16) and 1 is still more similar to the difference between 1-10^(-4) and 1.

A notable thing that anybody who seriously studies security learns quickly is that it is in practice impossible to prove the security of anything useful except the one-time pad (OTP). The rest of security usually reduces to physics and economics.

Note: I am not saying that we don't need mathematicians. We absolutely should try to get to that level of precision.

At the same time, the mathematical way of thinking is in some sense fragile: a proof is either correct or not. A proof which is "almost correct" is not worth very much.

When Scott says "mathematician mindset can be useful for AI alignment", I take it that your interpretation is "we should try to make sure that when we build AGI, we can prove that our system is safe/robust/secure", whereas I think the intended interpretation is "we should try to make sure that when we build AGI, we have a deep formal understanding of how this kind of system works at all so that we're not flying blind". Similar to how we understand the mathematics of how rockets work in principle, and if we found a way to build a rocket without that understanding, it's very unlikely we'd be able to achieve much confidence in the system's behavior.

I think the end of this excerpt from a 2000 Bruce Schneier piece is assuming something like this, though I don't know that Schneier would agree with Eliezer and Scott fully:

Cf. this thing I said a few months ago:

Scott can correct me if I'm misunderstanding his post (e.g., rounding it off too much to what's already in my head).

I think "what should be done" is generally a different question than "what kind of mindsets there are", and I would prefer to disentangle them.

My claims about mindsets roughly are

As I understand it (correct me if I'm wrong), your main claim roughly is "we should have a deep understanding of how these systems work at all".

I don't think there is much disagreement on that.

But please note that Scott's post in several places makes an explicit distinction between the kind of understanding achieved in mathematics and the kind achieved in science. The understanding we have of how rockets work is pretty much on the physics side of this - e.g. we know we can disregard gravitational waves, radiation pressure, and violations of CP symmetry.

To me, this seems different from mathematics, where it would be somewhat strange to say something like "we basically understand what functions and derivatives are ... you can just disregard cases like the Weierstrass function".

(comment to mods: I would actually enjoy a setting allowing me to not see the karma system at all; the feedback it is giving me is "write things which people would upvote" vs. "write things which are most useful - where I'm unsure, see some flaws, ...".)