Sammy Martin

Philosophy and Physics BSc, AI MSc at Edinburgh, starting a PhD at King's College London. Interested in metaethics, anthropics/general philosophy and technical AI Safety.

Sammy Martin's Comments

Modelling Continuous Progress

After reading your summary of the difference (maybe just a difference in emphasis) between 'Paul slow' vs 'continuous' takeoff, I did some further simulations. A low setting of d (highly continuous progress) doesn't give you a Paul-slow condition on its own, but it is relatively easy to replicate a situation like this:

There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles. (Similarly, we’ll see an 8 year doubling before a 2 year doubling, etc.)

What we want is a scenario where you don't get intermediate doubling intervals at all in the discontinuous case, but you get at least one in the continuous case. Setting s relatively high appears to do the trick.

Here is a scenario with very fast post-RSI growth, using s=5, c=1, I0=1 and I_AGI=3. I wrote some more code to produce plots of how long each complete doubling interval took in each scenario. The 'default' doubling interval, with no contribution from RSI, was 0.7. All the continuous scenarios had two complete doubling intervals of intermediate length before the doubling time collapsed to under 0.05 on the third doubling. The discontinuous model simply kept the original doubling interval until it collapsed to under 0.05, also on the third doubling. It's all in this graph.
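A minimal sketch of the kind of simulation I mean (illustrative only: the growth equation below is just one plausible form in which s is the RSI feedback strength and d controls how abruptly RSI switches on around I_AGI, and is not necessarily the exact equation from the post):

```python
import numpy as np

def simulate(s, c, d, I0, I_AGI, dt=1e-4, t_max=20.0):
    """Euler-integrate a toy capability-growth model.

    Illustrative form only: dI/dt = I * (c + s * sigmoid(d * (I - I_AGI))),
    i.e. exponential base growth at rate c plus an RSI feedback term of
    strength s that switches on around I_AGI, with d controlling how
    abrupt (discontinuous) the switch is.
    """
    ts = np.arange(0.0, t_max, dt)
    I = np.empty_like(ts)
    I[0] = I0
    for k in range(1, len(ts)):
        rsi_gate = 1.0 / (1.0 + np.exp(-d * (I[k - 1] - I_AGI)))
        I[k] = I[k - 1] + dt * I[k - 1] * (c + s * rsi_gate)
    return ts, I

def doubling_intervals(ts, I, n=3):
    """Lengths of the first n complete doubling intervals of I."""
    intervals, start_t, target = [], ts[0], 2 * I[0]
    for t, x in zip(ts, I):
        if x >= target:
            intervals.append(t - start_t)
            start_t, target = t, 2 * x
            if len(intervals) == n:
                break
    return intervals

# Continuous (low d) vs near-discontinuous (high d) versions of the s=5 scenario.
for d in (2.0, 50.0):
    ts, I = simulate(s=5.0, c=1.0, d=d, I0=1.0, I_AGI=3.0)
    print(d, [round(x, 3) for x in doubling_intervals(ts, I)])
```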

Let's make the irresponsible assumption that this actually applies to the real economy, with the current growth mode (no contribution from RSI) given by the 'slow/no takeoff' condition, s=0.

The current doubling time is a bit over 23 years. In the shallow continuous progress scenario (red line), we get a 9-year doubling, a 4-year doubling and then a ~1-year doubling. In the discontinuous scenario (purple line) we get two 23-year doublings and then a ~1-year doubling out of nowhere. In other words, this fairly arbitrary setting of the parameters (the second set I tried) gives us a Paul-slow takeoff, if you assume that all of this should be scaled to years of economic doubling. You can see that graph here.
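The scaling itself is just a change of time units. A minimal sketch of that assumption (the 0.27 figure below is illustrative, back-derived from the ~9-year doubling rather than read off the simulation):

```python
# Anchor the model's default, no-RSI doubling interval (~0.7 time-units)
# to the current ~23-year world-output doubling time.
years_per_model_unit = 23 / 0.7          # roughly 33 years per model time-unit

def to_years(model_interval):
    """Convert a doubling interval in model time-units to calendar years."""
    return model_interval * years_per_model_unit

# Illustrative: a ~0.27-unit doubling interval maps to roughly 9 years.
print(round(to_years(0.27), 1))   # -> 8.9
```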

Modelling Continuous Progress

They do disagree about locality, yes, but as far as I can tell that is downstream of the assumption that there won't be a very abrupt switch to a new growth mode. A single project suddenly pulling ahead of the rest of the world happens when the growth curve is steep enough that a realistic lead time (a few months) lets you get ahead of everyone else.

So the obvious difference in predictions is that e.g. Paul/Robin think that takeoff will occur across many systems in the world while MIRI thinks it will occur in a single system. That is because MIRI thinks that RSI is much more of an all-or-nothing capability than the others do, which in turn is because they think AGI is much more likely to depend on a few novel, key insights that produce sudden gains in capability. That was the conclusion of my post.

In the past I've called locality a 'practical discontinuity' - from the outside world's perspective, does a single project explode out of nowhere? Whether you get a practical discontinuity doesn't just depend on whether progress is discontinuous. If you get a discontinuity in RSI capability then you do get a practical discontinuity, but that is a sufficient, not a necessary, condition. If the growth curve is steep enough you might get a practical discontinuity anyway.

Perhaps Eliezer-2008 believed that there would be a discontinuity in returns on optimization, leading to a practical discontinuity/local explosion, but Eliezer-2020 (having since de-emphasised RSI) just thinks we will get a local explosion somehow, either from a discontinuity or from sufficiently fast continuous progress.

My graphs above do seem to support that view - even most of the 'continuous' scenarios have a fairly abrupt and steep growth curve. I strongly suspect that, as well as disagreements about discontinuities, there are very strong disagreements about 'post-RSI speed' - maybe by orders of magnitude.

This is what the curves look like if s is set to 0.1 - the takeoff is much slower even if RSI comes about fairly abruptly.
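In terms of the toy sketch from my earlier comment, that corresponds to something like the following (only s is specified here; d = 50 is an assumed value for a fairly abrupt onset of RSI):

```python
# Same toy comparison as before, but with weak RSI feedback (s = 0.1).
ts, I = simulate(s=0.1, c=1.0, d=50.0, I0=1.0, I_AGI=3.0)
print([round(x, 3) for x in doubling_intervals(ts, I)])
```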

[AN #80]: Why AI risk might be solved without additional intervention from longtermists
The biggest disagreement between me and more pessimistic researchers is that I think gradual takeoff is much more likely than discontinuous takeoff (and in fact, the first, third and fourth paragraphs above are quite weak if there's a discontinuous takeoff).

It's been argued before that Continuous is not the same as Slow by any normal standard, so the strategy of 'dealing with things as they come up', while more viable under a continuous scenario, will probably not be sufficient.

It seems to me like you're assuming longtermists are very likely not required at all in a case where progress is continuous. I take continuous to just mean that we're in a world where there won't be sudden jumps in capability, or apparently useless systems suddenly crossing some threshold and becoming superintelligent, not where progress is slow or easy to reverse. We could still pick a completely wrong approach that makes alignment much more difficult and set ourselves on a likely path towards disaster, even if the following is true:

So far as I can tell, the best one-line summary of why we should expect a continuous rather than a discontinuous takeoff comes from the interview Paul Christiano gave on the 80k podcast: 'I think if you optimize AI systems for reasoning, it appears much, much earlier.'
So far as I can tell, Paul's point is that, absent specific reasons to think otherwise, the prima facie case is that any time we are trying hard to optimize for some criterion, we should expect the 'many small changes that add up to one big effect' situation.
Then he goes on to argue that the specific arguments that AGI is a rare case where this isn't true (like nuclear weapons) are either wrong or aren't strong enough to make discontinuous progress plausible.

In a world where continuous but moderately fast takeoff is likely, I can easily imagine doom scenarios that would require long-term strategy or conceptual research early on to avoid, even if none of them involve FOOM. Imagine that the accepted standard for aligned AI follows some particular research agenda, like Cooperative Inverse Reinforcement Learning, but it turns out that CIRL starts to behave pathologically and tries to wirehead itself as it gets more and more capable, and that this is a fairly deep flaw we can only patch, not avoid.

Let's say that over the course of a couple of years, failures of CIRL systems start to appear and compound very rapidly until they constitute an existential disaster. Maybe people realize what's going on, but by then it is too late, because the right move would have been to try some other approach to AI alignment, and the research for that doesn't exist and can't be done anywhere near fast enough. Something like Paul Christiano's 'What failure looks like'.

The Value Definition Problem

I appreciate the summary, though the way you state the VDP isn't quite the way I meant it.

what should our AI system try to do ('Clarifying "AI Alignment"'), to have the best chance of a positive outcome?

To me, this reads like 'we have a particular AI; what should we try to get it to do?', whereas I meant it as 'what Value Definition should we be building our AI to pursue?'. That's why I stated it as 'what should we aim to get our AI to want/target/decide/do', or, to be consistent with your way of writing it, 'what should we try to get our AI system to do to have the best chance of a positive outcome', not 'what should our AI system try to do to have the best chance of a positive outcome'. Aside from that minor terminological difference, that's a good summary of what I was trying to say.

I fall more on the side of preferring indirect approaches, though by that I mean that we should delegate to future humans, as opposed to defining some particular value-finding mechanism into an AI system that eventually produces a definition of values.

I think your opinion is probably the majority opinion. My major point with the 'scale of directness' was to emphasize that our 'particular value-finding mechanisms' can have more or fewer degrees of freedom: from a certain perspective, 'delegate everything to a simulation of future humans' is also a 'particular mechanism', just with a lot more degrees of freedom. So even if you strongly favour indirect approaches, you will still have to make some decisions about the nature of the delegation.

The original reason I wrote this post was to get people to explicitly notice the point that we will probably have to do some philosophical labour ourselves at some point, and then I discovered Stuart Armstrong had already made a similar argument. I'm currently working on another post (also based on the same work at the EA Hotel) with some more specific arguments for constructing a particular value-finding mechanism that doesn't fix us to any particular normative ethical theory, but does fix us to an understanding of what values are - something I call a Coherent Extrapolated Framework (CEF). But again, Stuart Armstrong anticipated a lot (but not all!) of what I was going to say.

The Value Definition Problem

Thanks for pointing that out to me; I had not come across your work before! I've had a look through your post and I agree that we're saying similar things. I would say that my 'Value Definition Problem' is an (intentionally) vaguer and broader question about what our research program should be - as I argued in the article, this is mostly an axiological question. Your final statement of the Alignment Problem (informally) is:

A must learn the values of H and H must know enough about A to believe A shares H’s values

while my Value Definition Problem is

“Given that we are trying to solve the Intent Alignment problem for our AI, what should we aim to get our AI to want/target/decide/do, to have the best chance of a positive outcome?”

I would say the VDP is about what our 'guiding principle' or 'target' should be in order to have the best chance of solving the alignment problem. I used Christiano's 'intent alignment' formulation but yours actually fits better with the VDP, I think.