Paul Christiano


Iterated Amplification



Yudkowsky and Christiano discuss "Takeoff Speeds"

I don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting.

Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and modest effort). Together with modest progress in deep learning based methods, and a somewhat serious effort, I wouldn't be surprised by people getting up to 20-40% of problems. The bronze cutoff is usually 3/6 problems, and the gold cutoff is usually 5/6 (assuming the AI doesn't get partial credit). The difficulty of problems also increases very rapidly for humans---there are often 3 problems that a human can do more-or-less mechanically.

I could tighten any of these estimates by looking at the distribution more carefully rather than going off of my recollections from 2008, and if this was going to be one of a handful of things we'd bet about I'd probably spend a few hours doing that and some other basic digging.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low-hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict.

There's not really a constant time horizon for my pessimism; it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon: theorem-proving has not had much investment, so compute can be scaled up several orders of magnitude and there is likely lots of low-hanging fruit to pick. We also just don't have much to extrapolate from (compared to more mature technologies, or how I expect AI will be shortly before the end of days), and for similar reasons there aren't really any benchmarks to extrapolate.

(Also note that it matters a lot whether you know what problems labs will try to take a stab at. For the purpose of all of these forecasts, I am trying insofar as possible to set aside all knowledge about what labs are planning to do, though that's obviously not incentive-compatible and there's no particular reason you should trust me to do that.)

Christiano, Cotra, and Yudkowsky on AI progress

My claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it was a lot faster, you would have seen a takeover earlier instead. 

This claim is true in your model. It also seems true to me about hominids; that is, I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees.

I thought Eliezer's comment "there is a sufficiently high level where things go whooosh humans-from-hominids style" was missing the point. I think it might have been good to offer some quantitative models at that point though I haven't had much luck with that.

I can totally grant there are possible models for why the AI moves quickly from "much slower than humans" to "much faster than humans," but I wanted to get some model from Eliezer to see what he had in mind.

(I find fast takeoff from various frictions more plausible, so that the question mostly becomes one about how close we are to various kinds of efficient frontiers, and where we respectively predict civilization to be adequate/inadequate or progress to be predictable/jumpy.)

Christiano, Cotra, and Yudkowsky on AI progress

He says that things like AlphaGo or GPT-3 were really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight.

I agree that after shaking out the other disagreements, we could just end up with Eliezer saying "yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we've applied AI" (or "AI improving AI will be fundamentally unlike automating humans improving AI") but I don't think that's the core of his position right now.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think an IMO gold medal could be well before massive economic impact; I'd just be surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%; it's hard reasoning about the tails.

I'd say <4% on end of 2025.

I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are so surprising.

If I'm at 4% and you are 12% and we had 8 such bets, then I can get a factor of 2 if they all come out my way, and you get a factor of ~1.5 if one of them comes out your way.
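One way to read the factors above is as likelihood ratios between the two forecasters. The sketch below is an interpretation rather than a calculation given in the comment: it assumes the 8 bets are independent and that each forecaster assigns the same probability to every bet. The function and variable names are illustrative.

```python
from math import comb


def prob_k_of_n(p: float, n: int, k: int) -> float:
    """Binomial probability that exactly k of n independent events occur."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumption: 8 independent bets, each at 4% for one forecaster and 12% for the other.
P_LOW, P_HIGH, N_BETS = 0.04, 0.12, 8

# If no event occurs, the evidence favors the 4% forecaster by this factor.
factor_for_low = prob_k_of_n(P_LOW, N_BETS, 0) / prob_k_of_n(P_HIGH, N_BETS, 0)

# If exactly one event occurs, the evidence favors the 12% forecaster by this factor.
factor_for_high = prob_k_of_n(P_HIGH, N_BETS, 1) / prob_k_of_n(P_LOW, N_BETS, 1)

print(f"none occur: {factor_for_low:.2f}x in favor of the 4% forecaster")
print(f"one occurs: {factor_for_high:.2f}x in favor of the 12% forecaster")
```

Under these assumptions the first factor comes out at roughly 2.0 and the second at roughly 1.6, in line with the "factor of 2" and "~1.5" figures quoted above.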

I might think more about this and get a more coherent probability distribution, but unless I say something else by end of 2021 you can consider 4% on end of 2025 as my prediction.

Christiano, Cotra, and Yudkowsky on AI progress

Oops, this was in reference to the later part of the discussion where you disagreed with "a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal [without using tools]".

Christiano, Cotra, and Yudkowsky on AI progress

It seems to me like Eliezer rejects a lot of important heuristics like "things change slowly" and "most innovations aren't big deals" and so on. One reason he may do that is because he literally doesn't know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he'd see that actual gradualists are much better predictors than he imagines.

Christiano, Cotra, and Yudkowsky on AI progress

(I'm interested in which of my claims seem to dismiss or not adequately account for the possibility that continuous systems have phase changes.)

Christiano, Cotra, and Yudkowsky on AI progress

(ETA: this wasn't actually in this log but in a future part of the discussion.)

I found the elephants part of this discussion surprising. It looks to me like human brains are better than elephant brains at most things, and it's interesting to me that Eliezer thought otherwise. This is one of the main places where I couldn't predict what he would say.

Christiano, Cotra, and Yudkowsky on AI progress

I agree we seem to have some kind of deeper disagreement here.

I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn't use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever.

I think these won't get to human level in the next 5 years. We'll have crappy versions of all of them. So it seems like we basically have to get quantitative. If you want to talk about something we aren't currently measuring, then that probably takes effort, and so it would probably be good if you picked some capability where you won't just say "the Future is hard to predict." (Though separately I expect to make somewhat better predictions than you in most of these domains.)

A plausible example is that I think it's pretty likely that in 5 years, with mere stack more layers + known techniques (nothing clever), you can have a system which is clearly (by your+my judgment) "on track" to improve itself and eventually foom, e.g. that can propose and evaluate improvements to itself, whose ability to evaluate proposals is good enough that it will actually move in the right direction and eventually get better at the process, etc., but that it will just take a long time for it to make progress. I'd guess that it looks a lot like a dumb kid in terms of the kind of stuff it proposes and its bad judgment (but radically more focused on the task and conscientious and wise than any kid would be). Maybe I think that's 10% unconditionally, but much higher given a serious effort. My impression is that you think this is unlikely without adding in some missing secret sauce to GPT, and that my picture is generally quite different from your criticality-flavored model of takeoff.
