Oliver Habryka

Coding day in and out on LessWrong 2.0. You can reach me at habryka@lesswrong.com

Direct optimizers typically have a very specific architecture requiring substantial iteration and search. Luckily, it appears that our current NN architectures, with a fixed-length forward pass and a lack of recurrence or support for branching computations (as is required in tree search), make the implementation of powerful mesa-optimizers inside the network quite challenging.

I think this is being too confident about what "direct optimizers" require. 

There is an ontology, mostly inherited from the graph-search context, in which "direct optimizers" require recurrence and iteration, but at least I don't have any particularly strong beliefs about what a direct optimizer needs in terms of architecture, and don't think other people know either. The space of feed-forward networks is surprisingly large and rich and I am definitely not confident you can't find a direct optimizer in that space. 

Current LLMs also get quite high scores at imitating pretty agentic and optimizy humans, which suggests the networks do perform something quite close to search or direct optimization somewhere within their forward pass.

Perhaps I've simply been misreading John, and he's been intending to say "I have some beliefs, and separately I have some suggestive technical results, and they feel kinda related to me! Which is not to say that any onlooker is supposed to be able to read the technical results and then be persuaded of any of my claims; but it feels promising and exciting to me!".

For what it's worth, I ask John about once every month or two about his research progress and his answer has so far been (paraphrased) "I think I am making progress. I don't think I have anything to show you that would definitely convince you of my progress, which is fine because this is a preparadigmatic field. I could give you some high-level summaries or we could try to dive into the math, though I don't think I have anything super robust in the math so far, though I do think I have interesting approaches."

You might have had a totally different experience, but I've definitely had the epistemic state so far that John's math was in the "trying to find remotely reasonable definitions with tenuous connection of formalism to reality" stage, and not the "I have actually demonstrated robust connection of math to reality" stage, so I feel very non-misled by John. A good chunk of this impression comes from random short social interactions I've had with John, so someone who engaged more with just his online writing might come away with a different impression (though I've also done that a lot and don't super feel like John has ever tried to sell me in his writing on having super robust math to back things up).

This is just false, because it is not taking into account the cost of doing expected value maximization, since giving consistent preferability scores is just very expensive and hard to do reliably.

I do really want to put emphasis on the parenthetical remark "(at least in some situations, though they may not arise)". Katja is totally aware that the coherence arguments require a bunch of preconditions that are not guaranteed to be the case for all situations, or even any situation ever, and her post is about how there is still a relevant argument here.

Crossposting this comment from the EA Forum: 

Nuno says: 

I appreciate the whole post. But I personally really enjoyed the appendix. In particular, I found it informative that Yudkowsky can speak/write with that level of authoritativeness, confidence, and disdain for others who disagree, and still be wrong (if this post is right).

I respond:

(if this post is right)

The post does actually seem wrong though. 

I expect someone to write a comment with the details at some point (I am pretty busy right now, so can only give a quick meta-level gleam), but mostly, I feel like in order to argue that something is wrong with these arguments, you have to argue more compellingly against completeness and possible alternative ways to establish dutch-book arguments. 

Also, the title of "there are no coherence arguments" is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don't really understand the semantic argument that is happening where it's trying to say that the cited theorems aren't talking about "coherence", when like, they clearly are.

You can argue that the theorems are wrong, or that the explicit assumptions of the theorems don't hold, which many people have done, but like, there are still coherence theorems, and IMO completeness seems quite reasonable to me and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn't seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences).

The whole section at the end feels very confused to me. The author asserts that there is "an error" where people assert that "there are coherence theorems", but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence, all of these seem really quite relevant. They might not prove the things in-practice, as many theorems tend to do, and you are open to arguing about that, but that doesn't really change whether they are theorems. 

Like, I feel like with the same type of argument that is made in the post I could write a post saying "there are no voting impossibility theorems" and then go ahead and argue that the assumptions of Arrow's Impossibility Theorem are not universally proven, and then accuse everyone who ever talked about voting impossibility theorems of making "an error" since "those things are not real theorems". And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiably annoyed by this.

Yep, I think it's pretty plausible this is just a data-quality issue, though I find myself somewhat skeptical of this. Maybe worth a bet? 

I would be happy to bet that conditional on them trying to solve this with more supervised training and no RLHF, we are going to see error modes substantially more catastrophic than current Chat-GPT. 

Yeah, this is basically my point. Not sure whether you are agreeing or disagreeing. I was specifically quoting Paul's comment saying "I've seen only modest qualitative differences" in order to disagree and say "I think we've now seen substantial qualitative differences". 

We have had 4chan play around with Chat-GPT for a while, with much less disastrous results than what happened when they got access to Sydney.

It is not news to anyone here that average-case performance on proxy metrics on some tame canned datasets may be unrelated to out-of-distribution robustness on worst-case adversary-induced decision-relevant losses, in much the same way that model perplexity tells us little about what a model is useful for or how vulnerable it is.

I wish that this not being news to anyone here was true but this does not currently seem true to me. But doesn't seem worth going into.

I think the qualitative difference between the supervised tuning done in text-davinci-002 and the RLHF in text-davinci-003 is modest (e.g. I've seen head-to-head comparisons suggesting real but modest effects on similar tasks).

Ok, I think we might now have some additional data on this debate. It does indeed look to me like Sydney was trained with the next best available technology after RLHF, for a few months, at least based on Gwern's guesses here: https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K 

As far as I can tell this resulted in a system with much worse economic viability than Chat-GPT. I would overall describe Sydney as "economically unviable", such that if Gwern's story here is correct, the difference between using straightforward supervised training on chat transcripts and OpenAI's RLHF pipeline is indeed the difference between an economically viable and unviable product. 

There is a chance that Microsoft fixes this with more supervised training, but my current prediction is that they will have to fix this with RLHF, because the other technological alternatives are indeed not adequate substitutes from an economic viability perspective, which suggests that the development of RLHF did really matter a lot for this.

Relevant piece of data: https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/?fbclid=IwAR3KTBnxC_y7n0TkrCdcd63oBuwnu6wyXcDtb2lijk3G-p9wdgD9el8KzQ4 

Feb 1 (Reuters) - ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after launch, making it the fastest-growing consumer application in history, according to a UBS study on Wednesday.

The report, citing data from analytics firm Similarweb, said an average of about 13 million unique visitors had used ChatGPT per day in January, more than double the levels of December.

"In 20 years following the internet space, we cannot recall a faster ramp in a consumer internet app," UBS analysts wrote in the note.

I had some decent probability on this outcome but I have increased my previous estimate of the impact of Chat-GPT by 50%, since I didn't expect something this radical ("the single fastest growing consumer product in history").

I didn't realize how broadly you were defining AI investment. If you want to say that e.g. ChatGPT increased investment by $10B out of $200-500B, so like +2-5%, I'm probably happy to agree (and I also think it had other accelerating effects beyond that).

Makes sense, sorry for the miscommunication. I really didn't feel like I was making a particularly controversial claim with the $10B, so was confused why it seemed so unreasonable to you. 

I do think those $10B are going to be substantially more harmful for timelines than other money in AI, because I do think a good chunk of that money will much more directly aim at AGI than most other investment. I don't know what my multiplier here for effect should be, but my guess is something around 3-5x in expectation (I've historically randomly guessed that AI applications are 10x less timelines-accelerating per dollar than full-throated AGI-research, but I sure have huge uncertainty about that number). 

That, plus me thinking there is a long tail with lower probability where Chat-GPT made a huge difference in race dynamics, and thinking that this marginal increase in investment does probably translate into increases in total investment, made me think this was going to shorten timelines in-expectation by something closer to 8-16 weeks, which isn't enormously far away from yours, though still a good bit higher. 

And yeah, I do think the thing I am most worried about with Chat-GPT in addition to just shortening timelines is increasing the number of actors in the space, which also has indirect effects on timelines. A world where both Microsoft and Google are doubling down on AI is probably also a world where AI regulation has a much harder time taking off. Microsoft and Google at large also strike me as much less careful actors than the existing leaders of AGI labs which have so far had a lot of independence (which to be clear, is less of an endorsement of current AGI labs, and more of a statement about very large moral-maze like institutions with tons of momentum). In-general the dynamics of Google and Microsoft racing towards AGI sure is among my least favorite takeoff dynamics in terms of being able to somehow navigate things cautiously. 

One thing worth pointing out in defense of your original estimate is that variance should add up to 100%, not effect sizes, so e.g. if the standard deviation is $100B then you could have 100 things each explaining ($10B)^2 of variance (and hence each responsible for +-$10B effect sizes after the fact).

Oh, yeah, good point. I was indeed thinking of the math a bit wrong here. I will think a bit about how this adjusts my estimates, though I think I was intuitively taking this into account.
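The variance point quoted above can be checked with a few lines of arithmetic. This is only a sketch using the illustrative numbers from that comment ($100B standard deviation, 100 factors); treating the factors as independent and equally sized is an assumption made for the example, not a claim about actual AI investment.

```python
import math

# Illustrative figures from the quoted comment; independence and equal
# factor sizes are simplifying assumptions for this example.
total_sd_billion = 100.0
n_factors = 100

total_variance = total_sd_billion ** 2            # 10,000 ($B)^2
variance_per_factor = total_variance / n_factors  # 100 ($B)^2, i.e. ($10B)^2
sd_per_factor = math.sqrt(variance_per_factor)    # $10B effect size per factor

# Variances of independent factors add back up to the total variance...
variance_sum = n_factors * variance_per_factor

# ...while the per-factor effect sizes sum to far more than the total SD.
sum_of_effect_sizes = n_factors * sd_per_factor

print(f"per-factor effect size: ${sd_per_factor:.0f}B")
print(f"sum of variances: {variance_sum:.0f} ($B)^2")
print(f"sum of effect sizes: ${sum_of_effect_sizes:.0f}B")
```

So under these assumptions, 100 separate things can each be "responsible for" a ±$10B swing even though the overall standard deviation is only $100B.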

How much total investment do you think there is in AI in 2023?

My guess is total investment was around the $200B - $500B range, with about $100B of that into new startups and organizations, and around $100-$400B of that in organizations like Google and Microsoft outside of acquisitions. I have pretty high uncertainty on the upper end here, since I don't know what fraction of Google's revenue gets reinvested again into AI, how much Tesla is investing in AI, how much various governments are investing, etc.

How much variance do you think there is in the level of 2023 investment in AI? (Or maybe whatever other change you think is equivalent.)

Variance between different years depending on market condition and how much products take off seems like on the order of 50% to me. Like, different years have pretty hugely differing levels of investment.

My guess is about 50% of that variance is dependent on different products taking off, how much traction AI is getting in various places, and things like Chat-GPT existing vs. not existing. 

So this gives around $50B - $125B of variance to be explained by product-adjacent things like Chat-GPT.
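For anyone who wants to retrace the $50B - $125B figure, here is a minimal sketch of the arithmetic. The inputs are the rough guesses stated above; the function name and parameter names are made up for illustration.

```python
def product_variance_range(total_low_b, total_high_b,
                           yearly_variation=0.5, product_share=0.5):
    """Return (low, high) in $B of investment variation attributable to
    product-adjacent factors, given the guessed shares from the comment."""
    factor = yearly_variation * product_share
    return total_low_b * factor, total_high_b * factor


# Total 2023 investment guess: $200B - $500B.
low_b, high_b = product_variance_range(200.0, 500.0)
print(f"${low_b:.0f}B - ${high_b:.0f}B")
```

Both 50% shares are guesses with wide error bars, so the output should be read as an order-of-magnitude range rather than a measurement.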

How much influence are you giving to GPT-3, GPT-3.5, GPT-4? How much to the existence of OpenAI? How much to the existence of Google? How much to Jasper? How much to good GPUs?

Existence of OpenAI is hard to disentangle from the rest. I would currently guess that in terms of total investment, GPT-2 -> GPT-3 made a bigger difference than GPT-3.5 -> Chat-GPT, but both made a much larger difference than GPT-3 -> GPT-3.5. 

I don't think Jasper made a huge difference, since its userbase is much smaller than Chat-GPT, and also evidently the hype from it has been much lower. 

Good GPUs feels kind of orthogonal. We can look at each product that makes up my 50% of the variance to be explained and see how useful/necessary good GPUs were for its development, and my sense is for Chat-GPT at least the effect of good GPUs was relatively minor since I don't think the training to move from GPT-3.5 to Chat-GPT was very compute intensive.

I would feel fine saying expected improvements in GPUs are responsible for 25% of the 50% variance (i.e. 12.5%) if you chase things back all the way, though that again feels like it isn't trying to add up to 100% with the impact from "Chat-GPT". I do think it's trying to add up to 100% with the impact from "RLHF's effect on Chat-GPT", which I claimed was at least 50% of the impact of Chat-GPT in particular. 

In any case, in order to make my case for $10B using these numbers I would have to argue that between 8% and 20% of the product-dependent variance in annual investment into AI is downstream of Chat-GPT, and indeed that still seems approximately right to me after crunching the numbers. It's by far the biggest AI product of the last few years, it is directly credited with sparking an arms race between Google and Microsoft, and indeed even something as large as 40% wouldn't seem totally crazy to me, since these kinds of things tend to be heavy-tailed, so if you select on the single biggest thing, there is a decent chance you underestimate its effect.
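The share range here follows from dividing the $10B claim by the two ends of the product-dependent range. A quick sketch, taking the $50B and $125B endpoints from the estimate above as given (the variable names are just for illustration):

```python
# Claimed investment increase attributed to Chat-GPT, in $B.
claimed_effect_b = 10.0

# Product-dependent variation range from the earlier estimate, in $B.
product_range_low_b = 50.0
product_range_high_b = 125.0

# A smaller pool means Chat-GPT needs a larger share of it, and vice versa.
share_if_small_pool = claimed_effect_b / product_range_low_b
share_if_large_pool = claimed_effect_b / product_range_high_b

print(f"required share: {share_if_large_pool:.0%} to {share_if_small_pool:.0%}")
```

This only restates the guesses in ratio form; it does not add any evidence about what the true share is.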
