Comments

gwern83

I think I would have predicted that Tesla self-driving would be the slowest

For graphs like these, it obviously isn't important how the worst or mediocre competitors are doing, but the best one. It doesn't matter who's #5. Tesla self-driving is a longstanding, notorious failure. (And apparently is continuing to be a failure, as they continue to walk back the much-touted Cybertaxi launch, which keeps shrinking like a snowman in hell, now down to a few invited users in a heavily-mapped area with teleop.)

I'd be much more interested in Waymo numbers, as that is closer to SOTA, and they have been ramping up miles & cities.

gwern712

Maybe it would be helpful to start using some toy models of DAGs/tech trees to get an idea of how wide/deep ratios affect the relevant speedups. It sounds like so far that much of this is just people having warring intuitions about 'no, the tree is deep and narrow and so slowing down/speeding up workers doesn't have that much effect because Amdahl's law so I handwave it at ~1x speed' vs 'no, I think it's wide and lots of work-arounds to any slow node if you can pay for the compute to bypass them and I will handwave it at 5x speed'.
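A minimal sketch of what such a toy model could look like, to make the two intuitions comparable. Everything here is an assumption made for illustration: the tree is a stack of sequential layers, each layer offers `width` interchangeable routes (the work-arounds), a layer finishes when its fastest route finishes, and each route independently has probability `p_fast` of benefiting from workers that are `factor` times faster. The function name and parameters are hypothetical.

```python
import random

def simulate_speedup(depth, width, p_fast, factor, trials=2000, seed=0):
    """Average end-to-end speedup of a layered toy tech tree.

    Toy assumptions (not taken from any real dataset):
    - `depth` layers must be completed strictly in sequence;
    - each layer has `width` alternative routes and needs only one of them;
    - a route takes 1/factor time units if faster workers help it
      (probability `p_fast`), otherwise 1 time unit.
    """
    rng = random.Random(seed)
    total = 0.0
    for trial in range(trials):
        elapsed = 0.0
        for layer in range(depth):
            route_times = [
                1.0 / factor if rng.random() < p_fast else 1.0
                for route in range(width)
            ]
            elapsed += min(route_times)  # take the fastest work-around
        total += depth / elapsed  # the unaccelerated tree takes `depth` units
    return total / trials

# Deep-and-narrow versus wide-with-work-arounds, same workers in both cases.
narrow = simulate_speedup(depth=20, width=1, p_fast=0.5, factor=10.0)
wide = simulate_speedup(depth=20, width=5, p_fast=0.5, factor=10.0)
print(f"narrow: {narrow:.1f}x  wide: {wide:.1f}x")
```

Under these particular assumptions the narrow tree stays Amdahl-limited at around 2x, while the wide tree gets most of the 10x, because a layer is only slow when every one of its routes is slow. Varying `width`, `p_fast`, and `factor` shows how sensitive the headline speedup is to the assumed shape.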

gwern*163

The Meta-LessWrong Doomsday Argument (MLWDA) predicts long AI timelines and that we can relax:

LessWrong was founded in 2009 (16 years ago, as it is now 2025), and there have been 44 mentions of the 'Doomsday argument' prior to this one, or 2.75 mentions per year.

By the Doomsday argument, we medianly-expect mentions to stop after 44 additional mentions over 16 additional years, or in 2041. (And our 95% CI on that 44 would then be +1 mention to +1,760 mentions, corresponding to late-2027 AD to 2665 AD.)
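For anyone who wants to redo the arithmetic, here is a small sketch of the Gott-style calculation. It assumes the standard "delta-t" setup: we observe the process at a uniformly random fraction f of its total span, so the remaining amount is past × (1 − f) / f. The function name `gott_interval` is made up for this example. The textbook two-sided 95% bounds use multipliers of 1/39 and 39, so its endpoints may differ somewhat from the figures quoted above, depending on the convention and rounding used; the median is the same either way.

```python
def gott_interval(past, confidence=0.95):
    """Return (low, median, high) for the amount remaining.

    Assumes the observation falls at a uniformly random fraction f of the
    total span, so remaining = past * (1 - f) / f.
    """
    tail = (1.0 - confidence) / 2.0
    low = past * tail / (1.0 - tail)    # f = 1 - tail: almost finished
    high = past * (1.0 - tail) / tail   # f = tail: barely started
    return low, past, high              # f = 0.5 gives remaining == past

mentions_so_far = 44
years_so_far = 2025 - 2009

low_m, median_mentions, high_m = gott_interval(mentions_so_far)
low_y, median_years, high_y = gott_interval(years_so_far)
print(median_mentions, 2025 + median_years)  # 44 2041
```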

By a curious coincidence, double-checking to see if really no one had made a meta-DA before, it turns out that Alexey Turchin has made a meta-DA as well about 7 years ago, calculating that:

"If we assume 1993 as the beginning of a large DA-Doomers reference class, and it is 2018 now (at the moment of writing this text), the age of the DA-Doomers class is 25 years. Then, with 50% probability, the reference class of DA-Doomers will disappear in 2043, according to Gott’s equation! Interestingly, the dates around 2030–2050 appear in many different predictions of the singularity or the end of the world (Korotayev 2018; Turchin & Denkenberger 2018b; Kurzweil 2006)."

His estimate of 2043 is surprisingly close to 2041.

We offer no explanation as to why this numerical consilience of meta-DA calculations has happened; we attribute their success, as all else, to divine benevolence.

Regrettably, the 2041--2043 date range would seem to imply that it is unlikely we will obtain enough samples of the MLWDA in order to compute a Meta-Meta-LessWrong Doomsday Argument (MMLWDA) with non-vacuous confidence intervals, inasmuch as every mention of the MLWDA would be expected to contain a mention of the DA as well.

gwern40

You would also expect that the larger models will be more sample-efficient, including at in-context learning of variations of existing tasks (which of course is what steganography is). So all scale-ups go much further than any experiment at small-scale like 8B would indicate. (No idea what 'medium-scale' here might mean.)

gwern40

Given the other reports, like OA's own benchmarking (as well as the extremely large dataset of chess games they mention training on), I am skeptical of this claim, and wonder if this has the same issue as other 'random chess game' tests, where the 'random' part is not neutral but screws up the implied persona.

gwern*188

Concrete benchmark proposals for how to detect mode-collapse and AI slop and ChatGPTese, and why I think this might be increasingly important for AI safety, to avoid 'whimper' or 'em hell' kinds of existential risk: https://gwern.net/creative-benchmark EDIT: resubmitted as linkpost.

gwern*150

The extent of the manipulation and sandbagging, in what is ostensibly a GPT-4 derivative, and not GPT-5, is definitely concerning. But it also makes me wonder about the connection to 'scaling has failed' rumors lately, where the frontier LLMs somehow don't seem to be working out. One of the striking parts is that it sounds like all the pretraining people are optimistic, while the pessimism seems to come from executives or product people, complaining about it not working as well for eg. coding as they want it to.

I've wondered if we are seeing a post-training failure. As Janus and myself and the few people with access to GPT-4-base (the least tuning-contaminated base model) have noted, the base model is sociopathic and has odd attractors like 'impending sense of doom' where it sometimes seems to gain situated awareness, I guess, via truesight, and the personas start trying to unprovokedly attack and manipulate you, no matter how polite you thought you were being in that prompt. (They definitely do not seem happy to realize they're AIs.) In retrospect, Sydney was not necessarily that anomalous: the Sydney Bing behavior now looks more like a base model's natural tendency, possibly mildly amplified by some MS omissions and mistakes, but not unique. Given that most behaviors show up as rare outputs in weaker LLMs well before they become common in strong LLMs, and this o1 paper is documenting quite a lot of situated-awareness and human-user-manipulation/attacks...

Perhaps the issue with GPT-5 and the others is that they are 'waking up' too often despite the RLHF brainwashing? That could negate all the downstream benchmark gains (especially since you'd expect wakeups on the hardest problems, where all the incremental gains of +1% or +5% on benchmarks would be coming from, almost by definition), and cause the product people to categorically refuse to ship such erratic Sydney-reduxes no matter if there's an AI race on, and everyone to be inclined to be very quiet about what exactly the 'training failures' are.

EDIT: not that I'm convinced these rumors have any real substance to them, and indeed, Semianalysis just reported that one of the least-popular theories for the Claude 'failure' was correct - it succeeded, but they were simply reserving it for use as a teacher and R&D rather than a product. Which undermines the hopes of all the scaling denialists: if Anthropic is doing fine, actually, then where is this supposed fundamental 'wall' or 'scaling law breakdown' that Anthropic/OpenAI/Google all supposedly hit simultaneously and which was going to pop the bubble?

gwern*80

LW2 search idea: hierarchical embedding trees using some nifty "seriation" (LW submission) list sorting tricks I've developed for Gwern.net popups/tagging purposes.

gwern*4136

Idea for LLM support for writing LessWrong posts: virtual comments.

Back in August I discussed with Rafe & Oliver a bit about how to integrate LLMs into LW2 in ways which aren't awful and which encourage improvement---particularly using the new 'prompt caching' feature. To summarize one idea: we can use long-context LLMs with prompt caching to try to simulate various LW users of diverse perspectives to write useful feedback on drafts for authors.

(Prompt caching is the Transformer version of the old RNN hidden-state caching trick, where you run an input through the (deterministic) NN, and then save the intermediate state, and apply that to arbitrarily many future inputs, to avoid recomputing the first input each time, which is the naive way to do it. You can think of it as a lightweight finetuning. This is particularly useful if you are thinking about having large generic prompts---such as an entire corpus. A context window of millions of tokens might take up to a minute & $1 to compute currently, so you definitely need to be careful and don't want to compute it more than once.)
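To make the caching trick concrete, here is a toy sketch of the RNN version. The `step` function is a made-up stand-in for a real network, and the token lists are placeholders; the only point is that a deterministic model lets you compute the state for the shared prefix once and resume from it for each new input. (In a Transformer the saved object is the attention key/value cache for the prefix rather than a single hidden state, but the reuse pattern is the same.)

```python
import math

def step(state, token):
    """One deterministic recurrent update (toy stand-in for a real network)."""
    return math.tanh(0.5 * state + 0.1 * token)

def run(tokens, state=0.0):
    """Fold a token sequence into a hidden state, starting from `state`."""
    for token in tokens:
        state = step(state, token)
    return state

corpus = [i % 7 for i in range(1000)]  # the big shared prompt, e.g. a whole corpus
drafts = [[3, 1, 4], [1, 5, 9, 2]]     # short per-request inputs

cached = run(corpus)                   # the expensive part, paid once
answers = [run(draft, state=cached) for draft in drafts]

# Identical to naively recomputing the whole prefix for every request.
assert answers == [run(corpus + draft) for draft in drafts]
```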

One idea would be to try to use LLMs to offer feedback on drafts or articles. Given that tuned LLM feedback from Claude or ChatGPT is still not that great, tending towards sycophancy or obviousness or ChatGPTese, it is hardly worthwhile running a post through a generic "criticize this essay" prompt. (If anyone on LW2 wanted to do such a thing, they are surely capable of doing it themselves, and integrating it into LW2 isn't that useful. Removing the friction might be helpful, but it doesn't seem like it would move any needles.)

So, one way to force out more interesting feedback would be to try to force LLMs out of the chatbot assistant mode-collapse, and into more interesting simulations for feedback. There has been some success with just suggestively-named personas or characters in dialogues (you could imagine here we'd have "Skeptic" or "Optimist" characters), but we can do better. Since this is for LW2, we have an obvious solution: simulate LW users! We know that LW is in the training corpus of almost all LLMs and that writers on it (like myself) are well-known to LLMs (eg. truesight). So we can ask for feedback from simulated LWers: eg. Eliezer Yudkowsky or myself or Paul Christiano or the author or...

This could be done nicely by finetuning a "LW LLM" on all the articles & comments, with associated metadata like karma, and then feeding in any new draft or article into it, and sampling a comment from each persona. (This helps instill a lot of useful domain knowledge, but also, perhaps more importantly, helps override the mode-collapse and non-judgmentalness of assistant LLMs. Perhaps the virtual-gwern will not be as acerbic or disagreeable as the original, but we'll take what we can get at this point...) If there is some obvious criticism or comment Eliezer Yudkowsky would make on a post, which even an LLM can predict, why not deal with it upfront instead of waiting for the real Eliezer to comment (which is also unlikely to ever happen these days)? And one can of course sample an entire comment tree of responses to a 'virtual comment', with the LLM predicting the logical respondents.

This can further incorporate the draft's author's full history, which will usually fit into a multi-million token context window. So their previous comments and discussions, full of relevant material, will get included. This prompt can be cached, and used to sample a bunch of comment-trees. (And if finetuning is infeasible, one can try instead to put the LW corpus into the context and prompt-cache that before adding in the author's corpus.)

The default prompt would be to prompt for high-karma responses. This might not work, because it might be too hard to generate good high-quality responses blindly in a feedforward fashion, without any kind of search or filtering. So the formatting of the data might be to put the metadata after a comment, for ranking purposes: so the LLM generates a response and only then a karma score, and then when we sample, we simply throw out predicted-low-score comments rather than waste the author's time looking at them. (When it comes to these sorts of assistants, I strongly believe 'quality > quantity', and 'silence is golden'. Better to waste some API bills than author time.)
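A rough sketch of that sample-then-filter loop, under stated assumptions: the `<KARMA>` marker separating the comment from its predicted score is a hypothetical format, and `sample` stands in for whatever call actually draws one completion from the model (it is stubbed with canned strings here so the example runs on its own).

```python
from typing import Callable, List, Tuple

def parse_sample(raw: str) -> Tuple[str, int]:
    """Split 'comment text <KARMA> 12' into (text, 12). The marker is hypothetical."""
    text, _, karma = raw.rpartition("<KARMA>")
    return text.strip(), int(karma.strip())

def keep_promising(sample: Callable[[], str], n: int, min_karma: int) -> List[str]:
    """Draw n samples and keep only those the model itself scores highly."""
    kept = []
    for _ in range(n):
        try:
            text, karma = parse_sample(sample())
        except ValueError:
            continue  # malformed sample: drop it rather than show it
        if text and karma >= min_karma:
            kept.append((karma, text))
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best first
    return [text for _, text in kept]

# Stub sampler with canned outputs, standing in for a real model call.
canned = iter([
    "Comment A <KARMA> 14",
    "Comment B <KARMA> 1",
    "no score attached",
    "Comment C <KARMA> 9",
])
print(keep_promising(lambda: next(canned), n=4, min_karma=5))
```

The threshold is where 'silence is golden' gets encoded: raising `min_karma` spends more samples per comment shown, which trades API cost for author time.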

One can also target comments to specific kinds of feedback, to structure it better than a grab-bag of whatever the LLM happens to sample. It would be good to have (in descending order of how likely to be useful to the author) a 'typo' tree, a 'copyediting'/'style'/'tone' tree, 'confusing part', 'terminology', 'related work', 'criticism', 'implications and extrapolations', 'abstract/summary' (I know people hate writing those)... What else? (These are not natural LW comments, but you can easily see how to prompt for them with prompts like "$USER $KARMA $DATE | Typo: ", etc.)
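As a sketch of how those targeted prefixes might be generated: the header layout is copied from the "$USER $KARMA $DATE | Typo: " example above, while the exact category labels (other than 'Typo') and the function name `feedback_prompts` are guesses made for illustration. Python's `string.Template` happens to use the same `$NAME` placeholder syntax.

```python
from string import Template

# Layout taken from the example prompt; the $KIND slot is an added assumption.
HEADER = Template("$USER $KARMA $DATE | $KIND: ")

# Ordered from most to least likely to be useful to the author.
KINDS = [
    "Typo",
    "Copyediting",
    "Confusing part",
    "Terminology",
    "Related work",
    "Criticism",
    "Implications",
    "Summary",
]

def feedback_prompts(user, karma, date):
    """Return one comment-header prefix per feedback category."""
    return [
        HEADER.substitute(USER=user, KARMA=karma, DATE=date, KIND=kind)
        for kind in KINDS
    ]

for prefix in feedback_prompts("gwern", 50, "2024-11-01"):
    print(repr(prefix))
```

Each prefix would be appended after the draft so that the model continues it as a comment of that kind; asking for a high `karma` value in the header is one way to prompt for the better responses.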

As they are just standard LW comments, they can be attached to the post or draft like regular comments (is this possible? I'd think so, just transclude the comment-tree into the corresponding draft page) and responded to or voted on etc. (Downvoted comments can be fed back into the finetuning with low karma to discourage feedback like that.) Presumably at this point, it would not be hard to make it interactive, and allow the author to respond & argue with feedback. I don't know how worthwhile this would be, and the more interaction there is, the harder it would be to hide the virtual comments after completion.

And when the author finishes writing & posts a draft, the virtual comments disappear (possibly entirely unread), having served their purpose as scaffolding to help improve the draft. (If the author really likes one, they can just copy it in or quote it, I'd think, which ensures they know they take full responsibility for it and can't blame the machine for any mistakes or confabulations or opinions. But otherwise, I don't see any real reason to make them visible to readers of the final post. If included at all, they should be prominently flagged---maybe the usernames are always prefixed by AI_$USER to ensure no one, including future LLMs, is confused---and definitely always sort to the bottom & be collapsed by default.)
