DanielFilan

# Sequences

AXRP - the AI X-risk Research Podcast

# Wiki Contributions


FWIW, the discussion of AI-driven propaganda doesn't seem as prescient.

So [in 2024], the most compute spent on a single training run is something like 5x10^25 FLOPs.

As of June 20th 2024, this is exactly Epoch AI's central estimate of the most compute spent on a single training run, as displayed on their dashboard.

Frankfurt-style counterexamples for definitions of optimization

In "Bottle Caps Aren't Optimizers", I wrote about a type of definition of optimization that says system S is optimizing for goal G iff G has a higher value than it would if S didn't exist or were randomly scrambled. I argued against these definitions by providing examples of systems that satisfy the criterion but are not optimizers. But today, I realized that I could repurpose Frankfurt cases to get examples of optimizers that don't satisfy this criterion.

A Frankfurt case is a thought experiment designed to disprove the following intuitive principle: "a person is morally responsible for what she has done only if she could have done otherwise." Here's the basic idea: suppose Alice is considering whether or not to kill Bob. Upon consideration, she decides to do so, takes out her gun, and shoots Bob. But unknown to her, a neuroscientist had implanted a chip in her brain that would have forced her to shoot Bob if she had decided not to. That said, the chip didn't activate, because she did decide to shoot Bob. The idea is that she's morally responsible, even tho she couldn't have done otherwise.

Anyway, let's do this with optimizers. Suppose I'm playing Go, thinking about how to win - imagining what would happen if I played various moves, and playing moves that make me more likely to win. Further suppose I'm pretty good at it. You might want to say I'm optimizing my moves to win the game. But suppose that, unbeknownst to me, behind my shoulder is famed Go master Shin Jinseo. If I start playing really bad moves, or suddenly die or vanish, etc., he will play my moves, and do an even better job of winning. Now, if you remove me or randomly rearrange my parts, my side is actually more likely to win the game. But that doesn't mean I'm optimizing to lose the game! So this is another way such definitions of optimizers are wrong.
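The failure mode can be made concrete with a toy sketch. All the numbers and names below are hypothetical; the point is just that the counterfactual-removal test compares my side's win probability with me versus without me, and in the Frankfurt-style setup that comparison gives the wrong verdict:

```python
# Toy sketch of the "counterfactual removal" definition of optimization,
# applied to the Go example. Win probabilities are made up for illustration.

def win_probability(player: str) -> float:
    """Hypothetical win probabilities for my side of the board."""
    return {
        "me": 0.60,            # I'm pretty good: I usually win
        "shin_jinseo": 0.95,   # the master behind my shoulder is better still
        "scrambled_me": 0.95,  # bad moves trigger the master's takeover,
                               # so scrambling me also yields his play
    }[player]

def removal_test(system: str, counterfactual: str) -> bool:
    """The definition under attack: S is optimizing for G iff G's value
    is higher with S present than with S removed or scrambled."""
    return win_probability(system) > win_probability(counterfactual)

# Removing or scrambling me *raises* my side's win probability, so the
# test concludes I'm not optimizing to win -- even though I am.
print(removal_test("me", "shin_jinseo"))   # False
print(removal_test("me", "scrambled_me"))  # False
```

The test only works if the counterfactual world lacks a better backup optimizer, which is exactly what the Frankfurt setup denies it.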

That said, other definitions treat this counter-example well. E.g. I think the one given in "The ground of optimization" says that I'm optimizing to win the game (maybe only if I'm playing a weaker opponent).

Thanks for finding this! Will link it in the transcript.

Hmm, I think the problem is that the equals sign in your URL is being percent-encoded (turned into %3D) rather than being treated as a raw equals sign. Weird.
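For reference, this is what that encoding issue looks like; the URL here is a made-up stand-in, not the one from the original comment:

```python
from urllib.parse import quote, unquote

# In percent-encoding, '=' is escaped as '%3D'. If a URL's query string has
# its '=' encoded this way, servers won't parse it as a key=value pair.
raw = "https://example.com/page?id=42"   # hypothetical URL
encoded = quote(raw, safe=":/?")          # encodes '=' (and '&', etc.)

print(encoded)                 # https://example.com/page?id%3D42
print(unquote(encoded) == raw) # True: decoding restores the raw URL
```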

For one particularly legible example, see this comment by janus.

Link should presumably be to this comment.

The other is the friendly robot waving hello just underneath.

Let it be known: I'm way more likely to respond to (and thereby algorithmically signal-boost) criticisms of AI doomerism that I think are dumb than those that I think are smart, because the dumb objections are easier to answer. Caveat emptor.

I'm confident that if there were a "pro-AI" meme with a friendly-looking base model, LW / the shoggoth enjoyers would have nitpicked the friendly meme-creature to hell. They would (correctly) point out "hey, we don't actually know how these things work; we don't know them to be friendly, or what they even 'want' (if anything); we don't actually know what each stage of training does..."

I'm sure that nothing bad will happen to me if I slap this (friendly AI meme) on my laptop, right? I'll be able to think perfectly neutrally about whether AI will be friendly.

I have multiple cute AI stickers on my laptop, one of which is a shoggoth meme. Here is a picture of them. Nobody has ever nitpicked their friendly appearance to me. I don't think they have distorted my thinking about AI in favour of thinking that it will be friendly (altho I think it was after I put them on that I became convinced by a comment by Paul Christiano that there's ~even odds that unaligned AI wouldn't kill me, so do with that information what you will).

FYI: I am not using the dialogue matching feature. If you want to dialogue with me, your best bet is to ask me. I will probably say no, but who knows.