Sequences

AXRP - the AI X-risk Research Podcast

Comments

The other is the friendly robot waving hello just underneath.

Let it be known: I'm way more likely to respond to (and thereby algorithmically signal-boost) criticisms of AI doomerism that I think are dumb than those that I think are smart, because the dumb objections are easier to answer. Caveat emptor.

I'm confident that if there were a "pro-AI" meme with a friendly-looking base model, LW / the shoggoth enjoyers would have nitpicked the friendly meme-creature to hell. They would (correctly) point out "hey, we don't actually know how these things work; we don't know them to be friendly, or what they even 'want' (if anything); we don't actually know what each stage of training does..."

I'm sure that nothing bad will happen to me if I slap this (friendly AI meme) on my laptop, right? I'll be able to think perfectly neutrally about whether AI will be friendly.

I have multiple cute AI stickers on my laptop, one of which is a shoggoth meme. Here is a picture of them. Nobody has ever nitpicked their friendly appearance to me. I don't think they have distorted my thinking about AI in favour of thinking that it will be friendly (although I think it was after I put them on that I became convinced by a comment by Paul Christiano that there's ~even odds that unaligned AI wouldn't kill me, so do with that information what you will).

FYI: I am not using the dialogue matching feature. If you want to dialogue with me, your best bet is to ask me. I will probably say no, but who knows.

Does this not essentially amount to just assuming that the inductive bias of neural networks in fact matches the prior that we (as humans) have about the world?

No? It amounts to assuming that smaller neural networks are a better match for the actual data generating process of the world.

One argument sketch using SLT that NNs are biased towards low-complexity solutions: suppose reality is generated by a width-3 network, and you're modelling it with a width-4 network. Then, along with the generic symmetries, optimal solutions also have continuous symmetries where you can switch which neuron is turned off.

Roughly, say neurons 3 and 4 have the same input weight vectors (so their activations are the same), but neuron 4's output weight vector is all zeros. Then you can continuously scale up the output vector of neuron 4 while simultaneously scaling down the output vector of neuron 3, leaving the network computing the same function. Also, when neuron 4 has zero weights as inputs and outputs, you can arbitrarily change the inputs or the outputs, but not both.
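A minimal numerical sketch of the symmetries just described, assuming a one-hidden-layer ReLU network with one bias per hidden neuron (treated as part of the "input weights") and random weights for the width-3 "reality" network. All names and sizes here are hypothetical choices for illustration. It doesn't compute an RLCT; it only checks that the flat directions the argument relies on really leave the function unchanged.

```python
import random

def relu(z):
    return max(0.0, z)

def forward(params, x):
    """One-hidden-layer ReLU net. params is a list of (w_in, b, w_out) tuples,
    one per hidden neuron: input weight vector, bias, output weight vector."""
    n_out = len(params[0][2])
    y = [0.0] * n_out
    for w_in, b, w_out in params:
        h = relu(sum(w * xi for w, xi in zip(w_in, x)) + b)
        for k in range(n_out):
            y[k] += w_out[k] * h
    return y

def rand_vec(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def max_diff(f, g, inputs):
    """Largest absolute difference between two networks' outputs on the given inputs."""
    return max(abs(a - b) for x in inputs for a, b in zip(forward(f, x), forward(g, x)))

rng = random.Random(0)
D_IN, D_OUT = 2, 2

# Hypothetical "reality": a width-3 network with random weights.
teacher = [(rand_vec(D_IN, rng), rng.gauss(0.0, 1.0), rand_vec(D_OUT, rng)) for _ in range(3)]
inputs = [rand_vec(D_IN, rng) for _ in range(200)]

def duplicated_student(t):
    """Width-4 net: neuron 4 copies neuron 3's input weights (index 2), and the
    two neurons split neuron 3's original output vector as (1 - t) and t."""
    w3, b3, out3 = teacher[2]
    return teacher[:2] + [
        (w3, b3, [(1 - t) * v for v in out3]),
        (w3, b3, [t * v for v in out3]),
    ]

def dead_student(w_in4, b4, out4):
    """Width-4 net: the width-3 teacher plus an extra neuron 4 with the given weights."""
    return teacher + [(w_in4, b4, out4)]

# Sliding t traces out a continuous line of parameters computing the same function.
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, max_diff(teacher, duplicated_student(t), inputs))  # ~0 up to float rounding

zeros_in, zeros_out = [0.0] * D_IN, [0.0] * D_OUT
# Zero inputs (and bias): the outputs of neuron 4 are free.
print(max_diff(teacher, dead_student(zeros_in, 0.0, rand_vec(D_OUT, rng)), inputs))
# Zero outputs: the inputs of neuron 4 are free.
print(max_diff(teacher, dead_student(rand_vec(D_IN, rng), rng.gauss(0.0, 1.0), zeros_out), inputs))
# Changing both at once generally changes the function.
print(max_diff(teacher, dead_student(rand_vec(D_IN, rng), 1.0, rand_vec(D_OUT, rng)), inputs))
```

Each of these directions is a whole continuum of parameters sitting at the same (optimal) loss, which is the kind of degeneracy that SLT rewards.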

Anyway, this means that when the data is generated by a slim neural net, optimal nets will have a good (low) RLCT, but when it's generated by a neural net of the right width (i.e. the full width of the model), optimal nets will have a bad (high) RLCT. So nets can learn simple data, and it's easier for them to learn simple data than complex data, assuming thin neural nets count as simple.

This is basically a justification of something like your point 1, but AFAICT it's closer to a proof in the SLT setting than in your setting.

Maybe - but you definitely can't get it if you don't even try to communicate the thing you think would be better.

For instance, if I was running the US, I'd probably slow down scaling considerably, but I'd mostly be interested in implementing safety standards similar to RSPs due to lack of strong international coordination.

Surely if you were running the US, that would be a great position to try to get international coordination on policies you think are best for everyone?

hiding your beliefs, in ways that predictably leads people to believe false things, is lying

I think this has got to be tempered by Grice to be accurate. Like, if I don't bring up some unusual fact about my life in a brief conversation (e.g. that I consume iron supplements once a week), this predictably leads people to believe something false about my life (that I do not consume iron supplements once a week), but is not reasonably understood as the bad type of lie - otherwise to be an honest person I'd have to tell everyone tons of minutiae about myself all the time that they don't care about.

Is this relevant to the point of the post? Maybe a bit - if I (that is, literally me) don't tell the world that I wish people would stop advancing the frontier of AI, I don't think that's terribly deceitful or ruining coordination. What has to be true for me to have a duty to say that? Maybe for me to be a big AI thinkfluencer or something? I'm not sure, and the post doesn't really make it clear.

I mean, whether something's realistic and whether something's actionable are two different things (both separate from whether something's nebulous). Even if it's hard to make a pause happen, I have a decent guess about what I'd want to do to up those odds: protest, write to my congressperson, etc.

As to the realism, I think it's more realistic than I think you think it is. My impression of AI Impacts' technological temptation work is that governments are totally willing to enact policies that impoverish their citizens without requiring a rigorous cost-benefit analysis. Early wins do seem like an important consideration, but you can imagine trying to get some early wins by e.g. banning AI from being used in certain domains, or banning people from developing advanced AI without doing X, Y, or Z.
