Matthew "Vaniver" Graves


AMA: Paul Christiano, alignment researcher

Is "movies" a standin for "easily duplicated cultural products", or do you think movies in particular are underproduced?

Can you get AGI from a Transformer?

Ah, I now suspect that I misunderstood you as well earlier: you wanted your list to be an example of "what you mean by DNN-style calculations" but I maybe interpreted as "a list of things that are hard to do with DNNs". And under that reading, it seemed unfair because the difficulty that even high-quality DNNs have in doing simple arithmetic is mirrored by the difficulty that humans have in doing simple arithmetic.

Similarly, I agree with you that there are lots of things that seem very inefficient to implement via DNNs rather than directly (like MCTS, or simple arithmetic, or so on), but it wouldn't surprise me if it's not that difficult to have a DNN-ish architecture that can more easily implement MCTS than our current ones. The sorts of computations that you can implement with transformers are more complicated than the ones you could implement with convnets, which are more complicated than the ones you could implement with fully connected nets; obviously you can't gradient descent a fully connected net into a convnet, or a convnet into a transformer, but you can still train a transformer with gradient descent.

It's also not obvious to me that humans are doing the more sophisticated thinking 'the smart way' instead of 'the dumb way'. Suppose our planning algorithms are something like MCTS; is it 'coded in directly' like AlphaGo's, or is it more like a massive transformer that gradient-descented its way into doing something like MCTS? Well, for things like arithmetic and propositional logic, it seems pretty clearly done 'the dumb way', for things like planning and causal identification it feels more like an open question, and so I don't want to confidently assert that our brains are doing it the dumb way. My best guess is they have some good tricks, but won't be 'optimal' according to future engineers who understand all of this stuff.

Can you get AGI from a Transformer?

Do you think DNNs and human brains are doing essentially the same type of information processing? If not, how did you conclude "humans can't do those either"? Thanks!

Sorry for the late reply, but I was talking from personal experience. Multiplying matrices is hard! Even for extremely tiny ones, I was sped up tremendously by pencil and paper. It was much harder than driving a car, or recognizing whether a image depicts a dog or not. Given the underlying computational complexity of the various tasks, I can only conclude that I'm paying an exorbitant performance penalty for the matmul. (And I'm in the top few percentiles of calculation ability, so this isn't me being bad at it by human standards.)

The general version of this is Moravec's Paradox.


[edit] Also if you look at the best training I'm aware of to solve a simpler arithmetic problems (the mental abacus method), it too demonstrates this sort of exorbitant performance penalty. They're exapting the ability to do fine motions in 3d space to multiply and add!

Updating the Lottery Ticket Hypothesis

That seems right, but also reminds me of the point that you need to randomly initialize your neural nets for gradient descent to work (because otherwise the gradients everywhere are the same). Like, in the randomly initialized net, each edge is going to be part of many subcircuits, both good and bad, and the gradient is basically "what's your relative contribution to good subcircuits vs. bad subcircuits?"

Updating the Lottery Ticket Hypothesis

But this is what would be necessary for the "lottery ticket" intuition (i.e. training just picks out some pre-existing useful functionality) to work.

I don't think I agree, because of the many-to-many relationship between neurons and subcircuits.  Or, like, I think the standard of 'reliability' for this is very low. I don't have a great explanation / picture for this intuition, and so probably I should refine the picture to make sure it's real before leaning on it too much?

To be clear, I think I agree with your refinement as a more detailed picture of what's going on; I guess I just think you're overselling how wrong the naive version is?

Updating the Lottery Ticket Hypothesis

Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization.

I think there are papers showing exactly this, like Deconstructing Lottery Tickets and What is the Best Multi-Stage Architecture for Object Recognition?. Another paper, describing the second paper:

We also compare to random, untrained weights because Jarrett et al. (2009) showed — quite strikingly — that the combination of random convolutional filters, rectification, pooling, and local normalization can work almost as well as learned features. They reported this result on relatively small networks of two or three learned layers and on the smaller Caltech-101 dataset (Fei-Fei et al., 2004). It is natural to ask whether or not the nearly optimal performance of random filters they report carries over to a deeper network trained on a larger dataset.

(My interpretation of their results is 'yeah actually randomly initialized convs do pretty well on imagenet'; I remember coming across a paper that answer that question more exactly and getting a clearer 'yes' answer but I can't find it at the moment; I remember them freezing a conv architecture and then only training the fully connected net at the end.)

Why do you doubt this? Are you seeing a bunch of evidence that I'm not? Or are you imagining new architectures that people haven't done these tests for yet / have done these tests and the new architectures fail?

[Maybe your standards are higher than mine--in the DLT paper, they're able to get 65% performance on CIFAR-10 by just optimizing a binary mask on the randomly initialized parameters, which is ok but not good.]

Against GDP as a metric for timelines and takeoff speeds

none capable of accelerating world GWP growth.

Or, at least, accelerating world GWP growth faster than they're already doing. (It's not like the various powers with nukes and bioweapons programs are not also trying to make the future richer than the present.)

Selection vs Control

In my wayward youthformal education, I studied numerical optimization, controls systems, the science of decision-making, and related things, and so some part of me was always irked by the focus on utility functions and issues with them; take this early comment of mine and the resulting thread as an example. So I was very pleased to see a post that touches on the difference between the approaches and the resulting intuitions bringing it more into the thinking of the AIAF.

That said, I also think I've become more confused about what sorts of inferences we can draw from internal structure to external behavior, when there are Church-Turing-like reasons to think that a robot built with mental strategy X can emulate a robot built with mental strategy Y, and both psychology and practical machine learning systems look like complicated pyramids built out of simple nonlinearities that can approximate general functions (but with different simplicity priors, and thus efficiencies). This sort of distinction doesn't seem particularly useful to me from the perspective of constraining our expectations, while it does seem useful for expanding them. [That is, the range of future possibilities seems broader than one would expect if they only thought in terms of selection, or only thought in terms of control.]

Confucianism in AI Alignment

Even if BigCo senior management were virtuous and benevolent, and their workers were loyal and did not game the rules, the poor rules would still cause problems.

If BigCo senior management were virtuous and benevolent, would they have poor rules?

That is to say, when I put my Confucian hat on, the whole system of selecting managers based on a proxy measure that's gameable feels too Legalist. [The actual answer to my question is "getting rid of poor rules would be a low priority, because the poor rules wouldn't impede righteous conduct, but they still would try to get rid of them."]

Like, if I had to point at the difference between the two, the difference is where the put the locus of value. The Confucian ruler is primarily focused on making the state good, and surrounding himself with people who are primarily focused on making the state good. The Legalist ruler is primarily focused on surviving and thriving, and so tries to set up systems that cause people who are primarily focused on surviving and thriving to do the right thing. The Confucian imagines that you can have a large shared value; the Legalist imagines that you will necessarily have many disconnected and contradictory values.

The difference between hiring for regular companies and EA orgs seems relevant. Often, applicants for regular companies want the job, and standard practice is to attempt to trick the company into hiring them, regardless of qualification. Often, applicants for EA orgs only want the job if and only if they're the right person for the job; if I'm trying to prevent asteroids from hitting the Earth (or w/e) and someone else could do a better job of it than I could, I very much want to get out of their way and have them do it instead of me. As you mention in the post, this just means you get rid of the part of interviews where gaming is intentional, and significant difficulty remains. [Like, people will be honest about their weaknesses and try to be honest about their strengths, but accurately measuring those and fit with the existing team remains quite difficult.]

Now, where they're trying to put the locus of value doesn't mean their policy prescriptions are helpful. As I understand the Confucian focus on virtue in the leader, the main value is that it's really hard to have subordinates who are motivated by the common good if you yourself are selfish (both because they won't have your example and because the people who are motivated by the common good will find it difficult to be motivated by working for you).

But I find myself feeling some despair at the prospect of a purely Legalist approach to AI Alignment, because it feels like it is fighting against the AI at every step, instead of being able to recruit it to do some of the work for you, and without that last bit I'm not sure how you get extrapolation instead of interpolation. Like, you can trust the Confucian to do the right thing in novel territory, insofar as you gave them the right underlying principles, and the Confucian is operating at a philosophical level where you can give them concepts like corrigibility (where they not only want to accept correction from you, but also want to preserve their ability to accept correction for you, and preserve their preservation of that ability, and so on) and the map-territory distinction (where they want their sensors to be honest, because in order to have lots of strawberries they need their strawberry-counter to be accurate instead of inaccurate). In Legalism, the hope is that the overseer can stay a step ahead of their subordinate; in Confucianism, the hope is that everyone can be their own overseer.

[Of course, defense in depth is useful; it's good to both have trust in the philosophical competence of the system and have lots of unit tests and restrictions in case you or it are confused.]

"Zero Sum" is a misnomer.

"Completely adversarial" also better captures the strange feature of zero-sum games where doing damage to your opponent, by the nature of it being zero-sum, necessarily means improving your satisfaction, which is a very narrow class of situations.

Load More