[AN #159]: Building agents that know how to experiment, by training on procedurally generated games

Sammy Martin (4y)

- They will not work in any environment outside of XLand (unless that environment looks very very similar to XLand).
In particular, I reject the idea that these agents have learned “general strategies for problem solving” or something like that, such that we should expect them to work in other contexts as well, perhaps with a little finetuning. I think they have learned general strategies for solving a specific class of games in XLand.

Strongly agree with this, although with the caveat that it's deeply impressive progress compared to the state of the art in RL research in 2017, where getting an agent to learn to play ten games with a noticeable decrease in performance during generalization was impressive. This is generalization over a few million related games that share a common specification language, which is a big step up from 10 but still a fair way off infinity (i.e. general problem-solving).

It may well be worth having a think about what an AI that's human level on language understanding, image recognition and some other things, but significantly below human on long-term planning, would be capable of, and what risks it might present. (Is there any existing writing on this sort of 'idiot savant AI', possibly under a different name?)

It seems to be the view of many researchers that long-term planning will likely be the last obstacle to fall, and that view has been borne out by progress on e.g. language understanding in GPT-3. I don't think this research changes that view much, although I suppose I should update slightly towards long-term planning being easier than I thought.

Daniel Kokotajlo (4y)

I wonder if grokking is evidence for, or against, the Mingard et al view that SGD on big neural nets is basically a faster approximation of rejection sampling. Here's an argument that it's evidence against:

--Either the "grokked algorithm circuit" is simpler, or not simpler, than the "memorization circuit."

--If it's simpler, then rejection sampling would reach the grokked algorithm circuit prior to reaching the memorization circuit, which is not what we see.

--If it's not simpler, then rejection sampling would eventually stumble across the grokked algorithm circuit but then immediately return to the memorization circuit, which is also not what we see.
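To make the rejection-sampling picture in this argument concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the three "circuits", their prior masses, and the function names are hypothetical and do not come from the grokking or Mingard et al papers. It only shows that pure rejection sampling draws independent samples, so the circuit with the larger prior mass dominates among the accepted samples, and there is no notion of finding one circuit first and then moving to another.

```python
import random

def rejection_sample(prior, fits_training_data, rng):
    """Draw hypotheses i.i.d. from the prior until one fits the training data."""
    names = list(prior)
    weights = [prior[name] for name in names]
    while True:
        hypothesis = rng.choices(names, weights=weights, k=1)[0]
        if fits_training_data[hypothesis]:
            return hypothesis

def posterior_frequencies(prior, fits_training_data, n_samples=5000, seed=0):
    """Estimate how often each hypothesis is the accepted sample."""
    rng = random.Random(seed)
    counts = {name: 0 for name in prior}
    for _ in range(n_samples):
        counts[rejection_sample(prior, fits_training_data, rng)] += 1
    return {name: count / n_samples for name, count in counts.items()}

# Hypothetical prior masses: the "simpler" circuit gets more prior mass.
prior = {"generalizing": 0.08, "memorizing": 0.02, "other": 0.90}
# Both circuits of interest fit the training set; everything else is rejected.
fits = {"generalizing": True, "memorizing": True, "other": False}

freqs = posterior_frequencies(prior, fits)
print(freqs)
```

Under these made-up numbers the accepted samples should split in proportion to the prior, 0.08 / (0.08 + 0.02) = 0.8 for the generalizing circuit. Each accepted sample is independent of the last, which is why a "memorize first, generalize much later" trajectory has to come from something beyond this simple picture.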

OTOH maybe Mingard could reply that indeed, for small neural nets like these ones, SGD is not merely an approximation of rejection sampling but rather meanders a lot, creating a situation where more complex circuits (the memorization ones) can have broader basins of attraction than simpler circuits (the grokked algorithm). But eventually SGD randomly jumps its way to the simpler circuit and then stays there. idk.

Rohin Shah (4y)

I feel like everyone is taking the SGD = rejection sampling view way too seriously. From the Mingard et al paper:

We argue here that the inductive bias found in DNNs trained by SGD or related optimisers, is, to first order, determined by the parameter-function map of an untrained DNN. While on a log scale we find P_SGD(f|S) ≈ P_B(f|S) there are also measurable second order deviations that are sensitive to hyperparameter tuning and optimiser choice.
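As a rough illustration of the Bayesian term in the quoted passage, the sketch below estimates a posterior over functions by sampling random parameters and keeping only those that fit a training set. This is my reading of the notation, not code from the paper: the model (a single linear threshold unit on 3-bit inputs), the Gaussian parameter distribution, the training set, and all names are assumptions chosen to keep the example tiny. The SGD side of the comparison would require training the same model many times and is not shown.

```python
import itertools
import random
from collections import Counter

# All 8 boolean inputs of length 3; a "function" is its 8-bit output string.
INPUTS = list(itertools.product([0, 1], repeat=3))

def random_function(rng):
    """Sample parameters of a toy threshold unit and return the function it computes."""
    weights = [rng.gauss(0, 1) for _ in range(3)]
    bias = rng.gauss(0, 1)
    outputs = []
    for x in INPUTS:
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        outputs.append("1" if activation > 0 else "0")
    return "".join(outputs)

def estimate_posterior(train, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(f | S): sample parameters, keep fits to S."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        f = random_function(rng)
        if all(f[index] == label for index, label in train.items()):
            counts[f] += 1
    total = sum(counts.values())
    return {f: count / total for f, count in counts.items()}

# Hypothetical training set S: index into INPUTS -> required output bit.
train = {0: "0", 7: "1"}  # f(0,0,0) = 0 and f(1,1,1) = 1

posterior = estimate_posterior(train)
for f, p in sorted(posterior.items(), key=lambda item: -item[1])[:5]:
    print(f, round(p, 3))
```

The printed distribution is typically far from uniform over the functions consistent with S, which is the kind of first-order bias from the parameter-function map that the quote refers to.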

The first order effect is what lets you conclude that when you ask GPT-3 a novel question it has never been trained on, like "how many bonks are in a quoit", it won't just start stringing characters together at random, but will probably respond with English words.

The second order effects could be what tells you whether or not it is going to respond with "there are three bonks in a quoit" or "that's a nonsense question". (Or maybe not! Maybe random sampling has a specific strong posterior there, and SGD does too! But it seems hard to know one way or the other.) Most alignment-relevant properties seem like they are in this class.

Grokking occurs in a weird special case where it seems there's ~one answer that generalizes well and has much higher prior, and everything else is orders of magnitude less likely. I don't really see why you should expect that results on MNIST should generalize to this situation.

Daniel Kokotajlo (4y)

Thanks! I'm not sure I understand your argument, but I think that's my fault rather than yours, since tbh I don't fully understand the Mingard et al paper itself, only its conclusion.
