Regarding 1, it either seems like
a) There are true adversarial examples for human values, situations where our values misbehave and we have no way of ever identifying that, in which case we have no hope of solving this problem, because solving it would mean we are in fact able to identify the adversarial examples.
b) Humans are actually immune to adversarial examples, in the sense that we can identify the situations in which our values (or rather, a subset of them) would misbehave (like being addicted to social medial), such that our true, complete values never do, and an AI that accurately models humans would also have such immunity.
One thing that's bothering me is... Google/DeepMind aren't stupid. The transformer model was invented at Google. What has stopped them from having *already* trained such large models privately? GPT-3 isn't that large an evidence for the effectiveness of scaling transformer models; GPT-2 was already a shock and caused huge public commotion. And in fact, if you were close to building an AGI, it would make sense for you not to announce this to the world, specially as open research that anyone could copy/reproduce, for obvious safety and economi... (read more)
As far as I can tell, this is what is going on: they do not have any such thing, because GB and DM do not believe in the scaling hypothesis the way that Sutskever, Amodei and others at OA do.
GB is entirely too practical and short-term focused to dabble in such esoteric & expensive speculation, although Quoc's group occasionally surprises you. They'll dabble in something like GShard, but mostly because they expect to be likely to be able to deploy it or something like it to production in Google Translate.
DM (particularly Hassabis, I'm not sure about Leg... (read more)