Peter Barnett

Researcher at MIRI

EA and AI safety

I'm confused here. It seems to me that if your AI normally does evil things and only sometimes (in certain situations) does good things, I would not call it "aligned", and certainly the alignment is not stable (because it almost never takes "good" actions). That said, this thing is not robustly "misaligned" either.

(I don't mean to dogpile)
I think that selection is the correct word, and that it doesn't really seem to be smuggling in incorrect connections to evolution. 

We could imagine finding a NN that does well according to a loss function by simply randomly initializing many, many NNs and then keeping the one that does best according to the loss function. I think this process would accurately be described as selection; we are literally selecting the model which does best.

I'm not claiming that SGD does this[1], just giving an example of a method to find a low-loss parameter configuration which isn't related to evolution, and is (in my opinion) best described as "selection".
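To make the thought experiment concrete, here is a minimal sketch of that "randomly initialize many models and keep the best" procedure. Everything in it is an illustrative assumption rather than anything from the discussion: a two-parameter linear model stands in for a NN, the loss is mean squared error on a toy dataset, and the names `loss`, `random_init`, and `select_best` are hypothetical.

```python
import random

def loss(params, data):
    """Mean squared error of the toy model y = w * x + b on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def random_init(rng):
    """Draw one random parameter configuration (no training happens)."""
    return (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))

def select_best(num_candidates, data, seed=0):
    """Randomly initialize many models and keep the lowest-loss one."""
    rng = random.Random(seed)
    candidates = [random_init(rng) for _ in range(num_candidates)]
    # Pure selection: no gradients, no mutation, no reproduction.
    return min(candidates, key=lambda params: loss(params, data))

# Toy dataset generated from y = 2x + 1 (an arbitrary choice for illustration).
data = [(float(x), 2.0 * x + 1.0) for x in range(-3, 4)]

best_small = select_best(10, data)
best_large = select_best(10_000, data)

print("best of 10:    ", best_small, loss(best_small, data))
print("best of 10,000:", best_large, loss(best_large, data))
```

Nothing here resembles evolution (there is no inheritance or variation between generations), yet "selection" still seems like the natural description: the only thing the procedure does is pick the candidate that scores best on the loss.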

  1. ^ Although "Is SGD a Bayesian sampler? Well, almost" does make a related claim.

So could an AI engineer create an AI blob of compute the same size as the brain, with its same structural parameters, feed it the same training data, and get the same result ("don't steal" rather than "don't get caught")?

There is a disconnect with this question. 

I think Scott is asking: "Supposing an AI engineer could create something that was effectively a copy of a human brain and give it the same training data, could this thing learn the 'don't steal' instinct over the 'don't get caught' instinct?"
Eliezer is answering: "Is an AI engineer able to create a copy of the human brain, provide it with the same training data a human got, and get the 'don't steal' instinct?"