All of Pablo Villalobos's Comments + Replies

So, assuming the neocortex-like subsystem can learn without having a Judge directing it, wouldn't that be the perfect Tool AI? An intelligent system with no intrinsic motivations or goals?

Well, I guess it's possible that such a system would end up creating a mesa optimizer at some point.

2Steve Byrnes3y
A couple years ago I spent a month or two being enamored with the idea of tool AI via self-supervised learning (which is basically what you're talking about, i.e. the neocortex without a reward channel), and I wrote a few posts like In Defense of Oracle ("Tool") AI Research and Self-Supervised Learning and AGI Safety. I dunno, maybe it's still the right idea. But at least I can explain why I personally grew less enthusiastic about it. One thing was, I came to believe (for reasons here, and of course I also have to cite this) that it doesn't buy the safety guarantees that I had initially thought, even with assumptions about the computational architecture. Another thing was, I'm increasingly feeling like people are not going to be satisfied with tool AI. We want our robots! We want our post-work utopia! Even if tool AI is ultimately the right choice for civilization, I don't think it would be possible to coordinate around that unless we had rock-solid proof that agent-y AGI is definitely going to cause a catastrophe. So either way, the right approach is to push towards safe agent-y AGI, and either develop it or prove that it's impossible. More importantly, I stopped thinking that self-supervised tool AI could be all that competent, like competent enough to help solve AI alignment or competent enough that people could plausibly coordinate around never building a more competent AGI. Why not? Because rewards play such a central role in human cognition. I think that every thought we think, we think it in part because it's expected to be higher reward than whatever thought we could have thunk instead. I think of the neocortex as having a bunch of little pieces of generative models that output predictions (see here), and a bit like the scientific method, these things grow in prominence by making correct predictions and get tossed out when they make incorrect predictions. And like the free market, they also grow in prominence by leading to reward, and shrink in prominenc