You may have heard a few things about “predictive processing” or “the global neuronal workspace,” and you may have read some of Steve Byrnes’ excellent posts about what’s going on computationally in the human brain. But how does it all fit together? How can we begin to arrive at a...
Considering how much I’ve been using “the intentional stance” in my recent thinking and discussions about the nature of agency and goals, I figured it would be a good idea to, y’know, actually read what Dan Dennett originally wrote about it. While doing so, I realized that...
Inner alignment and objective robustness have been frequently discussed in the alignment community since the publication of “Risks from Learned Optimization” (RFLO). These concepts identify a problem beyond outer alignment/reward specification: even if the reward or objective function is perfectly specified, there is a risk of a model pursuing a...
(Crossposted from my blog) Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and how its various questions and considerations interrelate, so that when I read a new paper, a post on the Alignment Forum, or similar material,...