Jack Koch

362

134

jbkjr's Shortform

Oct 21, 2025•4

Integrating Three Models of (Human) Cognition

You may have heard a few things about “predictive processing” or “the global neuronal workspace,” and you may have read some of Steve Byrnes’ excellent posts about what’s going on computationally in the human brain. But how does it all fit together? How can we begin to arrive at a...

Nov 23, 2021•40

Grokking the Intentional Stance

Considering how much I’ve been using “the intentional stance” in my thinking about the nature of agency and goals and discussions of the matter recently, I figured it would be a good idea to, y’know, actually read what Dan Dennett originally wrote about it. While doing so, I realized that...

Aug 31, 2021•50

Discussion: Objective Robustness and Inner Alignment Terminology

Jun 23, 2021•73

Empirical Observations of Objective Robustness Failures

Inner alignment and objective robustness have been frequently discussed in the alignment community since the publication of “Risks from Learned Optimization” (RFLO). These concepts identify a problem beyond outer alignment/reward specification: even if the reward or objective function is perfectly specified, there is a risk of a model pursuing a...

Jun 23, 2021•63

Mapping the Conceptual Territory in AI Existential Safety and Alignment

(Crossposted from my blog) Throughout my studies in alignment and AI-related existential risks, I’ve found it helpful to build a mental map of the field and how its various questions and considerations interrelate, so that when I read a new paper, a post on the Alignment Forum, or similar material,...

Feb 12, 2021•15
