This post sits between parts 1 and 2 of the sequence. It makes three points about things I think people tend to get wrong. 1. Factored Cognition is about reducing hard problems to human judgment to achieve outer alignment. It's possible to lose sight of why Factored...
(This post is part of a sequence that's meant to be read in order; see the preface.) Post #1 was about developing and justifying a formalism for Factored Cognition. Now that we have this formalism, this post is about doing as much with it as possible. 1. Debate Trees Recall...
(This post is part of a sequence that's meant to be read in order; see the preface.) 1. HCH and Ideal Debate Recall from post #-2 that we have two perspectives on stock IDA.[1] One is that of a human with access to a model, the other is that of...
Factored Cognition is primarily studied by Ought, the same organization that was partially credited for implementing the interactive prediction feature. Ought is an organization with at least five members who have worked on the problem for several years. I am a single person who just finished a master's degree. The...
1. The Principle Suppose you have some difficult cognitive problem you want to solve. What is the difference between (1) making progress on the problem by thinking about it for an hour and (2) solving a well-defined subproblem whose solution is useful for the entire problem? (Finding a good characterization...
This post is about two proposals for aligning AI systems in a scalable way: * Iterated Distillation and Amplification (often just called 'Iterated Amplification'), or IDA for short,[1] is a proposal by Paul Christiano. * Debate is an IDA-inspired proposal by Geoffrey Irving. This post is written to be as...
(This is an unofficial explanation of Inner Alignment based on the MIRI paper Risks from Learned Optimization in Advanced Machine Learning Systems (which is almost identical to the LW sequence) and the Future of Life podcast with Evan Hubinger (MIRI/LW). It's meant for anyone who found the sequence too long/challenging/technical...