I have some rough intuitions about the purpose of MIRI's agent foundations agenda, and I'd like to share them here. (Note: I have not discussed these with MIRI, and these should not be taken to be representative of MIRI's views.)
I think there's a common misconception that the goal of agent foundations is to try building an AGI architected with a decision theory module, a logical induction module, etc. In my mind, this is completely not the point, and my intuitions say that approach is doomed to fail.
I interpret agent foundations as being more about providing formal specifications of metaphilosophical competence, to:
I want to explore logical induction as a case study. I think the important part about logical induction is the logical induction criterion, not the algorithm implementing it. I've heard the implementation criticized for being computationally intractable, but I see its primary purpose as showing the logical induction criterion to be satisfiable at all. This elevates the logical induction criterion over all the other loose collections of desiderata that may or may not be satisfiable, and may or may not capture what we mean by logical uncertainty. If we were to build an actual aligned AGI, I would expect its reasoning process to satisfy the logical induction criterion, but not look very much like the algorithm presented in the logical induction paper.
I also think the logical induction criterion provides an exact formalization—a necessary AND sufficient condition—of what it means to not get stuck in any epistemic pits in the limit. (The gist of this intuition: epistemic pits you're stuck in forever correspond exactly to patterns in the market that a trader could exploit forever, and make unbounded profits from.) This lets us formalize the question "Does X reasoning process avoid permanently falling into epistemic pits?" into "Does X reasoning process satisfy the logical induction criterion?"