From Barriers to Alignment to the First Formal Corrigibility Guarantees
This post summarizes my two related papers that will appear at AAAI 2026 in January:

* Part I: Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis (selected for oral presentation)
* Part II: Core Safety Values for Provably Corrigible Agents

What these papers try to quantify...