This post was written under Evan Hubinger’s direct guidance and mentorship, as a part of the Stanford Existential Risks Initiative ML Alignment Theory Scholars (MATS) program. TL;DR: The input distribution of a supervised machine learning system is modeled as an unchanging thing, and even in reinforcement learning there are types...