Universality and the “Filter”
This post was written under Evan Hubinger’s direct guidance and mentorship as part of the Stanford Existential Risks Initiative ML Alignment Theory Scholars (MATS) program. TL;DR: In this post, I refine universality in order to explain the “Filter”. The Filter is a method of risk analysis for the outputs...
Dec 16, 2021