Don't Influence the Influencers!
This post was written under Evan Hubinger’s direct guidance and mentorship, as a part of the Stanford Existential Risks Institute ML Alignment Theory Scholars (MATS) program. TL;DR: AGI is likely to turn out unsafe. One likely way this could happen is that it fools us into thinking it is safe....
Dec 19, 2021