Top postsTop post
In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is, does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior,...
It seems to me that the main goal of quantilization is to reduce the extreme unintended outcomes of maximizing (by sampling from something like a human-learned distribution over actions) while still remaining competitive (by sampling from only the upper quantile of said distribution). But that still leaves open the possibility...