anon1

Comments

Conditions under which misaligned subagents can (not) arise in classifiers

Re: first point, I think this is a difference in intuition about how simple, or how easy to find, agents are in the search space. My intuition is that they would be harder to find than regular functions that just do something. I think this stems from a more general intuition: finding a function that does A is easier than finding a function that does both A and B.

Re: second point, I agree: there will be some agents in the search space. Claim 3 is that if claims 1 and 2 are true, then (for the specified type of task) it is very unlikely that the optimization process will find an agent, though there is still a nonzero probability that it does.