If your model is deceptive, though, then it might know all of that.
Could you please describe your intuition behind how the model could know the meta-optimizer is going to perform checks on deceptive behavior?