Peter Hroššo

Comments

Gradient hacking
"If your model is deceptive, though, then it might know all of that"

Could you please describe your intuition for how the model could know that the meta-optimizer is going to check for deceptive behavior?