I'm not sure you have addressed Richard's point -- if you keep your current definition of outer alignment, then memorizing the answers to the finite set of data is always a way to score perfect loss, but intuitively doesn't seem like it would be intent aligned. And if memorization were never intent aligned, then your definition of outer alignment would be impossible.

Reply

Does the lottery ticket hypothesis suggest the scaling hypothesis?

Jack R5y20

When the network is randomly initialized, there is a sub-network that is already decent at the task.

From what I can tell, the paper doesn't demonstrate this--i.e. I don't think they ever test the performance of a sub-network with random weights (rather they test the performance of a subnetwork after training only the subnetwork). Though maybe this isn't what you meant, in which case you can ignore me :)

Reply