Could you clarify a bit more what you mean when you say "X is inaccessible to the human genome"?
I'm not sure you have addressed Richard's point -- if you keep your current definition of outer alignment, then memorizing the answers to the finite dataset is always a way to score perfect loss, yet intuitively that doesn't seem intent aligned. And if memorization is never intent aligned, then your definition of outer alignment would be impossible to satisfy.
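To make the point concrete, here's a toy sketch (everything in it is hypothetical -- the dataset, the "memorizer", and the loss are just illustrations): a model that simply stores the finite training set achieves exactly zero loss on it, while having no defined behavior at all off that set.

```python
import random

# Hypothetical illustration: a "memorizer" that stores the finite training set
# and looks answers up verbatim. It achieves perfect (zero) loss on that set,
# which by itself says nothing about intent alignment.

random.seed(0)
train_set = [(x, x * 3 + 1) for x in range(10)]  # toy finite dataset

memorizer = dict(train_set)  # "training" is just storing the answers

# Squared-error loss over the training set -- zero by construction:
train_loss = sum((memorizer[x] - y) ** 2 for x, y in train_set)
print(train_loss)  # 0

# Off the finite set, the memorizer has no answer at all:
print(11 in memorizer)  # False
```

So "scores perfect loss on the finite data" is satisfiable by a policy we'd intuitively not call aligned, which is why the definition seems to need more than the loss condition.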
> When the network is randomly initialized, there is a sub-network that is already decent at the task.
From what I can tell, the paper doesn't demonstrate this -- i.e., I don't think they ever test the performance of a sub-network with random weights (rather, they test the performance of a sub-network after training only that sub-network). Though maybe this isn't what you meant, in which case you can ignore me :)
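The distinction I'm drawing can be sketched in a few lines (this is a toy linear setup of my own, not the paper's actual experiments -- the data, mask, and training loop are all made up for illustration): evaluating a masked sub-network at random initialization is a different procedure from training only the masked weights and then evaluating.

```python
import numpy as np

# Toy setup: a one-layer linear "network" y_hat = X @ (w * mask).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
true_w = rng.normal(size=(8,))
y = X @ true_w

W = rng.normal(size=(8,))                              # random initialization
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)  # arbitrary fixed sub-network

def loss(w):
    return float(np.mean((X @ (w * mask) - y) ** 2))

# (a) Sub-network with random weights, no training at all:
loss_at_init = loss(W)

# (b) Sub-network after training only the masked weights (plain gradient descent):
w = W.copy()
for _ in range(500):
    grad = 2 * mask * (X.T @ (X @ (w * mask) - y)) / len(X)
    w -= 0.05 * grad
loss_after_training = loss(w)

print(loss_at_init, loss_after_training)
```

Showing that (b) does well doesn't tell you anything about (a), which is why I don't think the quoted claim follows from the experiments as described.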