Thanks for the comment and I'm glad you like the post :)
On the other topic: I'm sorry, I'm afraid I can't be very helpful here. I'd be somewhat surprised if I had had a good answer to this a year ago, and I certainly don't have one now.
Some cop-out answers:
FWIW, I just had a look through my docs and found a "resources" doc with the following links under corrigibility:
Clarifying AI alignment
Can corrigibility be learned safely?
Problems with amplification/distillation
The limits of corrigibility
Addressing three problems with counterfactual corrigibility
Not vouching for any of those being the up-to-date or most relevant ones. I'm pretty sure I made this list early on in the process, and it probably doesn't represent what I considered the latest Paul-view.