Historically, you very clearly thought that a major part of the problem was that AIs would not understand human concepts and preferences until after, or possibly very slightly before, achieving superintelligence. This is not how it seems to have gone.
Everyone agrees that you assumed superintelligence would understand everything humans understand, and more. The dispute is entirely about what you encounter before superintelligence. In general, it seems the world turned out to be much more gradual than you expected, and there is information to be found in which capabilities emerged earlier in the process.
When considering an embedder $E$, in universe $U$, in response to which SADT picks policy $\pi$, I would be tempted to apply the following coherence condition:

$$E(\pi) = U$$

(all approximately, of course)
I'm not sure this would work, though. It is definitely a necessary condition for reasonable counterfactuals, but it is not obviously sufficient.
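To make the shape of the condition concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: `Policy`, `Universe`, and `Embedder` are toy types, the real objects are mathematical, and the real condition is only required to hold approximately.

```python
from typing import Callable

# Hypothetical stand-ins for the mathematical objects in the post:
Policy = Callable[[str], str]            # maps observations to actions
Universe = list                          # a trace of what actually happens
Embedder = Callable[[Policy], Universe]  # maps a policy to a universe

def coherence_holds(embedder: Embedder,
                    chosen_policy: Policy,
                    actual_universe: Universe) -> bool:
    """Necessary condition for reasonable counterfactuals: feeding the
    policy the agent actually picked back into the embedder should
    reproduce the universe the agent actually finds itself in.
    (Exact equality here simplifies the approximate condition.)"""
    return embedder(chosen_policy) == actual_universe
```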
By censoring I mean a specific technique for forcing the consistency of a possibly inconsistent set of axioms.
Suppose you have a set $D$ of deduction rules over a language $L$. You can construct a function $f_D : 2^L \to 2^L$ that takes a set of sentences and outputs all the sentences that can be proved in one step using $D$ and the sentences in the input set. You can also construct a censored $f_D'$ by letting $f_D'(S) = \{\phi \in f_D(S) \mid \neg\phi \notin S\}$.
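A minimal sketch of this construction in Python, under loudly toy assumptions: sentences are plain strings, negation is a syntactic "~" prefix, and the rule set $D$ is stood in for by a single hypothetical modus-ponens-style rule over strings.

```python
from typing import Callable

Rule = Callable[[set[str]], set[str]]

def negate(phi: str) -> str:
    """Purely syntactic negation: strip a leading "~" or prepend one."""
    return phi[1:] if phi.startswith("~") else "~" + phi

def f_D(sentences: set[str], rules: list[Rule]) -> set[str]:
    """All sentences provable in one step from `sentences` using `rules`
    (each rule maps the current set to newly derived sentences)."""
    derived = set(sentences)
    for rule in rules:
        derived |= rule(sentences)
    return derived

def censored_f_D(sentences: set[str], rules: list[Rule]) -> set[str]:
    """The censored operator: derive one step as usual, but drop any
    sentence whose negation is already in the current set, so iterating
    it never introduces a direct contradiction."""
    return {phi for phi in f_D(sentences, rules)
            if negate(phi) not in sentences}

# Hypothetical toy rule: from "p" and "p->q", derive "q" (all strings).
def modus_ponens(sentences: set[str]) -> set[str]:
    out = set()
    for s in sentences:
        if "->" in s:
            antecedent, consequent = s.split("->", 1)
            if antecedent in sentences:
                out.add(consequent)
    return out

axioms = {"p", "p->q", "~q"}                 # latently inconsistent
print(censored_f_D(axioms, [modus_ponens]))  # "q" is censored by "~q"
```

Iterating `censored_f_D` from a latently inconsistent axiom set keeps the generated set free of direct negation pairs, which is the sense in which censoring forces consistency.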
To pick a toy example, you can use text as a bottleneck to force systems to "think out loud" in a way that is very directly interpretable by a human reader, and because language understanding is so rich, this will actually be competitive with other approaches and often superior.
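A minimal sketch of that bottleneck, assuming a hypothetical `llm(prompt) -> completion` function (a stand-in, not any particular API): the only channel between the reasoning stage and the answering stage is a human-readable string, which a person can inspect or veto.

```python
from typing import Callable

def answer_with_visible_reasoning(question: str,
                                  llm: Callable[[str], str]) -> str:
    """Sketch of text as a bottleneck. `llm` is a hypothetical stand-in
    for any prompt -> completion function."""
    # Stage 1: the system must "think out loud" in plain text.
    reasoning = llm(f"Think step by step about: {question}\nReasoning:")

    # The bottleneck: this string is all that flows to stage 2, so a
    # human reader (or an automated monitor) can inspect it directly.
    print("Inspectable reasoning:\n" + reasoning)

    # Stage 2: the answer is produced from the visible reasoning alone.
    return llm(f"Using only this reasoning:\n{reasoning}\n"
               f"give a final answer to: {question}")
```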
I'm sure you can come up with more ways that the existence of software that understands language and does ~nothing else makes getting computers to do what you mean easier than if software did not understand language. Please think about the problem for 5 minutes. Use a clock.