Summary: A Corrigibility method that works for a Pivotal Act AI (PAAI) but fails for a CEV-style AI could make things worse. Any implemented Corrigibility method will necessarily be built on top of a set of unexamined implicit assumptions. One of those assumptions could be true for a PAAI but false for a CEV-style AI. The present post outlines one specific scenario where this happens. The scenario involves a Corrigibility method that only works for an AI design if that design does not imply an identifiable outcome (IO). The method fails when it is applied to an AI design that does imply an identifiable outcome; when such an outcome does exist, the result can be massively worse than extinction.
Thank you for engaging. If this was unclear to you, then I'm sure it was also unclear to others.
The post outlined a scenario where a Corrigibility method works perfectly for one type of AI (an AI that does not imply an identifiable outcome, for example a PAAI). The same Corrigibility method fails completely for another type of AI (an AI that does imply an identifiable outcome, for example PCEV). So the second AI, which does have an IO, is indeed not corrigible.
This Corrigibility method leads to an outcome that is massively worse than extinction. This bad outcome is the result of two things being true: (i) the fully Corrigible first AI made…