This is a special post for short-form writing by Tamsin Leake. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.

an approximate illustration of QACI:

Nice graphic!

What stops e.g. "QACI(expensive_computation())" from being an optimization process which ends up trying to "hack its way out" into the real QACI?

nothing, fundamentally; the user has to be careful which computation they invoke.

That... seems like a big part of what having "solved alignment" would mean, given that you have AGI-level optimization aimed (indirectly, via a counterfactual) at evaluating this computation (IIUC).

one solution to this problem is simply to never use that capability (running expensive computations) at all; or to not use it until the iterated counterfactual researchers have developed proofs that any expensive computation they run is safe; or to not use it until they have very slowly and carefully built dath-ilan-style corrigible aligned AGI.
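the "don't run it without a proof" option above can be sketched as a guard around computation invocation. this is only an illustrative toy, not part of QACI's formalism: `SafetyProof`, `run_guarded`, and the `verified` flag are hypothetical names standing in for an actual machine-checkable proof artifact and proof checker.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SafetyProof:
    """stands in for a machine-checkable proof that running a given
    computation cannot optimize against its caller (hypothetical)."""
    claim: str
    verified: bool  # in reality: the output of an actual proof checker

def run_guarded(computation: Callable[[], object],
                proof: Optional[SafetyProof] = None) -> object:
    """refuse to invoke an expensive computation unless it comes
    with a verified safety proof; absent a proof, do nothing."""
    if proof is None or not proof.verified:
        raise PermissionError("no verified safety proof; refusing to run")
    return computation()
```

usage: `run_guarded(lambda: 2 + 2, SafetyProof("bounded arithmetic", verified=True))` runs the computation, while `run_guarded(lambda: dangerous_search())` is refused before the computation is ever invoked. the design choice being illustrated: the default is refusal, so forgetting to supply a proof fails closed rather than open.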