Tamsin Leake

hi! i'm tammy :3

i research the QACI plan for formal-goal AI alignment at orthogonal.

check out my blog and my twitter.

commenting on this post because it's the latest in the sequence; i disagree with the premises of the whole sequence. (EDIT: whoops, the sequence posts in fact discuss those premises so i probably should've commented on those. ohwell.)

the actual, endorsed, axiomatic (aka terminal aka intrinsic) values we have are ones we don't want to change, ones we don't want to be lost or modified over time. what you call "value change" is change in instrumental values.

i agree that, for example, our preferences about how to organize the society we live in should change over time. but that simply means that our preferences about society aren't terminal values, and our terminal values on this topic are meta-values about how other (non-terminal) values should change.

these meta-values, and other terminal values, are values that we should not want changed or lost over time.

in actuality, people aren't coherent enough agents to have immutable terminal values; they have value drift and confusion about values, and they don't distinguish (or don't distinguish well) between terminal and instrumental values in their minds.

but we should want to figure out what our axiomatic values are, and for those to not be changed at all. and since everything else is instrumental to that, we do not have to figure out alignment with regard to instrumental values, only axiomatic values.

one solution to this problem is to simply never use that capability (running expensive computations) at all, or to not use it before the iterated counterfactual researchers have developed proofs that any expensive computation they run is safe, or before they have very slowly and carefully built dath-ilan-style corrigible aligned AGI.

nothing fundamentally; the user has to be careful about what computation they invoke.

an approximate illustration of QACI: