Probably H intends A to achieve a narrow subset of H's goals, but doesn't necessarily want A to pursue H's goals in general.

Similarly, if I have an employee, I may intend for them to do some work-related tasks for me, but I probably don't intend for them to go and look after my parents, even though ensuring my parents are well looked-after is a goal of mine.

[ Question ]

What is the alternative to intent alignment called?

by ricraz, 30th Apr 2020 (1 min read, 3 comments)


Paul defines intent alignment of an AI A to a human H as the criterion that A is trying to do what H wants it to do. What term do people use for the definition of alignment in which A is trying to achieve H's goals (whether or not H intends for A to achieve H's goals)?

Secondly, this seems to basically map onto the distinction between an aligned genie and an aligned sovereign. Is this a fair characterisation?

(Intent alignment definition from
