643

AI ALIGNMENT FORUM
AF

642

TsviBT's Shortform

by TsviBT
16th Jun 2024
1 min read
137

4

This is a special post for quick takes by TsviBT. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
TsviBT's Shortform
21TsviBT
1 comment, sorted by
top scoring
Click to highlight new comments since: Today at 11:40 PM
[-]TsviBT1y2127

An important thing that the AGI alignment field never understood:

Reflective stability. Everyone thinks it's about, like, getting guarantees, or something. Or about rationality and optimality and decision theory, or something. Or about how we should understand ideal agency, or something.

But what I think people haven't understood is

  1. If a mind is highly capable, it has a source of knowledge.
  2. The source of knowledge involves deep change.
  3. Lots of deep change implies lots of strong forces (goal-pursuits) operating on everything.
  4. If there's lots of strong goal-pursuits operating on everything, nothing (properties, architectures, constraints, data formats, conceptual schemes, ...) sticks around unless it has to stick around.
  5. So if you want something to stick around (such as the property "this machine doesn't kill all humans") you have to know what sort of thing can stick around / what sort of context makes things stick around, even when there are strong goal-pursuits around, which is a specific thing to know because most things don't stick around.
  6. The elements that stick around and help determine the mind's goal-pursuits have to do so in a way that positively makes them stick around (reflective stability of goals).

There's exceptions and nuances and possible escape routes. And the older Yudkowsky-led research about decision theory and tiling and reflective probability is relevant. But this basic argument is in some sense simpler (less advanced, but also more radical ("at the root")) than those essays. The response to the failure of those essays can't just be to "try something else about alignment"; the basic problem is still there and has to be addressed.

(related elaboration: https://tsvibt.blogspot.com/2023/01/a-strong-mind-continues-its-trajectory.html https://tsvibt.blogspot.com/2023/01/the-voyage-of-novelty.html )

Reply954
Moderation Log
More from TsviBT
View more
Curated and popular this week
1Comments