AI ALIGNMENT FORUM

Liam Donovan

Comments

What are the differences between all the iterative/recursive approaches to AI alignment?
Liam Donovan6y20

Huh, I thought that all amplification/distillation procedures were intended as a way to approximate HCH, which is itself a tree. Can you not meaningfully discuss "this amplification procedure is like an n-depth approximation of HCH at step x", for any amplification procedure?

For example, the internal structure of the distilled agent described in Christiano's paper is unlikely to look anything like a tree. However, my (potentially incorrect?) impression is that the agent's capabilities at step x are identical to those of an HCH tree of depth x, provided the underlying learning system is arbitrarily capable.


It's possible that I'm not understanding the difference between "depth", "tree-based" and "recursion" in this context.
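
To make the impression above concrete, here is a toy sketch under strong assumptions: distillation is lossless (the "arbitrarily capable learning system" case) and amplification is plain delegation to the previous agent. The names `human`, `no_help`, `hch`, `amplify`, `distill` and `ida` are hypothetical and not taken from any paper, and the "task" (summing a nested list) is only a stand-in for question decomposition.

```python
from typing import Callable, Union

Question = Union[int, list]            # toy question: a nested list of ints to sum
Answerer = Callable[[Question], int]


def human(question: Question, helper: Answerer) -> int:
    """Hypothetical human policy: adds numbers directly, delegates sublists to `helper`."""
    if isinstance(question, int):
        return question
    return sum(item if isinstance(item, int) else helper(item) for item in question)


def no_help(question: Question) -> int:
    """Stand-in for having no assistant: every delegated subquestion is answered with 0."""
    return 0


def hch(question: Question, depth: int) -> int:
    """Depth-limited HCH: a human whose helpers are HCH trees one level shallower."""
    if depth == 0:
        return human(question, helper=no_help)
    return human(question, helper=lambda q: hch(q, depth - 1))


def amplify(agent: Answerer) -> Answerer:
    """Amplification as delegation: a human who may consult the current agent."""
    return lambda q: human(q, helper=agent)


def distill(agent: Answerer) -> Answerer:
    """Idealized assumption: distillation reproduces the amplified behavior exactly."""
    return agent


def ida(steps: int) -> Answerer:
    """Run `steps` rounds of amplify-then-distill, starting from the unaided human."""
    agent: Answerer = lambda q: human(q, helper=no_help)
    for _ in range(steps):
        agent = distill(amplify(agent))
    return agent


question = [1, [2, [3, [4]]]]
for k in range(5):
    # Under these assumptions the step-k agent matches the depth-k tree.
    assert ida(k)(question) == hch(question, k)
```

The returned agent is just a function; nothing about its internals is tree-shaped, yet its answers coincide with the depth-k tree in this toy. The equivalence depends entirely on the two assumptions: a lossy `distill`, or an amplification scheme other than delegation, would break it.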

What are the differences between all the iterative/recursive approaches to AI alignment?
Liam Donovan6y10

Huh, what would you recommend I do to reduce my uncertainty around meta-execution (e.g. "read x", "ask about it as a top-level question", etc.)?

What are the differences between all the iterative/recursive approaches to AI alignment?
Liam Donovan6y10

Is this necessarily true? It seems like this describes what Christiano calls "delegation" in his paper, but wouldn't apply to IDA schemes with other capability amplification methods (such as the other examples in the appendix of "Capability Amplification").

What are the differences between all the iterative/recursive approaches to AI alignment?
Liam Donovan6y10

I found this immensely helpful overall, thank you!

However, I'm still somewhat confused by meta-execution. Is it essentially a more sophisticated capability amplification strategy that replaces the role filled by "deliberation" in Christiano's IA paper?

Contest: $1,000 for good questions to ask to an Oracle AI
Liam Donovan6y20

Two basic questions I couldn't figure out (sorry):

Can you use a different oracle for every subquestion? If you can, how would this affect the concern Wei_Dai raises?

If we know the oracle is only optimizing for the specified objective function, are mesa-optimisers still a problem for the proposed system as a whole?
