Liam Donovan

Comments

What are the differences between all the iterative/recursive approaches to AI alignment?

Huh, I thought that all amplification/distillation procedures were intended as a way to approximate HCH, which is itself a tree. Can you not meaningfully discuss "this amplification procedure is like an n-depth approximation of HCH at step x", for any amplification procedure?

For example, the internal structure of the distilled agent described in Christiano's paper is unlikely to look anything like a tree. However, my (potentially incorrect?) impression is that the agent's capabilities at step x are identical to those of an HCH tree of depth x if the underlying learning system is arbitrarily capable.
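
To make the intuition concrete, here is a toy sketch rather than anything taken from Christiano's paper. Every name in it (`human`, `hch`, `amplify`, `distill`, `ida`) is hypothetical, the "task" is just summing a list of integers, and distillation is assumed to be perfect (it returns the amplified agent unchanged), which is the "arbitrarily capable learning system" assumption stated above. Under those assumptions the agent after x rounds answers exactly the questions that a depth-x HCH tree answers, even though nothing requires its internals to be tree-shaped.

```python
from typing import Callable, List, Optional

Question = List[int]               # toy question: "what is the sum of this list?"
Answer = Optional[int]             # None means "could not answer"
Agent = Callable[[Question], Answer]


def no_help(question: Question) -> Answer:
    """An assistant that never answers (used at the bottom of the recursion)."""
    return None


def human(question: Question, ask: Agent) -> Answer:
    """Toy 'human': answers trivial questions directly, otherwise splits the
    question in two and delegates each half to the assistant `ask`."""
    if len(question) <= 1:
        return question[0] if question else 0
    mid = len(question) // 2
    left, right = ask(question[:mid]), ask(question[mid:])
    if left is None or right is None:
        return None
    return left + right


def hch(depth: int) -> Agent:
    """Depth-limited HCH: a human whose assistants are HCH of depth - 1."""
    helper = no_help if depth == 0 else hch(depth - 1)
    return lambda q: human(q, helper)


def amplify(agent: Agent) -> Agent:
    """Amplification by delegation: a human with access to the current agent."""
    return lambda q: human(q, agent)


def distill(agent: Agent) -> Agent:
    """ASSUMPTION: an idealised learner that imitates its target perfectly."""
    return agent


def ida(steps: int) -> Agent:
    """Run `steps` rounds of amplify-then-distill, starting from an unaided human."""
    agent = distill(lambda q: human(q, no_help))
    for _ in range(steps):
        agent = distill(amplify(agent))
    return agent


if __name__ == "__main__":
    q = [1, 2, 3, 4]
    for x in range(4):
        print(x, ida(x)(q), hch(x)(q))
```

With an imperfect `distill` (any real learner), the two columns printed above could diverge, so the correspondence should be read as holding only in the idealised limit. The sketch also hard-codes delegation as the amplification method and says nothing about other amplification schemes.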


It's possible that I'm not understanding the difference between "depth", "tree-based" and "recursion" in this context.

What are the differences between all the iterative/recursive approaches to AI alignment?

Huh, what would you recommend I do to reduce my uncertainty around meta-execution (e.g. "read x", "ask about it as a top-level question", etc.)?

What are the differences between all the iterative/recursive approaches to AI alignment?

Is this necessarily true? It seems like this describes what Christiano calls "delegation" in his paper, but wouldn't apply to IDA schemes with other capability amplification methods (such as the other examples in the appendix of "Capability Amplification").

What are the differences between all the iterative/recursive approaches to AI alignment?

I found this immensely helpful overall, thank you!

However, I'm still somewhat confused by meta-execution. Is it essentially a more sophisticated capability amplification strategy that replaces the role filled by "deliberation" in Christiano's IA paper?

Contest: $1,000 for good questions to ask to an Oracle AI

Two basic questions I couldn't figure out (sorry):

Can you use a different oracle for every subquestion? If you can, how would this affect the concern Wei_Dai raises?

If we know the oracle is only optimizing for the specified objective function, are mesa-optimisers still a problem for the proposed system as a whole?