NB: I doubt any of this is very original. In fact, it's probably right there in the original Friendly AI writings and I've just forgotten where. Nonetheless, I think this is something worth exploring lest we lose sight of it.
Consider the following argument:
This means that, if you buy this argument, huge swaths of the AI design space are off limits for building aligned AI, and many proposals are consequently doomed to fail. Some examples of such doomed approaches:
So what options are left?
Not building AI is probably not a realistic option unless industrial civilization collapses. And so far we don't seem to be making progress on creating Friendly AI. That just leaves bootstrapping to alignment.
If I'm honest, I don't like it. I'd much rather have the guarantee of Friendly AI. Alas, if we don't know how to build it, and if we're in a race against folks who will build unaligned superintelligent AI if aligned AI is not created first, bootstrapping seems the only realistic option we have.
This puts me in a strange place with regard to how I think about things like HCH, debate, IRL, and CIRL. On the one hand, they might be ways to bootstrap to something that's aligned enough to use to build Friendly AI. On the other, they might overshoot in terms of capabilities, we probably wouldn't even realize we'd overshot, and then we'd suffer an existential catastrophe.
One way we might avoid this is by being more careful about how we frame attempts to build aligned AI and being clear about whether they are targeting "strong", perfect alignment like Friendly AI or "weak", optimization-based alignment like HCH. I think this would help us avoid confusion in a few places:
It also seems like it would clear up some of the debates we fall into around various alignment techniques. Plenty of digital ink has been spilled trying to suss out whether, say, debate would really give us alignment or whether it's too dangerous to even attempt, and I think a lot of this could have been avoided if we thought of debate as a weak alignment technique we might use to bootstrap strong alignment.
Hopefully this framing is useful. As I say, I don't think it's very original, and I think I've read a lot of this framing expressed in comments and buried in articles and posts, so hopefully it's boring rather than controversial. Despite this, I can't recall seeing it crisply laid out as it is above, and I think there's value in that.
Let me know what you think.
Reminds me of a quote from this Paul Christiano post: "It's a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted... I don't think there is a strong case for thinking much further ahead than that."
as a weak alignment technique we might use to bootstrap strong alignment.
Yes, it also reminded me of Christiano's approach of amplification and distillation.
Thanks both! I definitely had the idea that Paul had mentioned something similar somewhere but hadn't made it a top-level concept. I think there are similar echoes in how Eliezer talked about seed AI in the early Friendly AI work.
Planned summary for the Alignment Newsletter:
This post distinguishes between three kinds of "alignment":

1. Not building an AI system at all,
2. Building Friendly AI that will remain perfectly aligned for all time and capability levels,
3. _Bootstrapped alignment_, in which we build AI systems that may not be perfectly aligned but are at least aligned enough that we can use them to build perfectly aligned systems.

The post argues that optimization-based approaches can't lead to perfect alignment, because there will always eventually be Goodhart effects.
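To make the Goodhart step concrete, here is a toy simulation of regressional Goodhart (an illustrative sketch of my own, not code from the post; the names `V` and `U` are invented for the example): an optimizer selects candidates on a proxy score U that is merely correlated with the true value V, and the harder it selects, the more the proxy overstates what it actually gets.

```python
import numpy as np

rng = np.random.default_rng(0)

# True value V and an observable proxy U = V + noise: correlated, not identical.
n = 1_000_000
V = rng.normal(size=n)      # true value of each candidate (unobserved)
U = V + rng.normal(size=n)  # proxy score the optimizer actually selects on

for label, q in [("mild (top 10%)", 0.90), ("hard (top 0.01%)", 0.9999)]:
    selected = U >= np.quantile(U, q)
    print(f"{label}: mean proxy = {U[selected].mean():.2f}, "
          f"mean true value = {V[selected].mean():.2f}")
```

Under mild selection the proxy overstates the true value by about 1.2 standard deviations; under hard selection the gap more than doubles. The pattern generalizes: more optimization pressure on a fixed proxy widens the divergence, which is one sense in which "there will always eventually be Goodhart effects."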
Looks good to me! Thanks for planning to include this in the AN!
I'm still holding out hope for jumping straight to FAI :P Honestly I'd probably feel safer switching on a "big human" than a general CIRL agent that models humans as Boltzmann-rational.
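For context on that last point: "Boltzmann-rational" standardly refers to the noisily rational human model common in IRL/CIRL work, where the human picks actions with probability exponential in their value (β here is an assumed rationality parameter, not something from this thread):

$$\pi_H(a \mid s) = \frac{\exp\big(\beta\, Q(s, a)\big)}{\sum_{a'} \exp\big(\beta\, Q(s, a')\big)}$$

β = 0 gives uniformly random behavior and β → ∞ gives perfect rationality. If real humans deviate from this model systematically rather than noisily, a CIRL agent conditioning on it can become confidently wrong about the human's values, which seems to be the danger being pointed at here.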
Though on the other hand, does modern ML research already count as trying to use UFAI to learn how to build FAI?
Seems like it probably does, but only incidentally.
I instead tend to view ML research as the background over which alignment work is now progressing. That is, we're in a race against capabilities research that we have little power to stop, so our best bets are either that capabilities turn out to be nearing the flat top of an S-curve, buying us some time, or that those capabilities can be safely turned to helping us solve alignment.
I do think there's something interesting about a direction not considered in this post: intelligence enhancement of humans and human emulations (ems) as a means of working on alignment. But realistically, current projections of AI capability timelines suggest they're unlikely to have much opportunity for impact.