Welcome to the Technical AI Safety Podcast, the show where I interview computer scientists about their papers. This month I covered Optimal Policies Tend to Seek Power, which is closely related to Seeking Power is Often Robustly Instrumental in MDPs, a post in the Reframing Impact sequence that was recently featured in the 2019 review.

The point of the show is to make papers more parsable. Each interview features a detailed walkthrough, padded on either side by discussion of where the work came from and where it's going.

I had a lot of fun doing this month's episode; it was a tricky paper to wrap my head around, but very rewarding. Do let me know if you have trouble finding it on your favorite podcast app. Thanks!

Show notes:

With Alex Turner

Feedback form

Request an episode

Optimal Policies Tend to Seek Power

by Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Abstract:

Some researchers have speculated that capable reinforcement learning agents are often incentivized to seek resources and power in pursuit of their objectives. While seeking power in order to optimize a misspecified objective, agents might be incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: human power-seeking instincts seem idiosyncratic, and these urges need not be present in reinforcement learning agents. We formalize a notion of power within the context of Markov decision processes. With respect to a class of neutral reward function distributions, we provide sufficient conditions for when optimal policies tend to seek power over the environment.
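For anyone who wants a concrete handle on the formalization before listening, here is a minimal sketch of the paper's notion of power as I understand it: the POWER of a state is, roughly, the normalized expected optimal value attainable from that state when the reward function is drawn from some distribution. Everything concrete below (the four-state toy MDP, the uniform reward distribution, and the helper names optimal_values and estimate_power) is my own illustrative assumption, not an example from the paper.

```python
# A minimal, hypothetical sketch of the paper's POWER formalization:
#   POWER_D(s, gamma) = (1 - gamma)/gamma * E_{R ~ D}[V*_R(s) - R(s)],
# estimated by sampling reward functions and running value iteration.
# The tiny deterministic MDP below is an invented example, not from the paper.

import numpy as np

n_states, n_actions = 4, 2
gamma = 0.9

# transitions[s][a] = next state; state 3 is a dead end that loops on itself.
transitions = np.array([
    [1, 2],
    [0, 3],
    [0, 3],
    [3, 3],
])

def optimal_values(reward, n_iters=200):
    """Value iteration for state-based rewards: V*(s) = R(s) + gamma * max_a V*(s')."""
    v = np.zeros(n_states)
    for _ in range(n_iters):
        v = reward + gamma * np.max(v[transitions], axis=1)
    return v

def estimate_power(n_samples=2_000, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of POWER under i.i.d. uniform[0, 1] rewards per state."""
    total = np.zeros(n_states)
    for _ in range(n_samples):
        reward = rng.uniform(0.0, 1.0, size=n_states)
        total += optimal_values(reward) - reward
    return (1 - gamma) / gamma * total / n_samples

print(estimate_power())  # the dead-end state 3 should have the lowest POWER
```

Running this, the dead-end state comes out with the lowest estimated POWER, which is the intuition behind the paper's thesis: states that preserve more options tend to have higher POWER, so optimal policies under many reward functions tend to steer toward them.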

