All Posts


Tuesday, July 27th 2021

Shortform
4 · Alex Turner · 1d

My power-seeking theorems [https://www.lesswrong.com/s/fSMbebQyR4wheRrvk] seem a bit like Vingean reflection [https://www.lesswrong.com/posts/iWwaJ5wPGLZJWjPAL/vingean-reflection-reliable-reasoning-for-self-improving]. In Vingean reflection, you reason about an agent which is significantly smarter than you: if I'm playing chess against an opponent who plays the optimal policy for the chess objective function, then I predict that I'll lose the game. I predict that I'll lose, even though I can't predict my opponent's (optimal) moves - otherwise I'd probably be that good myself.

My power-seeking theorems show that most objectives have optimal policies which e.g. avoid shutdown and survive into the far future, even without saying what particular actions these policies take to get there. I may not even be able to compute a single optimal policy for a single non-trivial objective, but I can still reason about the statistical tendencies of optimal policies.
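To make "statistical tendencies of optimal policies" concrete, here is a minimal sketch on a toy environment. Everything in it is an assumption for illustration rather than the setting of the theorems: the environment has one absorbing shutdown state and a few absorbing "alive" states, rewards are drawn i.i.d. from Uniform(0, 1), and the agent cares only about the absorbing state it ends up in. The function name and parameters are hypothetical.

```python
import random


def fraction_avoiding_shutdown(num_alive_outcomes=3, num_samples=20_000, seed=0):
    """Estimate how often the optimal first move avoids shutdown.

    Toy deterministic environment (hypothetical, not from the post):
      - from the start state the agent either moves to a single absorbing
        'shutdown' state, or stays alive and then picks one of
        `num_alive_outcomes` absorbing states;
      - each absorbing state's reward is drawn i.i.d. from Uniform(0, 1);
      - the agent only cares about the reward of the absorbing state it
        ends up in (roughly, a discount rate close to 1).
    """
    rng = random.Random(seed)
    avoided = 0
    for _ in range(num_samples):
        shutdown_reward = rng.random()
        alive_rewards = [rng.random() for _ in range(num_alive_outcomes)]
        # An optimal policy heads for the best reachable absorbing state,
        # so it avoids shutdown exactly when some alive state pays more.
        if max(alive_rewards) > shutdown_reward:
            avoided += 1
    return avoided / num_samples


if __name__ == "__main__":
    print(fraction_avoiding_shutdown())
```

By symmetry, the best of the four absorbing states is equally likely to be any one of them, so with three alive outcomes the estimate should come out near 3/4. The sketch never says which alive state a given optimal policy picks, only how often optimal policies avoid shutdown across sampled objectives.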

Monday, July 26th 2021

Shortform
3 · Paul Christiano · 2d

Suppose I am interested in finding a program M whose input-output behavior has some property P that I can probabilistically check relatively quickly (e.g. I want to check whether M implements a sparse cut of some large implicit graph). I believe there is some simple and fast program M that does the trick. But even this relatively simple M is much more complex than the specification of the property P.

Now suppose I search for the simplest program running in time T that has property P. If T is sufficiently large, then I will end up getting the program "Search for the simplest program running in time T' that has property P, then run that." (Or something even simpler, but the point is that it will make no reference to the intended program M since encoding P is cheaper.)

I may be happy enough with this outcome, but there's some intuitive sense in which something weird and undesirable has happened here (and I may get in a distinctive kind of trouble if P is an approximate evaluation). I think this is likely to be a useful maximally-simplified example to think about.
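The outer search described above can be sketched in a few lines, under heavy simplifying assumptions. The instruction set, the property P (output is even and strictly greater than the input), and all names below are hypothetical stand-ins chosen for illustration. "Simplest" is taken to mean shortest, and each instruction counts as one time step. This toy language cannot express a search, so the sketch shows only the shortest-first search with a probabilistic check, not the inner-search outcome the post worries about.

```python
import itertools
import random

# Hypothetical toy instruction set; each op costs one "time step".
OPS = {
    "i": lambda x: x + 1,  # increment
    "d": lambda x: 2 * x,  # double
    "n": lambda x: -x,     # negate
    "s": lambda x: x * x,  # square
}


def run(program, x):
    """Apply the program's ops to x from left to right."""
    for op in program:
        x = OPS[op](x)
    return x


def has_property(program, rng, trials=20):
    """Probabilistic check of P on random inputs from 1..100.

    P (an assumption for this sketch): the output is even and strictly
    greater than the input. Passing only means no counterexample was
    sampled, so this is an approximate evaluation.
    """
    for _ in range(trials):
        x = rng.randint(1, 100)
        y = run(program, x)
        if y <= x or y % 2 != 0:
            return False
    return True


def simplest_program(time_budget, seed=0):
    """Return the first program, in shortest-first order, that passes the check."""
    rng = random.Random(seed)
    for length in range(1, time_budget + 1):
        for ops in itertools.product(OPS, repeat=length):
            program = "".join(ops)
            if has_property(program, rng):
                return program
    return None  # nothing within the time budget passed


if __name__ == "__main__":
    print(simplest_program(time_budget=4))
```

In a language rich enough to encode both P and a search loop, the same shortest-first enumeration would eventually reach a program that itself performs this search, which is the situation the post describes.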

Sunday, July 25th 2021

No posts for July 25th 2021
