This is a linkpost for https://www.youtube.com/watch?v=TX-U9qwANPQ
Thanks for this! I continue to be excited about the possibility of making large black-box models safer by breaking them down into smaller modular components, and, where possible, making those components more interpretable distillations of the learned logic, like synthesized programs. Related post with further links in comments: https://www.lesswrong.com/posts/xERh9dkBkHLHp7Lg6/making-it-harder-for-an-agi-to-trick-us-with-stvs
I recommend 2x speed.
Summary, made with

```
(yt-whisper https://www.youtube.com/watch?v=TX-U9qwANPQ | head; echo; echo "What topics does the above presentation cover? What notable results and surprising capabilities?") | openai complete -
```
using a simple YouTube Whisper script and the openai-cli Python package (I haven't checked which OpenAI model it invokes by default).

Human residual on this summary: in particular, he goes into some interesting material on how you can formally verify the dynamics of a policy with respect to its environmental outcomes, provided the policy is sufficiently simple, and he demonstrates that "sufficiently simple" policies can exhibit surprisingly complex behaviors. Notably, this involves greatly sparsifying the connectivity of what would otherwise have been a transformer.
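For anyone curious what the one-liner actually feeds to the model: it takes the head of the Whisper transcript, appends a blank line and a question, and pipes that combined prompt to `openai complete -`. A minimal sketch of that prompt assembly (the transcript lines below are placeholders, and the actual model call is omitted):

```python
# Placeholder transcript lines; in the real pipeline these come from
# `yt-whisper <url>` transcribing the talk.
transcript_lines = [
    "In this talk we look at distilling learned policies into programs.",
    "Sparsifying connectivity makes the resulting policy verifiable.",
]

question = (
    "What topics does the above presentation cover? "
    "What notable results and surprising capabilities?"
)

# `head` keeps only the first 10 transcript lines; the two `echo`s
# append a blank line and then the question.
prompt = "\n".join(transcript_lines[:10]) + "\n\n" + question

# The shell pipeline then sends `prompt` to `openai complete -`.
print(prompt)
```

This is just the prompt-construction half; the completion call itself depends on which model the openai-cli defaults to.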