
Oliver Habryka's Comments

2019 AI Alignment Literature Review and Charity Comparison

Promoted to curated: This post has been an amazing resource every year for almost anyone working on AI Alignment. It also turns out to be one of the most comprehensive bibliographies of AI Alignment research (next to the AI Alignment Newsletter spreadsheet), which in itself is a valuable resource that I've made use of a few times myself.

Thank you very much for writing this! 

Specification gaming examples in AI

The biggest benefit for me has come from using this list in conversation, when I am trying to explain the basics of AI risk to someone, or am generally discussing the topic. Before this list came out, I would often have to come up with an example of a specification gaming problem on the fly, and even though I was confident that my example was representative, I couldn't be sure that it had actually happened, which often detracted from the point I was trying to make. After this list came out, I had a pre-cached set of examples that I knew had actually happened, which I could bring up at any point, and for which I could easily find the original source if the other person wanted to follow up.

The Credit Assignment Problem

Promoted to curated: It's been a while since this post came out, but I've been thinking about the "credit assignment" abstraction a lot since then, and have found it quite useful. I also really like the way the post made me curious about a lot of different aspects of the world, and the way it invited me to boggle at the world together with you.

I also really appreciated your long responses to questions in the comments, which clarified a lot of things for me. 

One thing comes to mind that might improve the post, though I think it's mostly a matter of competing audiences:

I think some sections of the post reference a lot of really high-level concepts, in a way that is valuable as a reference, but that might also cause a lot of people to bounce off of it (even people with a pretty strong AI Alignment background). I can imagine a version of the post that includes very short explanations of those concepts, or moves them into a context where they are more clearly marked as optional (since I think the post stands well without at least some of those high-level concepts).

Towards a New Impact Measure

This post, and TurnTrout's work in general, have taken the impact measure approach far beyond what I thought was possible. That turned out to be a valuable lesson for me in being less confident about my opinions around AI Alignment, and it also helped me clarify and think much better about a significant fraction of the AI Alignment problem.

I've since discussed TurnTrout's approach to impact measures with many people. 

Robustness to Scale

I've used the concepts in this post a lot when discussing various things related to AI Alignment. I think asking "how robust is this AI design to various ways of scaling up?" has become one of my go-to hammers for evaluating a lot of AI Alignment proposals, and I've gotten a lot of mileage out of that. 

Toward a New Technical Explanation of Technical Explanation

This post actually got me to understand how logical induction works, and also caused me to eventually give up on bayesianism as the foundation of epistemology in embedded contexts (together with Abram's other post on the untrollable mathematician). 

An Untrollable Mathematician Illustrated

I think this post, together with Abram's other post "Towards a new technical explanation" actually convinced me that a bayesian approach to epistemology can't work in an embedded context, which was a really big shift for me. 

The strategy-stealing assumption

Promoted to curated: I think the strategy-stealing assumption is a pretty interesting conceptual building block for AI Alignment, and I've used it a bunch of times in the last two months. I also really like the structure of this post, and found that it is both pretty easy to understand and covers a lot of ground and considerations.
