Oliver Habryka

Coding day in and out on LessWrong 2.0

Oliver Habryka's Comments

Writeup: Progress on AI Safety via Debate

You have an entire copy of the post in the commenting guidelines, fyi :)

Oops, sorry. My bad. Fixed.

The Main Sources of AI Risk?

Done! Daniel should now be able to edit the post. 

New paper: The Incentives that Shape Behaviour

Mod note: I edited the abstract into the post, since that makes the paper more easily searchable in the site-search, and also seems like it would help people get a sense of whether they want to click through to the link. Let me know if you want me to revert that. 

New paper: The Incentives that Shape Behaviour

I quite liked this paper, and read through it this morning. It also seems good to link to the accompanying Medium post, which I found a good introduction to the ideas: 

https://medium.com/@RyanCarey/new-paper-the-incentives-that-shape-behaviour-d6d8bb77d2e4

2019 AI Alignment Literature Review and Charity Comparison

Promoted to curated: This post has been an amazing resource, year after year, for almost anyone working on AI Alignment. It also turns out to be one of the most comprehensive bibliographies of AI Alignment research (next to the AI Alignment Newsletter spreadsheet), which in itself is a valuable resource that I've made use of myself a few times.

Thank you very much for writing this! 

Specification gaming examples in AI

The biggest benefit for me has come from using this list in conversation, when I am trying to explain the basics of AI risk to someone, or am generally discussing the topic. Before this list came out, I would often have to come up with an example of some specification gaming problem on the fly, and even though I would be confident that my example was representative, I couldn't be sure that it had actually happened, which often detracted from the point I was trying to make. After this list came out, I had a pre-cached list of many examples that I could bring up at any point, that I knew had actually happened, and for which I could easily find and reference the original source if the other person wanted to follow up. 

The Credit Assignment Problem

Promoted to curated: It's been a while since this post came out, but I've been thinking about the "credit assignment" abstraction a lot since then, and found it quite useful. I also really like the way the post made me curious about a lot of different aspects of the world, and I liked the way it invited me to boggle at the world together with you. 

I also really appreciated your long responses to questions in the comments, which clarified a lot of things for me. 

One thing comes to mind that might improve the post, though I think it's mostly a matter of competing audiences: 

I think some sections of the post reference a lot of really high-level concepts, in a way that is valuable as a reference, but that might also cause a lot of people to bounce off of it (even people with a pretty strong AI Alignment background). I can imagine a version of the post that includes very short explanations of those concepts, or moves them into a context where they are more clearly marked as optional (since I think the post stands well without at least some of those high-level concepts).

Towards a New Impact Measure

This post, and TurnTrout's work in general, have taken the impact measure approach far beyond what I thought was possible. That turned out to be a valuable lesson for me in being less confident about my opinions around AI Alignment, and it also helped me clarify and think much better about a significant fraction of the AI Alignment problem. 

I've since discussed TurnTrout's approach to impact measures with many people. 