Oliver Habryka

Coding day in and out on LessWrong 2.0

Oliver Habryka's Comments

Specification gaming: the flip side of AI ingenuity

Note: This post was originally posted to the DeepMind blog, so presumably the target audience is a broader audience of Machine Learning researchers and people in that broad orbit. I pinged Vika about crossposting it because it also seemed like a good reference post that I expected would get linked to a bunch more frequently if it was available on LessWrong and the AIAF. 

Problem relaxation as a tactic

Promoted to curated: This is a technique I've seen mentioned in a bunch of places, but I hadn't seen a good writeup of it before, and I found this one quite valuable to read.

Writeup: Progress on AI Safety via Debate

Promoted to curated: I've been thinking about this post a lot since it came out, and it is just a really good presentation of all the research on AI Alignment via debate. It is quite long, which made me hesitant about curating it for a while, but I now think curating it is the right choice. I also think that while it is reasonably technical, it's approachable enough that the basic idea should be understandable by anyone giving it a serious try.

Thinking About Filtered Evidence Is (Very!) Hard

Promoted to curated: This post is a bit more technical than the usual posts we curate, but I think it is still quite valuable to read for a lot of people, since it's about a topic that has already received some less formal treatment on LessWrong. 

I am also very broadly excited about trying to move beyond a naive Bayesianism paradigm, and felt like this post helped me significantly in understanding what that would look like.

Cortés, Pizarro, and Afonso as Precedents for Takeover

Promoted to curated: I really like this post. I've already linked to it twice, and it clearly grounds some examples that I've seen people use informally in AI Alignment discussions, in a way that will hopefully create a common reference and allow us to analyze this kind of argument in much more detail.

Reframing Impact

Promoted to curated: I really liked this sequence. I think in many ways it has helped me think about AI Alignment from a new perspective, and I really like the illustrations and the way it was written, and how it actively helped me think about the problems along the way, instead of just passively telling me solutions.

Now that the sequence is complete, it seemed like a good time to curate the first post in the sequence. 

Writeup: Progress on AI Safety via Debate

You have an entire copy of the post in the commenting guidelines, fyi :)

Oops, sorry. My bad. Fixed.

The Main Sources of AI Risk?

Done! Daniel should now be able to edit the post. 

New paper: The Incentives that Shape Behaviour

Mod note: I edited the abstract into the post, since that makes the paper more easily searchable in the site-search, and also seems like it would help people get a sense of whether they want to click through to the link. Let me know if you want me to revert that. 

New paper: The Incentives that Shape Behaviour

I quite liked this paper, and read through it this morning. It also seems good to link to the accompanying Medium post, which I found to be a good introduction to the ideas:

https://medium.com/@RyanCarey/new-paper-the-incentives-that-shape-behaviour-d6d8bb77d2e4?fbclid=IwAR0hHcQTnAtSzxSONWWxw4pYNpiHBAsuaN8DT6GTz7oO1gxvjPc9R8VVhpY
