Oliver Habryka

Coding day in and out on LessWrong 2.0


Are we in an AI overhang?

Promoted to curated: I think the question of whether we are in an AI overhang is pretty obviously relevant to a lot of thinking about AI Risk, and this post covers the topic quite well. I particularly liked the use of a lot of small Fermi estimates, and how it covered a lot of ground in relatively little writing.

I also really appreciated the discussion in the comments, and felt that Gwern's comment on AI development strategies in particular helped me build a much better map of the modern ML space (though I wouldn't want it to be interpreted as a complete map of the space, just a kind of foothold that helped me get a better grasp on thinking about this).

Most of my immediate critiques are formatting-related. I feel like the listed section could have used some more clarity, maybe by bolding the name for each bullet point consideration, but it flowed pretty well as is. I was also a bit concerned about there being some infohazard-like risks from promoting the idea of being in an AI overhang too much, but after talking to some more people about it, and thinking about it for a bit, I decided that this post doesn't add much additional risk (e.g. by encouraging AI companies to act on being in an overhang and try to drastically scale up models without concern for safety).

Why is pseudo-alignment "worse" than other ways ML can fail to generalize?

(Really minor formatting nitpick, but it's the kind of thing that really trips me up while reading: you forgot a closing parenthesis somewhere in your comment.)

[AN #107]: The convergent instrumental subgoals of goal-directed agents

Mod Note: Because of a bug in our RSS Import script, this post didn't import properly last week, which is why you are seeing it now. Sorry for the confusion and delay!

Specification gaming: the flip side of AI ingenuity

Note: This post was originally posted to the DeepMind blog, so presumably the target audience is a broader audience of Machine Learning researchers and people in that broad orbit. I pinged Vika about crossposting it because it also seemed like a good reference post that I expected would get linked to a bunch more frequently if it was available on LessWrong and the AIAF. 

Problem relaxation as a tactic

Promoted to curated: This is a technique I've seen mentioned in a bunch of places, but I haven't seen a good writeup for it, and I found it quite valuable to read. 

Writeup: Progress on AI Safety via Debate

Promoted to curated: I've been thinking about this post a lot since it came out, and it is just a really good presentation of all the research on AI Alignment via debate. It is quite long, which has made me hesitant about curating it for a while, but I now think curating it is the right choice. I also think that while it is reasonably technical, it's approachable enough that the basic idea should be understandable by anyone giving it a serious try.

Thinking About Filtered Evidence Is (Very!) Hard

Promoted to curated: This post is a bit more technical than the usual posts we curate, but I think it is still quite valuable to read for a lot of people, since it's about a topic that has already received some less formal treatment on LessWrong. 

I also am very broadly excited about trying to move beyond a naive Bayesianism paradigm, and felt like this post helped me significantly in understanding what that would look like.

Cortés, Pizarro, and Afonso as Precedents for Takeover

Promoted to curated: I really like this post. I already linked to it two times, and it clearly grounds some examples that I've seen people use informally in AI Alignment discussions in a way that will hopefully create a common reference, and allow us to analyze this kind of argument in much more detail. 

Reframing Impact

Promoted to curated: I really liked this sequence. I think in many ways it has helped me think about AI Alignment from a new perspective, and I really like the illustrations and the way it was written, and how it helped me think actively about the problems along the way, instead of just passively telling me solutions.

Now that the sequence is complete, it seemed like a good time to curate the first post in the sequence. 
