Oliver Habryka

Coding day in and out on LessWrong 2.0


AGI safety from first principles: Introduction

Promoted to curated: I really enjoyed reading through this sequence. I have some disagreements with it, but overall it's one of the best plain language introductions to AI safety that I've seen, and I expect I will link to this as a good introduction many times in the future. I was also particularly happy with how the sequence bridged and synthesized a number of different perspectives that usually feel in conflict with each other.

My computational framework for the brain

Promoted to curated: This kind of thinking seems both very important, and also extremely difficult. I do think that trying to understand the underlying computational structure of the brain is quite useful for both thinking about Rationality and thinking about AI and AI Alignment, though it's also plausible to me that it's hard enough to get things right in this space that in the end overall it's very hard to extract useful lessons from this. 

Despite the difficulties I expect in this space, this post does strike me as overall pretty decent and to at the very least open up a number of interesting questions that one could ask to further deconfuse oneself on this topic. 

My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda

Promoted to curated! I held off on curating this post for a while, first because it's long and it took me a while to read through it, and second because we already had a lot of AI Alignment posts in the curation pipeline, and I wanted to make sure we have some diversity in our curation decisions. But overall, I really liked this post, and also want to mirror Rohin's comment in that I found this version more useful than the version where you got everything right, because this way I got to see the contrast between your interpretation and Paul's responses, which feels like it helped me locate the right hypothesis more effective than either would have on its own (even if more fleshed out). 

Comparing Utilities

Yep, fixed. Thank you!

Judging from the URL of those links, those images were hosted on a domain that you could access, but others could not, namely they were stored as Gmail image attachments, to which of course you as the recipient have access, but random LessWrong users do not. 

Comparing Utilities

Oh no! The two images starting from this point are broken for me: 

Updates and additions to "Embedded Agency"

Promoted to curated: These additions are really great, and they fill in a lot of the most confusing parts of the original Embedded Agency sequence, which was already one of my favorite pieces of content on all of Lesswrong. So it seems fitting to curate this update to it, which improves it even further. 

Radical Probabilism

Promoted to curated: This post is answering (of course not fully, but in parts) what seems to me one of the most important open questions in theoretical rationality, and I think does so in a really thorough and engaging way. It also draws connections to a substantial number of other parts of your and Scott's work in a way that has helped me understand those much more thoroughly. 

I am really excited about this post. I kind of wish I could curate it two or three times because I do really want a lot of people to have read this, and expect that it will change how I think about a substantial number of topics.

Looking for adversarial collaborators to test our Debate protocol

This sounds fun! I probably won't have enough time to participate, but I do wish I had enough time.

Will OpenAI's work unintentionally increase existential risks related to AI?

I much prefer Rohin's alternative version of: "Are OpenAI's efforts to reduce existential risk counterproductive?". The current version does feel like it screens off substantial portions of the potential risk.

Are we in an AI overhang?

Promoted to curated: I think the question of whether we are in an AI overhang is pretty obviously relevant to a lot of thinking about AI Risk, and this post covers the topic quite well. I particularly liked the use of a lot of small fermi estimate, and how it covered a lot of ground in relatively little writing. 

I also really appreciated the discussion in the comments, and felt that Gwern's comment on AI development strategies in particular help me build a much map of the modern ML space (though I wouldn't want it to be interpreted as a complete map of a space, just a kind of foothold that helped me get a better grasp on thinking about this). 

Most of my immediate critiques are formatting related. I feel like the listed section could have used some more clarity, maybe by bolding the name for each bullet point consideration, but it flowed pretty well as is. I was also a bit concerned about there being some infohazard-like risks from promoting the idea of being in an AI overhang too much, but after talking to some more people about it, and thinking for a bit about it, decided that I don't think this post adds much additional risk (e.g. by encouraging AI companies to act on being in an overhang and try to drastically scale up models without concern for safety).

Load More