Related to the role of peer review: a lot of stuff on LW/AF is relatively exploratory, feeling out concepts, trying to figure out the right frames, etc. We need to be generally willing to discuss incomplete ideas, stuff that hasn't yet had the details ironed out. For that to succeed, we need community discussion standards which tolerate a high level of imperfect details or incomplete ideas. I think we do pretty well with this today.
But sometimes, you want to be like "come at me bro". You've got something that you're pretty highly confident is right, and you want people to really try to shoot it down (partly as a social mechanism to demonstrate that the idea is in fact as solid and useful as you think it is). This isn't something I'd want to be the default kind of feedback, but I'd like for authors to be able to say "come at me bro" when they're ready for it, and I'd like for posts which survive such a review to be perceived as more epistemically-solid/useful.
With that in mind, here's a few of my own AF posts which I'd submit for a "come at me bro" review:
For all of these, things like "this frame is wrong" or "this seems true but not useful" are valid objections. I'm not just claiming that the proofs hold.
Steve's big thoughts on alignment in the brain probably deserve a review. Component posts include https://www.lesswrong.com/posts/diruo47z32eprenTg/my-computational-framework-for-the-brain , https://www.lesswrong.com/posts/DWFx2Cmsvd4uCKkZ4/inner-alignment-in-the-brain , https://www.lesswrong.com/posts/jNrDzyc8PJ9HXtGFm/supervised-learning-of-outputs-in-the-brain
Interestingly, I think there aren't any of my posts I should recommend - basically all of them are speculative. However, I did have a post called Gricean communication and meta-preferences that I think is still fairly interesting speculation, and I've never gotten any feedback on it at all, so I'll happily ask for some: https://www.lesswrong.com/posts/8NpwfjFuEPMjTdriJ/gricean-communication-and-meta-preferences .
Suggestion 1: Utility != reward by Vladimir Mikulik. This post attempts to distill the core ideas of mesa alignment. This kind of distillation increases the surface area of AI Alignment, which is one of the key bottlenecks of the area (that is, getting people familiarized with the field, motivated to work on it and with a handle on some open questions to work on). I would like an in-depth review because it might help us learn how to do it better!
Suggestion 2: my coauthor Pablo Moreno and I would be interested in feedback on our post about quantum computing and AI alignment. We do not think that the ideas of the paper are useful in the sense of getting us closer to AI alignment, but I think it is useful to have a signpost explaining why avenues that might seem attractive to people coming into the field are not worth exploring, while introducing them to the field in a familiar way (in this case our audience is quantum computing experts). One thing that confuses me is that some people have approached me after publishing the post asking me why I think that quantum computing is useful for AI alignment, so I'd be interested in feedback on what went wrong in the communication process given the deflationary nature of the article.
Great idea! Thanks for doing this!
Unsurprisingly, I'd love it if you reviewed any of my posts.
Since you said "technical," I suggest this one in particular. It's a big deal IMO because Armstrong & Mindermann's argument has been cited approvingly by many people and seems to be still widely regarded as correct, but if I'm right, it's actually a bad argument. I'd love a third perspective on this that helps me figure out what's going on.
More generally I'd recommend sorting all AF posts by karma and reviewing the ones that got the most, since presumably karma correlates with how much people here like the post and thus it's extra important to find flaws in high-karma posts.
I wrote this post as a summary of a paper I published. It didn't get much attention, so I'd be interested in having you all review it.
To say a little more, I think the general approach to safety work that I lay out in here is worth considering more deeply and points towards a better process for choosing interventions in attempts to build aligned AI. I think what's more important than the specific examples where I apply the method is the method itself, but thus far, as best I can tell, folks did not much engage with that, so it's unclear to me if that's because they disagree, think it's too obvious, or what.
I think the generalized insight from Armstrong's no free lunch paper is still underappreciated in that I sometimes see papers that, to me, seem to run up against this and fail to realize there's a free variable in their mechanisms that needs to be fixed if they want them to not go off in random directions.
Another post of mine I'll recommend to you:
This is the culmination of a series of posts on "formal alignment", where I start out saying "what it would mean to formally state what it would mean to build aligned AI" and then from that try to figure out what we'd have to figure out in order to achieve that.
Over the last year I've gotten pulled in other directions, so I haven't pushed this line of research forward much, plus I reached a point with it where it was clear it required a different specialization than I have to make additional progress. But I still think it presents a different approach to what others are doing in the space of work towards AI alignment, and I think you might find it interesting to review (along with the preceding posts in the series) for that reason.