LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
I previously had a cruder model of "There's an AI capabilities fuel tank and an AI alignment fuel tank. Many research items fill up both at the same time, but in different ratios. If you fill up the capabilities tank before the alignment tank, we lose. You want to pursue strategies that cause the alignment tank to get filled up faster than the capabilities tank." (I got this from Andrew Critch during an in-person conversation.)
I like this post for putting forth a higher-resolution model, one that prompts me to think a bit more specifically about what downstream effects I expect to happen. (Though I think the tank model might still be kinda useful as a fast shorthand sometimes.)
Curated. I think this post proposes an interesting mechanism for understanding and controlling LLMs. I have a lot of uncertainty about how useful this will turn out to be, but the idea seems both interesting and promising, and I'd like to see more work exploring the area.
I didn't downvote, but I also didn't upvote, and I generally wish I had an actual argument to link to when discussing this concept.
I'm also not able to evaluate the object-level of "was this post missing obvious stuff it'd have been good to improve", but, something I want to note about my own guess of how an ideal process would go from my current perspective:
I think it makes more sense to think of posting on LessWrong as "submitting to a journal" than as "publishing a finished paper." So, the part where some people then comment "hey, this is missing X" is more analogous to submitting to peer review and being told "hey, you missed X" than to a finished paper being published in a journal while missing X.
I do think a thing LessWrong is missing (or, doesn't do a good enough job at) is a "here is the actually finished stuff". I think the things that end up in the Best of LessWrong, after being subjected to review, are closer to that, but I think there's room to improve that more, and/or have some kind of filter for stuff that's optimized to meet academic-expectations-in-particular.
note: I tagged this "Infrabayesianism" but wasn't actually sure whether it was or not according to you.
Curated. On one hand, folks sure have spent a long time trying to hash out longstanding disagreements, and I think it's kinda reasonable to not feel like that's a super valuable thing to do more of.
On the other hand... man, sure seems scary to me that we still have so many major disagreements that we haven't been able to resolve.
I think this post does a particularly exemplary job of exploring some subtle disagreements from a procedural level: I like that Holden makes a pretty significant attempt to pass Nate's Ideological Turing Test, flags which parts of the post represent which person's views, flags possible cruxes, and explores what future efforts (both conceptual and empirical) might further resolve the disagreement.
It's... possible this is actually the single best example of a public doublecrux writeup that I know of?
Anyways, thanks Holden and Nate for taking the time to do this, both for the object level progress and for serving as a great example.
However, if your post doesn't look like a research article, you might have to format it more like one (and even then it's not guaranteed to get in, see this comment thread).
I interpreted this as saying something superficial about style, rather than "if your post does not represent 100+ hours of research work it's probably not a good fit for arXiv." If that's what you meant, I think the post could be edited to make that more clear.
If the opening section of your essay made it clearer which posts it was talking about, I'd probably endorse it (although I'm not super familiar with the nuances of arXiv gatekeeping, so am mostly going off the collective response in the comment section).
meta note on tagging:
This post seemed to be on a topic that... surely there should be a commonly used LW concept for, but I couldn't think of it. I tagged it "agent foundations" but feel like there should be something more specific.