Your posts should be on arXiv

habryka (3y)

A while ago I got most of the way to setting up a feature on LW/AIAF that would export LW/AIAF posts to a nicely formatted, academic-looking, linkable PDF. I ran into a hurdle somewhat close to the end and shelved the feature, but if there is a lot of demand here, I could probably finish the work, which would make this process even easier.

Jonathan Uesato (3y)

A while ago I made a very quick Python script to pull Markdown from LW, then use pandoc to export it to a PDF (because I prefer reading physical papers and LaTeX formatting). I used it somewhat regularly for ~6 months and found that it was good enough for my purposes. I assume the LW developers could write something much better, but I've put it in this GitHub [repo](https://github.com/juesato/lw_pdf_exporter/tree/main) in case it's of help or interest.
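For readers curious what the pandoc step of such a pipeline can look like, here is a minimal sketch. It is NOT the code from the linked repo; it assumes the post's Markdown has already been saved to a local file (fetching from LW is out of scope), and the function names `pandoc_command` and `convert` are hypothetical. It relies only on standard pandoc options (`--from`, `--standalone`, `-o`, `--pdf-engine`); pandoc infers the output format from the file extension, so the same call can produce either a PDF or an editable standalone `.tex` file.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(markdown_path: str, output_path: str,
                   pdf_engine: str = "xelatex") -> list:
    """Build a pandoc command converting a Markdown post to .pdf or .tex.

    pandoc picks the output format from the target's extension: a .pdf
    target is rendered through a LaTeX engine, while a .tex target with
    --standalone yields a full LaTeX document (preamble included) to edit.
    """
    suffix = Path(output_path).suffix.lower()
    if suffix not in {".pdf", ".tex"}:
        raise ValueError(f"unsupported output type: {suffix!r}")
    cmd = ["pandoc", markdown_path, "--from=markdown", "--standalone",
           "-o", output_path]
    if suffix == ".pdf":
        # Only PDF output needs a LaTeX engine; .tex output is plain text.
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def convert(markdown_path: str, output_path: str) -> None:
    """Run pandoc if it is installed; raise a clear error otherwise."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not on PATH")
    subprocess.run(pandoc_command(markdown_path, output_path), check=True)
```

For example, `convert("post.md", "post.tex")` would write an editable LaTeX source, while `convert("post.md", "post.pdf")` renders a PDF (the latter also needs a TeX distribution providing `xelatex`).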

Alex_Altair (3y)

I would especially love it if it popped out a .tex file that I could edit, since I'm very likely to use different language on LW than I would in a fancy academic paper.

Davidmanheim (3y)

Seconding the .tex export, since it's much more useful than just getting a pdf!

Neel Nanda (3y)

I would love this! I'm currently paying someone ~$200 to port my grokking post to LaTeX; getting a PDF automatically would be great.

Dan H (3y)

I am strongly in favor of our very best content going on arXiv. Both communities should engage more with each other.

Here are some suggestions for posting to arXiv. As a rule of thumb, if the content of a blog post didn't take more than 300 hours of labor to create, then it probably should not go on arXiv. Maintaining a basic quality bar prevents arXiv from being overrun by people who like writing up many of their inchoate thoughts; publication standards are different for LW/AF than for arXiv. Even if a researcher spent many hours on a project, arXiv moderators do not want research that falls below a certain bar. arXiv moderators have reminded some professors that they will likely reject papers at the quality level of a Stanford undergraduate team project (e.g., http://cs231n.stanford.edu/2017/reports.html); consequently, labor, topicality, and conformance to formatting standards are not sufficient for arXiv approval. Usually one's first research project won't be good enough for arXiv.

Furthermore, conceptual/philosophical pieces probably should be primarily posted on arXiv's .CY section. For more technical deep learning content, do not make the mistake of only putting it on .AI; these should probably go on .LG (machine learning), .CV (computer vision), or .CL (NLP). arXiv's .ML section is for more statistical/theoretical machine learning audiences.

For content to be approved without complications, it should likely conform to standard (ICLR, ICML, NeurIPS, CVPR, ECCV, ICCV, ACL, EMNLP) formatting. This means automatic blog post exporting is likely not viable. In trying to diffuse ideas to the broader ML community, we should avoid making the arXiv moderators mad at us.
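The rule of thumb and category advice in this comment can be read as a small decision table. The sketch below is a hypothetical encoding of that advice (the names `Post`, `PRIMARY_CATEGORY`, and `suggest_primary_category` are invented for illustration); it is not an official arXiv policy, and passing the check is necessary rather than sufficient, since moderators still judge quality and expect conference-style formatting. The full category identifiers (cs.CY, cs.LG, cs.CV, cs.CL, stat.ML) are arXiv's standard names for the sections mentioned.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encoding of the rule of thumb above; NOT an official arXiv policy.
MIN_HOURS = 300  # at or below this much labor, a post probably shouldn't go on arXiv

# Suggested primary category per kind of content. cs.AI is deliberately absent:
# the advice is not to put technical deep learning work only on .AI.
PRIMARY_CATEGORY = {
    "conceptual": "cs.CY",        # Computers and Society
    "deep_learning": "cs.LG",     # Machine Learning
    "vision": "cs.CV",            # Computer Vision and Pattern Recognition
    "nlp": "cs.CL",               # Computation and Language
    "statistical_ml": "stat.ML",  # statistical/theoretical ML audiences
}


@dataclass
class Post:
    title: str
    hours_of_labor: float
    kind: str  # one of the keys of PRIMARY_CATEGORY


def suggest_primary_category(post: Post) -> Optional[str]:
    """Return a suggested arXiv category, or None if the post is likely below the bar."""
    if post.hours_of_labor <= MIN_HOURS:
        return None
    return PRIMARY_CATEGORY[post.kind]
```

For instance, a hypothetical 400-hour interpretability write-up tagged `"deep_learning"` would get `"cs.LG"`, while a 120-hour conceptual post would get `None`.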

Davidmanheim (3y)

Strongly agree. Here are three examples of work I've put on arXiv that originated from the forum, which might be helpful as touchstones. The first was cited 7 times in its first year, and 50 more times since. The latter two were posted last year and have not yet been indexed by Google as having been cited.

As an example of a technical but fairly conceptual paper, there is the Categorizing Goodhart's Law paper. I pushed for this to be a paper rather than just a post, and I think that the resulting exposure was very worthwhile. Scott wrote the original post, though we had discussed Goodhart's Law quite a bit in LA, and I had written about it on Ribbonfarm. I think the post took significantly less than 300 hours of specific work, but much more than that in earlier thinking and discussions. The comments and discussion around the post were probably fifty hours, but extending it to cover the items I disagreed with, writing it in LaTeX, making diagrams, and polishing the paper took about another hundred hours between myself, Scott, and others who helped with editing and proofreading.

As an example of a large project with a final report, we commissioned an edited summary report / compilation of our MTAIR sequence. This was at least a thousand hours of total work on the project, probably closer to 3,000, including all the work on the project and writing. The marginal work beyond the project and posts was a couple thousand dollars in editing, probably amounting to a few dozen hours of work. (We did not move it to LaTeX, and the diagrams were screenshots rather than being done nicely in LaTeX.)

As an example of a conceptual paper that we put on .CY, here is a model of why people are working on agent foundations, which Issa initially posted on the Alignment Forum. I pushed for rewriting it and posting it on arXiv. I guesstimate no more than 50 hours of work by Issa for the original post, and perhaps another 100 hours total of writing and editing for arXiv. It gets less attention than more technical work, but it was also less work. I think that's fine, and it's valuable as a more authoritative reference for the arguments than existed previously.

There's also a poorly researched post on "dynamic safety envelopes," which I put together for other reasons, which was never on the forum, and which I didn't realize was already superseded by Paul Christiano's and others' work on various topics. In retrospect, this should not have been put on arXiv.

riceissa (3y)

I didn't log the time I spent on the original blog post, and it's kinda hard to assign hours to this since most of the reading and thinking for the post happened while working on the modeling aspects of the MTAIR project. If I count just the time I sat down to write the blog post, I would guess maybe less than 20 hours.

As for the "convert the post to paper" part, I did log that time and it came out to 89 hours, so David's estimate of "perhaps another 100 hours" is fairly accurate.

Davidmanheim (3y)

I probably put in an extra 20-60 hours, so the total is probably closer to 150, which surprises me. I will add that a lot of the conversion time went into writing more, LaTeX figures, and citations, which were all, I think, substantively valuable additions. (Changing to a more scholarly style was not substantively valuable, nor was struggling with LaTeX margins and TikZ for the diagrams, and both took up some of the time.)
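A quick check of the "closer to 150" figure, using only the numbers from the comments above and treating riceissa's "maybe less than 20 hours" for the original post as 20 (so the true lower end may be slightly lower):

```latex
\begin{aligned}
T &\approx \underbrace{20}_{\text{original post (riceissa)}}
  + \underbrace{89}_{\text{conversion (riceissa)}}
  + \underbrace{[20,\ 60]}_{\text{extra (Davidmanheim)}} \\
  &= [129,\ 169]\ \text{hours}, \qquad \text{midpoint} \approx 149 .
\end{aligned}
```

The midpoint lands at roughly 150 hours, consistent with the estimate in the comment.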

JanB (3y)

> Furthermore, conceptual/philosophical pieces probably should be primarily posted on arXiv's .CY section.

As an explanation, since this just took me 5 minutes of searching: this is the section "Computers and Society (cs.CY)".

lc (3y)

Arxiv posts should be on LessWrong*

Quintin Pope (3y)

I’m actually starting a weekly series that’s basically “collection of arXiv papers that seem important for alignment”.

Dan H (3y)

Here's a continual stream of related arXiv papers, available through Reddit and Twitter.

https://www.reddit.com/r/mlsafety/

https://twitter.com/topofmlsafety

RyanCarey (3y)

Some reports are not publicised in order not to speed up timelines. And ELK is a bit rambly - I wonder if it will get subsumed by much better content within 2yr. But I do largely agree.

Neel Nanda (3y)

> It can be as easy as creating a PDF of your post and submitting it (although if your post was written in LaTeX, they'll want the .tex file). If everything goes well, this takes less than an hour.

Hilariously, this does not work. I converted my Grokking post to a PDF (very crudely - just printing to PDF) and uploaded that, and it was rejected:

> Dear author,
> Thank you for submitting your work to arXiv. We regret to inform you that arXiv’s moderators have determined that your submission will not be accepted and made public on arXiv.org.
> In this case, our moderators have determined that your submission is a content type that arXiv does not accept:
> Blog post

Dan H (3y)

I should say formatting is likely a large contributing factor for this outcome. Tom Dietterich, an arXiv moderator, apparently had a positive impression of the content of your grokking analysis. However, research on arXiv will be more likely to go live if it conforms to standard (ICLR, NeurIPS, ICML) formatting and isn't a blogpost automatically exported into a TeX file.

JanB (3y)

I agree that formatting is the most likely issue. The content of Neel's grokking work is clearly suitable for arXiv (just very solid ML work). And the style of presentation of the blog post is already fairly similar to a standard paper (e.g. it has an Introduction section, lists contributions in bullet points, ...).

So yeah, I agree that fixing the formatting/layout (including things like academic citation style) will probably do the trick.

JanB (3y)

Ah, sorry to hear. I wouldn't have predicted this from reading arXiv's content moderation guidelines.

Richard_Kennaway (3y)

Note that arXiv does have some gatekeeping: you must get an "endorsement" before submitting your first paper to any subject area. Details.

JanB (3y)

Ah, I had forgotten about this. I'm happy to endorse people or help them find endorsers.

Neel Nanda (3y)

Update 2: The nicely LaTeXed version of my Grokking post was also rejected from arXiv?! I'll revisit this at some point in the next few weeks, but I'm going to give up on this for now. I consider this a mark against the idea that putting posts on arXiv is an easy and fairly low-effort thing to do (though plausibly still worth the effort).

Steven Byrnes (1mo)

Another data point: when I turned my Intro to Brain-Like AGI Safety blog post series into a PDF [via Typst; I hired someone to do all the hard work of writing conversion scripts, etc.], arXiv rejected it, so I put it on OSF instead. I’m reluctant to speculate on what arXiv didn’t like about it (they didn’t say). Some possibilities are that it seems out of place on arXiv in terms of formatting (e.g. single-column, not LaTeX), AND tone (casual, with some funny pictures), AND content (not too math-y, interdisciplinary in a weird way). Probably one or more of those three things. But whatever, OSF seems fine.

Ramana Kumar (3y)

Could this be accomplished with literally zero effort from the post-writers? The tasks of identifying which posts are arXiv-worthy, formatting for submission, and doing the submission all seem like they could be done by entities other than the author. The only issue might be in associating the arXiv submitter account with the right person.

JanB (3y)

It probably could, although I'd argue that even if not, quite often it would be worth the author's time.

AI ALIGNMENT FORUM
AF

59

Benefits of having posts on arXiv

How much work is it to submit to arXiv?

What types of posts should be on arXiv?

If arXiv doesn't fit