David Manheim


Modeling Transformative AI Risk (MTAIR)



Is posting publicly on LessWrong the best way to suggest how to do political and policy strategy, or coordination? This seems obviously suboptimal, and I'd think that you should probably ask for feedback and look into how to promote cooperation privately first.

That said, I think everything you said here is correct on an object level, and worth thinking about.

Strongly agree. Here are three examples of work I've put on ArXiv which originated from the forum, which might be helpful as a touchstone. The first was cited 7 times in the first year, and 50 more times since. The latter two were posted last year, and have not been indexed by Google as having been cited yet.

As an example of a technical but fairly conceptual paper, there is the Categorizing Goodhart's law paper. I pushed for this to be a paper rather than just a post, and I think that the resulting exposure was very worthwhile. Scott wrote the original post, though we had discussed Goodhart's Law quite a bit in LA, and I had written about it on Ribbonfarm. I think the post took significantly less than 300 hours of specific work, but much more than that in earlier thinking and discussions. The comments and discussion around the post were probably fifty hours, but extending it to cover the items I disagreed with, writing it in LaTeX, making diagrams, and polishing the paper took about another hundred hours between myself, Scott, and others who helped with editing and proofreading.

As an example of a large project with a final report, we commissioned an edited summary report / compilation of our MTAIR sequence. This was at least a thousand hours of total work on the project, probably closer to 3,000, including all the work on the project and writing. The marginal work over the project and posts was a couple thousand dollars in editing, probably amounting to a few dozen hours of work. (We did not move it to LaTeX, and the diagrams were screenshots rather than being done nicely in LaTeX.)

As an example of a conceptual paper that we put on .CY, here is a model of why people are working on agent foundations which Issa initially posted on the alignment forum. I pushed for rewriting it and posting it on ArXiv. I guesstimate no more than 50 hours of work by Issa for the original post, and perhaps another 100 hours total writing and editing for ArXiv. It gets less attention than more technical work, but was also less work. I think that's fine, and it's valuable as a more authoritative reference for the arguments than existed previously.

There's also a poorly researched post on "dynamic safety envelopes" which I put together for other reasons, which was never on the forum, and which I didn't realize was already superseded by Paul Christiano's and others' work on various topics. In retrospect, this should not have been put on ArXiv.

Seconding the .tex export, since it's much more useful than just getting a pdf!

That's correct. My point is that measuring goals which are not natural to measure will, in general, have many more problems with Goodharting and similar misoptimization and overoptimization pressures. And other approaches can be more productive, or at least more care is needed with design of metrics rather than discovery of what to measure and how.

I think this is going to be wrong as an approach. Weight and temperature are properties of physical systems at specific points in time, and can be measured coherently because we understand laws about those systems. Alignment could be measured as a function of a particular system at a specific point in time, once we have a clear understanding of what? All of human values? 

Depends on how you define the measure over jobs. If you mean "the jobs of half of all people," probably true. If you mean "half of the distinct jobs as they are classified by NAICS or similar," I think I disagree. 

Question: "effective arguments for the importance of AI safety" - is this about arguments for the importance of just technical AI safety, or more general AI safety, to include governance and similar things?

Think of it as a "practicing a dark art of rationality" post, and I'd think it would seem less off-putting.

Please feel free to repost this elsewhere, and/or tell people about it.

And if anyone is interested in this type of job but is still in school, or for other reasons is unable to work full time at present, we encourage them to apply and note the circumstances, as we may be able to find other ways to support their work, or at least collaborate and provide mentorship.

I'm not sure I agree with the compatibility of discontinuity and prosaic alignment, though you make a reasonable case, but I do think there is compatibility between slower governance approaches and discontinuity, if it is far enough away.
