David Manheim


Modeling Transformative AI Risk (MTAIR)

Wiki Contributions


I'll note that I think this is a mistake that lots of people working in AI safety have made, ignoring the benefits of academic credentials and prestige because of the obvious costs and annoyance.  It's not always better to work in academia, but it's also worth really appreciating the costs of not doing so in foregone opportunities and experience, as Vanessa highlighted. (Founder effects matter; Eliezer had good reasons not to pursue this path, but I think others followed that path instead of evaluating the question clearly for their own work.)

And in my experience, much of the good work coming out of AI Safety has been sidelined because it fails the academic prestige test, and so it fails to engage with academics who could contribute or who have done closely related work. Other work avoids or fails the publication process because the authors don't have the right kind of guidance and experience to get their papers in to the right conferences and journals, and not only is it therefore often worse for not getting feedback from peer review, but it doesn't engage others in the research area.

A story of how that happens:

In future (unsafe but) sophisticated systems, models will have access to external storage, and be given the ability to read/write. For example, AutoGPT-like systems will have shared scratchpads across instances of the model, to coordinate with themselves. It seems instrumentally convergent for such systems to store things related to their true / eventual goals in such places, for example, so that they can be referenced by local instances. And if we do some sort of oversight of that data storage, key facts that the model wants to preserve might be obfuscated.

Of course, this is only one story, and I don't expect it to be the only way such things could happen, but it seems to be a reasonable candidate as a failure mode for systems designed with normal (read: insufficient, but far more than otherwise expected,) levels of caution.

For a defense of people pursuing a mathematical approach of a type you think isn't valuable, see my recent post.
(That does not address the correct issue you raised about requisite variety, but some work on HRAD does do so explicitly - such as embedded agency.)

Just noting that given more recent developments than this post, we should be majorly updating on recent progress towards Andrew Critch's strategy. (Still not more likely than not to succeed, but we still need to assign some Bayes points to Critch, and take some away from Nate.)

I think it would be really good to come up with a framing of these intuitions that wouldn't be controversial.


That seems great, I'd be very happy for someone to write this up more clearly. My key point was about people's claims and confidence about safety, and yes, clearly that was communicated less well than I hoped.

That's true - and from what I can see, this emerges from the culture in academia. There, people are doing research, and the goal is to see if something can be done, or to see what happens if you try something new. That's fine for discovery, but it's insufficient for safety. And that's why certain types of research, ones that pose dangers to researchers or the public, have at least some degree of oversight which imposes safety requirements. ML does not, yet.

Thanks, reading closely I see how you said that, but it wasn't clear initially. (There's an illusion of disagreement, which I'll christen the "twitter fight fallacy," where unless the opposite is said clearly, people automatically assume replies are disagreements.) 

I probably put in an extra 20-60 hours, so the total is probably closer to 150 - which surprises me. I will add that a lot of the conversion time was dealing with writing more, LaTeX figures and citations, which were all, I think, substantive valuable additions. (Changing to a more scholarly style was not substantively valuable, nor was struggling with latex margins and TikZ for the diagrams, and both took some part of the time.)

Thanks, agreed. And as an aside, I don't think it's entirely coincidental that neither of the people who agree with you are in the Bay.

I think that the costs usually are worth it far more often than it occurs, from an outside view - which was David's point, and what I was trying to respond to. I think that it's more valuable than one expects to actually just jump through the hoops. And especially for people who haven't yet ever had any outputs actually published, they really should do that at least once.

(Also, sorry for the zombie reply.)

Load More