This overview is really helpful, and it's an impressive body of work!
Building on the Frontier Safety Team's recent work on persuasion, do you see an expansion of human-AI interaction experiments?
Based on Dafoe's work, how is the AGI Safety Team currently thinking about structural risks and corresponding threat models? For instance, cyber and CBRN threats are recognized as misuse cases. Are there evals/red teaming planned for capabilities that could significantly impact, say, nuclear deterrence, especially in scenarios with asymmetric access to such capabilities? Or is this seen as too contextual and covered elsewhere?
Relatedly, on the mitigations side of the FSF: is the team confident it has sufficient buffer time to implement the security mitigations? For example, effective supply chain security and vetting infrastructure have long lead times.