AI ALIGNMENT FORUM

Tristram Price

Posts

No posts to display.

Wikitag Contributions

No wikitag contributions to display.

Comments

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Tristram Price · 11mo

It's really helpful to have an overview like this. An impressive body of work!

Building on the Frontier Safety Team's recent work on persuasion, do you anticipate expanding human-AI interaction experiments?

Drawing on Dafoe's work, how is the AGI Safety Team currently thinking about structural risks and the corresponding threat models? For instance, cyber and CBRN threats are recognized as misuse cases. Are there evals or red-teaming exercises planned for capabilities that could significantly affect, say, nuclear deterrence, especially in scenarios with asymmetric access to such capabilities? Or is this seen as too contextual and covered elsewhere?

Relatedly, on the mitigations aspect of the FSF: is the team confident it will have sufficient buffer time to implement the security mitigations? For example, effective supply-chain security and vetting infrastructure have long lead times.
