AI Alignment Forum

DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Victoria Krakovna

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

DeepMind is hiring for the Scalable Alignment and Alignment Teams
by Rohin Shah, Geoffrey Irving

DeepMind alignment team opinions on AGI ruin arguments
by Victoria Krakovna

Will Capabilities Generalise More?
by Ramana Kumar

Clarifying AI X-risk
by Zachary Kenton, Rohin Shah, David Lindner, Vikrant Varma, Victoria Krakovna, Mary Phuong, Ramana Kumar, Elliot Catt

Threat Model Literature Review
by Zachary Kenton, Rohin Shah, David Lindner, Vikrant Varma, Victoria Krakovna, Mary Phuong, Ramana Kumar, Elliot Catt

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
by Victoria Krakovna, Vikrant Varma, Ramana Kumar, Mary Phuong

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
by Victoria Krakovna, Vikrant Varma, Ramana Kumar, Rohin Shah

Categorizing failures as “outer” or “inner” misalignment is often confused
by Rohin Shah

Definitions of “objective” should be Probable and Predictive
by Rohin Shah

Power-seeking can be probable and predictive for trained agents
by Victoria Krakovna, janos

Paradigms of AI alignment: components and enablers
by Victoria Krakovna

[Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
by Victoria Krakovna, Rohin Shah

A short course on AGI safety from the GDM Alignment team
by Victoria Krakovna, Rohin Shah

Evaluating and monitoring for AI scheming
by Victoria Krakovna, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah