AI ALIGNMENT FORUM

DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Vika

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

53 DeepMind is hiring for the Scalable Alignment and Alignment Teams
Rohin Shah, Geoffrey Irving
3y
15
111 DeepMind alignment team opinions on AGI ruin arguments
Vika
3y
11
58 Will Capabilities Generalise More?
Ramana Kumar
3y
28
45 Clarifying AI X-risk
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt
3y
16
36 Threat Model Literature Review
zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt
3y
3
37 Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika, Vikrant Varma, Ramana Kumar, Mary Phuong
3y
3
19 Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Vika, Vikrant Varma, Ramana Kumar, Rohin Shah
3y
5
47 Categorizing failures as “outer” or “inner” misalignment is often confused
Rohin Shah
3y
17
28 Definitions of “objective” should be Probable and Predictive
Rohin Shah
3y
22
28 Power-seeking can be probable and predictive for trained agents
Vika, janos
3y
20
23 Paradigms of AI alignment: components and enablers
Vika
3y
0
47 [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
Vika, Rohin Shah
3y
11
48 A short course on AGI safety from the GDM Alignment team
Vika, Rohin Shah
8mo
0
34 Evaluating and monitoring for AI scheming
Vika, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah
3mo
0