DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Victoria Krakovna

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

DeepMind is hiring for the Scalable Alignment and Alignment Teams
Rohin Shah, Geoffrey Irving
51 karma · 1y · 7 comments

DeepMind alignment team opinions on AGI ruin arguments
Victoria Krakovna
106 karma · 1y · 5 comments

Will Capabilities Generalise More?
Ramana Kumar
49 karma · 1y · 28 comments

Clarifying AI X-risk
Zachary Kenton, Rohin Shah, David Lindner, Vikrant Varma, Victoria Krakovna, Mary Phuong, Ramana Kumar, Elliot Catt
43 karma · 1y · 14 comments

Threat Model Literature Review
Zachary Kenton, Rohin Shah, David Lindner, Vikrant Varma, Victoria Krakovna, Mary Phuong, Ramana Kumar, Elliot Catt
35 karma · 1y · 3 comments

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Victoria Krakovna, Vikrant Varma, Ramana Kumar, Mary Phuong
36 karma · 1y · 1 comment

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Victoria Krakovna, Vikrant Varma, Ramana Kumar, Rohin Shah
20 karma · 10mo · 5 comments

Categorizing failures as “outer” or “inner” misalignment is often confused
Rohin Shah
43 karma · 9mo · 17 comments

Definitions of “objective” should be Probable and Predictive
Rohin Shah
28 karma · 9mo · 20 comments

Power-seeking can be probable and predictive for trained agents
Victoria Krakovna, janos
27 karma · 7mo · 20 comments

Paradigms of AI alignment: components and enablers
Victoria Krakovna
23 karma · 1y · 0 comments

[Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
Victoria Krakovna, Rohin Shah
47 karma · 7mo · 11 comments