AI Alignment Forum

DeepMind Alignment Team on Threat Models and Plans

Nov 25, 2022 by Victoria Krakovna

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

DeepMind is hiring for the Scalable Alignment and Alignment Teams
by Rohin Shah, Geoffrey Irving

DeepMind alignment team opinions on AGI ruin arguments
by Victoria Krakovna

Will Capabilities Generalise More?
by Ramana Kumar

Clarifying AI X-risk
by Zachary Kenton, Rohin Shah, David Lindner, Vikrant Varma, Victoria Krakovna, Mary Phuong, Ramana Kumar, Elliot Catt

Threat Model Literature Review
by Zachary Kenton, Rohin Shah, David Lindner, Vikrant Varma, Victoria Krakovna, Mary Phuong, Ramana Kumar, Elliot Catt

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
by Victoria Krakovna, Vikrant Varma, Ramana Kumar, Mary Phuong

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
by Victoria Krakovna, Vikrant Varma, Ramana Kumar, Rohin Shah

Categorizing failures as “outer” or “inner” misalignment is often confused
by Rohin Shah

Definitions of “objective” should be Probable and Predictive
by Rohin Shah

Power-seeking can be probable and predictive for trained agents
by Victoria Krakovna, janos

Paradigms of AI alignment: components and enablers
by Victoria Krakovna

[Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
by Victoria Krakovna, Rohin Shah

A short course on AGI safety from the GDM Alignment team
by Victoria Krakovna, Rohin Shah

Evaluating and monitoring for AI scheming
by Victoria Krakovna, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah