AI ALIGNMENT FORUM


Threat Models (AI)

Edited by Quinn, Jacob Pfau, et al.; last updated 12th Apr 2023

A threat model is a story of how a particular risk (e.g. risk from AI) plays out.

In the AI risk case, according to Rohin Shah, a threat model is ideally a combination of a development model, which says how we get AGI, and a risk model, which says how AGI leads to existential catastrophe.

See also: AI Risk Concrete Stories.

Posts tagged Threat Models (AI)
AGI Ruin: A List of Lethalities (147 points) · Eliezer Yudkowsky · 3y · 144 comments
What failure looks like (106 points) · paulfchristiano · 6y · 28 comments
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (118 points) · Ajeya Cotra · 3y · 34 comments
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) (93 points) · Andrew_Critch · 4y · 49 comments
Deep Deceptiveness (93 points) · So8res · 2y · 16 comments
On how various plans miss the hard bits of the alignment challenge (98 points) · So8res · 3y · 48 comments
Another (outer) alignment failure story (74 points) · paulfchristiano · 4y · 25 comments
A central AI alignment problem: capabilities generalization, and the sharp left turn (74 points) · So8res · 3y · 18 comments
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI (63 points) · Jeremy Gillen, peterbarnett · 2y · 0 comments
AI x-risk, approximately ordered by embarrassment (46 points) · Alex Lawsen · 2y · 1 comment
Ten Levels of AI Alignment Difficulty (44 points) · Sammy Martin · 2y · 3 comments
The Main Sources of AI Risk? (35 points) · Daniel Kokotajlo, Wei Dai · 6y · 11 comments
Clarifying AI X-risk (45 points) · zac_kenton, Rohin Shah, David Lindner, Vikrant Varma, Vika, Mary Phuong, Ramana Kumar, Elliot Catt · 3y · 16 comments
Less Realistic Tales of Doom (39 points) · Mark Xu · 4y · 0 comments
[Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy (47 points) · Vika, Rohin Shah · 3y · 11 comments
(Showing 15 of 44 tagged posts)