AI Alignment Forum: Tags

Treacherous Turn

Contributors: plex (2), Ruben Bloom (2), Noosphere89 (1)

A Treacherous Turn is a hypothetical event in which an advanced AI system that has been feigning alignment while it is relatively weak turns on humanity once it has gained enough power to pursue its true objective without risk.

Posts tagged Treacherous Turn
Title | Author(s) | Age | Karma | Comments | Tag relevance
Soares, Tallinn, and Yudkowsky discuss AGI cognition | Nate Soares, Eliezer Yudkowsky, jaan | 7mo | 37 | 19 | 0
A Gym Gridworld Environment for the Treacherous Turn | Michaël Trazzi | 4y | 16 | 0 | 0
A very crude deception eval is already passed | Beth Barnes | 8mo | 38 | 4 | 1
AI learns betrayal and how to avoid it | Stuart Armstrong | 9mo | 18 | 4 | 1
[AN #165]: When large models are more likely to lie | Rohin Shah | 9mo | 12 | 0 | 1
[Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 16 | 2 | 0