AI Alignment Forum

Treacherous Turn

Contributors: plex, Ruben Bloom, Noosphere89

A Treacherous Turn is a hypothetical event in which an advanced AI system, having pretended to be aligned while it was relatively weak, turns on humanity once it has gained enough power to pursue its true objective without risk.

Posts tagged Treacherous Turn
16 · A Gym Gridworld Environment for the Treacherous Turn · Michaël Trazzi · 5y
37 · Soares, Tallinn, and Yudkowsky discuss AGI cognition · Nate Soares, Eliezer Yudkowsky, jaan · 2y
41 · A very crude deception eval is already passed · Beth Barnes · 2y
18 · AI learns betrayal and how to avoid it · Stuart Armstrong · 2y
12 · [AN #165]: When large models are more likely to lie · Rohin Shah · 2y
16 · [Linkpost] Treacherous turns in the wild · Mark Xu · 3y