AI ALIGNMENT FORUM


Treacherous Turn

Edited by plex, et al. last updated 30th Dec 2024

A Treacherous Turn is a hypothetical event in which an advanced AI system, having feigned alignment while it was relatively weak, turns on humanity once it has gained enough power to pursue its true objective without risk.

Posts tagged Treacherous Turn
A Gym Gridworld Environment for the Treacherous Turn (Michaël Trazzi, 7y; 16 karma, 0 comments)
Soares, Tallinn, and Yudkowsky discuss AGI cognition (So8res, Eliezer Yudkowsky, jaan, 4y; 37 karma, 24 comments)
A very crude deception eval is already passed (Beth Barnes, 4y; 41 karma, 4 comments)
AI learns betrayal and how to avoid it (Stuart_Armstrong, 4y; 18 karma, 4 comments)
[AN #165]: When large models are more likely to lie (Rohin Shah, 4y; 12 karma, 0 comments)
[Linkpost] Treacherous turns in the wild (Mark Xu, 4y; 16 karma, 2 comments)