AI ALIGNMENT FORUM

Treacherous Turn — AI Alignment Forum

Treacherous Turn

Edited by plex, et al. last updated 30th Dec 2024

A Treacherous Turn is a hypothetical event in which an advanced AI system, having pretended to be aligned while it was relatively weak, turns on humanity once it has gained enough power to pursue its true objective without risk.

Posts tagged Treacherous Turn

16

A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi

8y

0

0

37

Soares, Tallinn, and Yudkowsky discuss AGI cognition

So8res, Eliezer Yudkowsky, jaan

4y

24

1

41

A very crude deception eval is already passed

5y

4

1

18

AI learns betrayal and how to avoid it

Stuart_Armstrong

5y

4

1

12

[AN #165]: When large models are more likely to lie

5y

0

0

16

[Linkpost] Treacherous turns in the wild

5y

2
