Deception
Contributors
2
Yoav Ravid
You are viewing revision 1.1.0, last edited by Yoav Ravid
Related Pages: Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels
Posts tagged Deception
Most Relevant
3
21
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Simon Goldstein, Peter S. Park
7mo
1
2
13
Interpreting the Learning of Deceit
Roger Dearnaley
3mo
2
0
84
Deep Deceptiveness
Nate Soares
1y
16
2
32
LCDT, A Myopic Decision Theory
Adam Shimi, Evan Hubinger
3y
44
1
119
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research
Evan Hubinger, Nicholas Schiefer, Carson Denison, Ethan Perez
7mo
13
1
49
How likely is deceptive alignment?
Evan Hubinger
2y
19
1
-16
Lying is Cowardice, not Strategy
Connor Leahy, Gabriel Alfour
5mo
21
1
6
Difficulty classes for alignment properties
Arun Jose
1mo
0
1
17
The Speed + Simplicity Prior is probably anti-deceptive
[anonymous]
2y
11
1
14
Precursor checking for deceptive alignment
Evan Hubinger
2y
0
1
80
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Seb Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah
3mo
11
1
42
AI x-risk, approximately ordered by embarrassment
Alex Lawsen
1y
1
1
69
Monitoring for deceptive alignment
Evan Hubinger
2y
4
0
37
Are minimal circuits deceptive?
Evan Hubinger
5y
10
1
29
Thoughts On (Solving) Deep Deception
Arun Jose
5mo
0