AI ALIGNMENT FORUM
Deception
Contributors
2
Yoav Ravid
You are viewing revision 1.1.0, last edited by Yoav Ravid
Related Pages: Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels
Posts tagged Deception
Most Relevant
3 | 21 | AI Deception: A Survey of Examples, Risks, and Potential Solutions | Simon Goldstein, Peter S. Park | 8mo | 1
2 | 13 | Interpreting the Learning of Deceit | Roger Dearnaley | 4mo | 2
0 | 84 | Deep Deceptiveness | Nate Soares | 1y | 16
2 | 32 | LCDT, A Myopic Decision Theory | Adam Shimi, Evan Hubinger | 3y | 44
1 | 119 | Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research | Evan Hubinger, Nicholas Schiefer, Carson Denison, Ethan Perez | 8mo | 13
1 | 49 | How likely is deceptive alignment? | Evan Hubinger | 2y | 19
1 | -16 | Lying is Cowardice, not Strategy | Connor Leahy, Gabriel Alfour | 6mo | 21
1 | 6 | Difficulty classes for alignment properties | Arun Jose | 2mo | 0
1 | 17 | The Speed + Simplicity Prior is probably anti-deceptive | [anonymous] | 2y | 11
1 | 14 | Precursor checking for deceptive alignment | Evan Hubinger | 2y | 0
1 | 80 | Discussion: Challenges with Unsupervised LLM Knowledge Discovery | Seb Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah | 4mo | 11
1 | 42 | AI x-risk, approximately ordered by embarrassment | Alex Lawsen | 1y | 1
1 | 69 | Monitoring for deceptive alignment | Evan Hubinger | 2y | 4
0 | 37 | Are minimal circuits deceptive? | Evan Hubinger | 5y | 10
1 | 29 | Thoughts On (Solving) Deep Deception | Arun Jose | 6mo | 0