AI ALIGNMENT FORUM

Deception

Written by plex, Yoav Ravid, Roman Leventov. Last updated 8th Feb 2023.

Deception is the act of sharing information in a way that intentionally misleads others.

Related Pages: Deceptive Alignment, Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels

Posts tagged Deception
AI Deception: A Survey of Examples, Risks, and Potential Solutions | Simon Goldstein, Peter S. Park | 2y | karma 27 | 1 comment
Interpreting the Learning of Deceit | Roger Dearnaley | 1y | karma 13 | 2 comments
Deep Deceptiveness | Nate Soares | 2y | karma 89 | 16 comments
LCDT, A Myopic Decision Theory | Adam Shimi, Evan Hubinger | 4y | karma 32 | 44 comments
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research | Evan Hubinger, Nicholas Schiefer, Carson Denison, Ethan Perez | 2y | karma 122 | 14 comments
Why is o1 so deceptive? (question) | Abram Demski, Sahil | 9mo | karma 69 | 14 comments
How likely is deceptive alignment? | Evan Hubinger | 3y | karma 49 | 19 comments
Difficulty classes for alignment properties | Arun Jose | 1y | karma 8 | 0 comments
The Speed + Simplicity Prior is probably anti-deceptive | [anonymous] | 3y | karma 17 | 11 comments
Lying is Cowardice, not Strategy | Connor Leahy, Gabriel Alfour | 2y | karma -21 | 21 comments
Precursor checking for deceptive alignment | Evan Hubinger | 3y | karma 14 | 0 comments
AI x-risk, approximately ordered by embarrassment | Alex Lawsen | 2y | karma 46 | 1 comment
Discussion: Challenges with Unsupervised LLM Knowledge Discovery | Seb Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah | 1y | karma 81 | 11 comments
Monitoring for deceptive alignment | Evan Hubinger | 3y | karma 69 | 4 comments
Are minimal circuits deceptive? | Evan Hubinger | 6y | karma 38 | 10 comments