AI ALIGNMENT FORUM

Deception

Edited by plex, Yoav Ravid, Roman Leventov. Last updated 8th Feb 2023.

Deception is the act of sharing information in a way that intentionally misleads others.

Related Pages: Deceptive Alignment, Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels

Posts tagged Deception
AI Deception: A Survey of Examples, Risks, and Potential Solutions (Simon Goldstein, Peter S. Park; 2y; karma 27; 1 comment)
Interpreting the Learning of Deceit (RogerDearnaley; 2y; karma 13; 2 comments)
Deep Deceptiveness (So8res; 2y; karma 93; 16 comments)
LCDT, A Myopic Decision Theory (adamShimi, evhub; 4y; karma 32; 44 comments)
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research (evhub, Nicholas Schiefer, Carson Denison, Ethan Perez; 2y; karma 122; 14 comments)
Why is o1 so deceptive? [Question] (abramdemski, Sahil; 1y; karma 69; 14 comments)
How likely is deceptive alignment? (evhub; 3y; karma 49; 19 comments)
Difficulty classes for alignment properties (Jozdien; 2y; karma 8; 0 comments)
The Speed + Simplicity Prior is probably anti-deceptive ([anonymous]; 3y; karma 17; 11 comments)
Lying is Cowardice, not Strategy (Connor Leahy, Gabriel Alfour; 2y; karma -21; 21 comments)
Precursor checking for deceptive alignment (evhub; 3y; karma 14; 0 comments)
AI x-risk, approximately ordered by embarrassment (Alex Lawsen; 2y; karma 46; 1 comment)
Discussion: Challenges with Unsupervised LLM Knowledge Discovery (Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik, Rohin Shah; 2y; karma 83; 11 comments)
Monitoring for deceptive alignment (evhub; 3y; karma 69; 4 comments)
Are minimal circuits deceptive? (evhub; 6y; karma 38; 10 comments)