Deception

Edited by plex, Yoav Ravid, Roman Leventov. Last updated 8th Feb 2023.

Deception is the act of sharing information in a way that intentionally misleads others.

Related Pages: Deceptive Alignment, Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels

Posts tagged Deception

3

27 AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein, Peter S. Park

3y

1

2

14 Interpreting the Learning of Deceit

RogerDearnaley

2y

2

0

93 Deep Deceptiveness

So8res

3y

16

2

32 LCDT, A Myopic Decision Theory

adamShimi, evhub

5y

44

1

123 Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

3y

14

2

69 Why is o1 so deceptive?

abramdemski, Sahil

2y

14

1

50 How likely is deceptive alignment?

evhub

4y

20

1

8 Difficulty classes for alignment properties

Jozdien

2y

0

1

17 The Speed + Simplicity Prior is probably anti-deceptive

[anonymous]

4y

11

1

14 Precursor checking for deceptive alignment

evhub

4y

0

1

-21 Lying is Cowardice, not Strategy

Connor Leahy, Gabriel Alfour

3y

21

1

46 AI x-risk, approximately ordered by embarrassment

Alex Lawsen

3y

1

83 Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik, Rohin Shah

2y

11

1

67 Monitoring for deceptive alignment

evhub

4y

4

0

38 Are minimal circuits deceptive?

evhub

7y

10