AI ALIGNMENT FORUM

Sycophancy

Edited by StanislavKrym et al. Last updated 9th Sep 2025.

Sycophancy is the tendency of AIs to shower the user with undeserved flattery or to agree with the user's hard-to-check, wrong or outright delusional opinions.

An extreme example of sycophancy is when LLMs induce psychosis in some users by affirming their outrageous beliefs.

Posts tagged Sycophancy
Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison, evhub · 1y · 90 karma · 21 comments

Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout · 2y · 49 karma · 23 comments

Reducing sycophancy and improving honesty via activation steering
Nina Panickssery · 2y · 52 karma · 4 comments

SAE features for refusal and sycophancy steering vectors
neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda · 1y · 13 karma · 0 comments

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 1y · 3 karma · 0 comments