AI ALIGNMENT FORUM
Sycophancy
Posts tagged Sycophancy
Most Relevant
1
90
Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison, Evan Hubinger
4mo
21
2
49
Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, Evan Hubinger, Alex Turner
9mo
23
0
52
Reducing sycophancy and improving honesty via activation steering
Nina Panickssery
1y
4
0
3
Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga
12d
0