AI ALIGNMENT FORUM

Ollie J

Posts

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations (35 karma, 11mo, 0 comments)
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models (18 karma, 2y, 0 comments)
