AI ALIGNMENT FORUM

Ollie J

Posts

35 · [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations · 1y · 0
18 · Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models · 2y · 0