AI ALIGNMENT FORUM

Teun van der Weij

Posts (newest first)

Teun van der Weij's Shortform (karma 2, 6mo ago, 0 comments)
How to mitigate sandbagging (karma 12, 6mo ago, 0 comments)
The Elicitation Game: Evaluating capability elicitation techniques (karma 4, 7mo ago, 1 comment)
[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations (karma 35, 1y ago, 0 comments)
An Introduction to AI Sandbagging (karma 19, 1y ago, 0 comments)
Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B? (karma 21, 2y ago, 0 comments)

Wikitag Contributions

No wikitag contributions to display.

Comments

No comments found.