AI ALIGNMENT FORUM
Teun van der Weij
Posts
Teun van der Weij's Shortform (karma 2; 6mo; 0 comments)
How to mitigate sandbagging (karma 12; 6mo; 0 comments)
The Elicitation Game: Evaluating capability elicitation techniques (karma 4; 7mo; 1 comment)
[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations (karma 35; 1y; 0 comments)
An Introduction to AI Sandbagging (karma 19; 1y; 0 comments)
Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B? (karma 21; 2y; 0 comments)