AI Alignment Forum
Teun van der Weij
Research scientist at Apollo Research.
Posts
Teun van der Weij's Shortform (2 karma, 11mo ago, 0 comments)
Stress Testing Deliberative Alignment for Anti-Scheming Training (57 karma, 5mo ago, 10 comments)
How to mitigate sandbagging (12 karma, 10mo ago, 0 comments)
The Elicitation Game: Evaluating capability elicitation techniques (4 karma, 1y ago, 1 comment)
[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations (35 karma, 2y ago, 0 comments)
An Introduction to AI Sandbagging (19 karma, 2y ago, 0 comments)
Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B? (21 karma, 2y ago, 0 comments)
Comments