Scalable Oversight
This page is a stub.
Posts tagged
Scalable Oversight
58
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks
2y
7
43
Learning the prior
paulfchristiano
5y
23
43
Scalable oversight as a quantitative rather than qualitative problem
Buck
1y
8
15
Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery, Abhimanyu Pallavi Sudhir, JacksonKaunismaa
1y
0
13
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan
1y
0
77
[Research Note] Optimizing The Final Output Can Obfuscate CoT
lukemarks, jacob_drori, cloud, TurnTrout
3mo
3
64
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout
10mo
4
42
Prover-Estimator Debate: A New Scalable Oversight Protocol
Jonah Brown-Cohen, Geoffrey Irving
4mo
17
22
On scalable oversight with weak LLMs judging strong LLMs
zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, Rohin Shah
1y
18
17
Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain, Sophie Bridgers
10mo
3
14
Is weak-to-strong generalization an alignment technique?
Q
cloud
9mo
Q
1