AI ALIGNMENT FORUM
Scalable Oversight
This page is a stub.
Posts tagged Scalable Oversight
Most Relevant
1
57
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks
9mo
7
1
43
Scalable oversight as a quantitative rather than qualitative problem
Buck Shlegeris
7mo
8
1
15
Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery, Abhimanyu Pallavi Sudhir, JacksonKaunismaa
6mo
0
2
13
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan
5mo
0
2
61
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evžen Wybitul, Joseph Miller, Alex Turner
1mo
3
1
22
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, Rohin Shah
6mo
18
0
17
Human-AI Complementarity: A Goal for Amplified Oversight
Rishub Jain, Sophie Bridgers
1mo
3