A description from "Doing oversight from the very start of training seems hard":

Here ‘oversight’ means there is something with access to the internals of the model which checks that the model isn’t misaligned even if the behavior on the training distribution looks fine. In "An overview of 11 proposals for building safe advanced AI", all but two of the proposals basically look like this, as does "AI safety via market making".

Examples of oversight techniques include:

...
