AI ALIGNMENT FORUM


AI Auditing

Edited by Raemon, last updated 4th Aug 2025

Formerly "auditing games"

Posts tagged AI Auditing
Automating Auditing: An ambitious concrete technical research proposal (evhub, 4y)
A transparency and interpretability tech tree (evhub, 3y)
Auditing language models for hidden objectives (Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei Nishimura-Gasparian, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub, 8mo)
Towards Alignment Auditing as a Numbers-Go-Up Science (Sam Marks, 3mo)
Putting up Bumpers (Sam Bowman, 7mo)
What progress have we made on automated auditing? (question; LawrenceC, 1y)
Auditing games for high-level interpretability (Paul Colognese, 3y)