AI ALIGNMENT FORUM

264
Paul Colognese


Posts

Paul Colognese's Shortform (score 2, 3y, 0 comments)
High-level interpretability: detecting an AI's objectives (score 30, 2y, 0 comments)
Aligned AI via monitoring objectives in AutoGPT-like systems (score 16, 2y, 0 comments)
Decision Transformer Interpretability (score 34, 3y, 6 comments)
Auditing games for high-level interpretability (score 18, 3y, 0 comments)
Deception?! I ain’t got time for that! (score 15, 3y, 0 comments)