AI ALIGNMENT FORUM

205
Rowan Wang
Ω118000

https://rowankwang.com/

Posts

29 · Building and evaluating alignment auditing agents · 3mo · 0
45 · Modifying LLM Beliefs with Synthetic Document Finetuning · 6mo · 10
48 · Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · 3y · 4
25 · Gears-Level Mental Models of Transformer Interpretability · 4y · 1