AI ALIGNMENT FORUM

455
Bartosz Cywiński
Ω9200

MATS 8.0 scholar with Arthur Conmy and Sam Marks

Posts

9 | Current LLMs seem to rarely detect CoT tampering | 8h | 0
27 | Eliciting secret knowledge from language models | 2mo | 0