AI ALIGNMENT FORUM
455
Bartosz Cywiński
MATS 8.0 scholar with Arthur Conmy and Sam Marks
Posts
9
Current LLMs seem to rarely detect CoT tampering
8h
0
27
Eliciting secret knowledge from language models
2mo
0