AI ALIGNMENT FORUM
Ilya Nachevsky
Posts
2
Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.
1mo
0
9
Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones.
5mo
0