Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.
by Ilia Shirokov and Ilya Nachevsky
Introduction
In this post we continue the project, started in our previous post, of exploring hidden reasoning in large language models (LLMs). We ask whether an LLM, when given both a “general” and a “secret” question, can generate its public answer to the general question while simultaneously...
Aug 9, 2025