[Metadata: crossposted from https://tsvibt.blogspot.com/2022/10/the-conceptual-doppleganger-problem.html. First completed 9 October 2022.]
Suppose we want to observe the thoughts of a mind in order to detect whether it's making its way towards a plan to harm us, and ideally also to direct the mind so that it pursues specific aims. To this end, we might hope that the mind and its thinking are organized in a way we can come to understand in the way that we understand ourselves and our thinking.
We might hope that when the mind considers plans that involve something, e.g. plans that involve the coffee cup, it does so using a concept akin to our concept [[coffee cup]]. When the mind recognizes, predicts, imagines, simulates, manipulates, designs, combines things with, describes, studies, associates things with, summarizes, remembers, compares things with, deduces things about, makes hypotheses about, or is otherwise mentally involved with the coffee cup, maybe it always does so in a way that is fully comprehensible in fixed terms that are similar to the terms in which we understand ourselves when we do those activities. Maybe the structure involved in psychic events in the mind reliably falls into basins of attraction that indicate unambiguously to us, as we observe these events, which nexi of reference that structure constitutes. Maybe the X-and-only-X problem is solved by ensuring that the mind's thoughts are in a language made of these concepts; when the mind plans to "fetch the coffee", it somehow means only fetching the coffee, in the "natural" sense of [[fetch]] and [[the coffee]].
One obstacle to this rosy picture is conceptual Doppelgängers. A conceptual Doppelgänger of some concept Z is a concept Z' that serves some of the same functions in the mind that Z serves, but is psychically distinct from Z. Here, saying that Z' is psychically distinct from Z is ambiguous, but means something like: Z' is not transparently closely related to Z, or is mechanistically / physically separate from Z, or is referred to in a set of contexts that's systematically segregated from the contexts in which Z is referred to, or is not explicitly described or treated as being the same as or similar to or analogous to Z. A Doppelgänger concept Z' enables a mind to think about what Z is about, at least in some respects, without psychically using Z. This makes it hard to be sure that the mind is not thinking about what Z is about; even if the mind is not using Z, it might be thinking about what Z is about by using some Z'.
Maybe Doppelgängers of Z can be psychically located by doing something like looking for mental stuff that has high mutual logical information with Z. This might work to identify blatant deception: if the mind maintains a puppet show of fake thoughts using Z and has its real thoughts using a Z' that's psychically isomorphic to Z, then Z' will be obviously related to Z. But Doppelgängers don't have to be so obvious. Mental stuff that constitutes skill with manipulating what Z is about can be, compared to Z, more or less:
and can be
All of these features make it harder to see that Z' is in some respects a Doppelgänger of Z. In other words, to the extent these features (and probably others) characterize mental stuff in the mind, the mind is liable to be thinking about coffee cups even when not using [[coffee cup]], e.g. by computing trajectories for moving around objects with such-and-such density and such-and-such shape while carefully avoiding connecting those trajectories with the idea of coffee.
Fun example: The evolution of offensive words seems relevant here. I.e., we frown upon using currently-offensive words, so we end up expressing ourselves using some other words. And over time, we realise that those other words are (primarily used as) Doppelgängers, and mark them as offensive as well.
Sorta related (maybe?): I have a (speculative) theory that people have a kind of machinery in their brains for processing the emotions of other people, and that people with autism find it aversive to use that machinery, and so people with autism learn early in life particular habits of thought that reliably avoid activating that machinery at all. But then they learn to analyze and react to the emotions of other people via the general-purpose human ability to learn things. More details here.
Yeah, that could produce an example of Doppelgängers. E.g. if an autist (in your theory) later starts using that machinery more heavily. Then there are the models coming from the general-purpose analysis, and the models coming from the intuitive machinery, and they're about the same thing.