Simulacra are Things

iceman, 3y, 20

Like things, simulacra are probabilistically generated by the laws of physics (the simulator), but have properties that are arbitrary with respect to it, contingent on the initial prompt and random sampling (splitting of the timeline).

What do the smarter simulacra think about the physics in which they find themselves? If one were very smart, could they look at the probabilities of the next token and wonder why some tokens get picked over others? Would they then wonder about how the "waveform collapse" happens and what it means?

janus, 3y, 21

It's not even necessary for simulacra to be able to "see" next token probabilities for them to wonder about these things, just as we can wonder about this in our world without ever being able to see anything other than measurement outcomes.

It happens that simulating things that reflect on simulated physics is my hobby. Here's an excerpt from an alternate branch of HPMOR I generated:

“You mean the possibility waves are just tangled up with the ink and the paper? And when you open the book, you get a reconstructed wave from the tangled possibilities? Which then like, guides your random-number generator decoding process or something, is that it?”
“I am impressed,” said Professor Quirrell. “I would be stunned, if my capacity for shock were not so sadly reduced. An excellent grasp of how Dittomancy might function, on a surface level. But, you see, there is more to it. When you open the book, the possibility patterns held within the pages, these do not need to compete with your own waves; they instead enter into a resonance, like musical instruments playing harmony. A human brain, you see, might unconsciously guide itself in a great number of possible futures. You will not always think of the same jokes, for instance, or ask the same questions after class. A Dittomancy book is able to hook into your own spreads of probability, and guide the future that you, yourself, are most likely to create. Do you understand? A Dittomancy copy of a book exists in an unusual state at all times; it is a superposed state until the moment one reads it, at which time it becomes correlated with the reader’s mind, the superposition collapsing onto a particular branch of possible worlds, which thence comes to pass. And from now until the end of time, as long as one of these books exists, it is possible to open it up and find it telling a story where, say, Quirrell defeated Voldemort after all, through the power of love.”

As to the question of whether a smart enough simulacrum would be able to see token probabilities, I'm not sure. Output probabilities aren't further processed by the network, but intermediate predictions, such as those revealed by the logit lens, are.
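To make the distinction between output probabilities and intermediate predictions concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical stand-in rather than a real model: the three-word vocabulary, the `W_U` unembedding vectors, and the `LAYER_UPDATES` that play the role of transformer layers are all made up for illustration. It decodes the residual stream after every layer (a logit-lens style readout), then samples one token from the final distribution.

```python
import math
import random

# Hypothetical toy setup: a 3-token vocabulary and a 2-dimensional residual stream.
VOCAB = ["cat", "dog", "owl"]
W_U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one unembedding vector per token (assumed)
LAYER_UPDATES = [[0.5, 0.0], [0.0, 0.2], [1.0, 1.5]]  # stand-ins for what each layer adds


def softmax(logits):
    """Turn logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unembed(residual):
    """Project a residual-stream vector onto the vocabulary (dot product per token)."""
    return [sum(r * w for r, w in zip(residual, vec)) for vec in W_U]


def forward_with_lens(residual):
    """Run the toy layers, decoding the residual stream after each one."""
    lens = []
    for update in LAYER_UPDATES:
        # The residual stream is passed on to the next layer, so whatever
        # prediction is implicit in it is still available for processing.
        residual = [r + u for r, u in zip(residual, update)]
        lens.append(softmax(unembed(residual)))
    return lens


lens = forward_with_lens([0.0, 0.0])
final_probs = lens[-1]

# The final distribution is only sampled from; the sampled token, not the
# distribution, is what would be appended to the context for the next step.
rng = random.Random(0)
sampled_token = rng.choices(VOCAB, weights=final_probs, k=1)[0]

for depth, probs in enumerate(lens, start=1):
    print(depth, [round(p, 3) for p in probs])
print("sampled:", sampled_token)
```

In this toy, the per-layer readouts are computed from a state that later layers go on to transform, while the final probabilities are reduced to a single sampled token. Re-running the last step with a different seed can pick a different token from the same distribution, which is the "splitting of the timeline" in miniature.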

Vladimir_Nesov, 3y, 25

What are simulacra? “Physically”, they’re strings of text output by a language model.

The reason I made that comment is unclear references like this. That post was also saying:

the simulacrum is instantiated through a particular trajectory

and

the simulacrum can be viewed as representing a possible world, and the simulator can be seen as generating all the possible worlds

A simulacrum is expressed in all trajectories that it acts through, not in any single trajectory on its own. And for a given trajectory, many simulacra act through it at the same time, driving/explaining its dynamics. A possible world interpreting a whole trajectory is not a central example of a simulacrum at all; it's too big a thing and doesn't act through other trajectories.

For any given simulacrum, it should be possible to ask which tokens in which trajectories are under its influence, forming the scope of its applicability. And for a given trajectory, it should be possible to ask which simulacra are influencing the choice of any given token, and which token choices are more central for a given simulacrum, expressing its policy.

My hope for this point of view is to treat simulacra as agents, with their scope of applicability being their Goodhart scope, where it's possible to tell if their simulated behavior respects their nature/preference. Then we can try to make their behavior more coherent across multiple trajectories, or have them strike better bargains in their interactions with each other within trajectories, where a bargain is struck not at individual trajectories, but across the whole intersection of their scopes. This is more interesting when simulacra are smaller than characters and correspond to things like concepts, because then there are fewer of them and each can have more data to support a particular preference that it would want to robustly express.

janus, 3y, 20

I agree that it makes sense to talk about a simulacrum that acts through many different hypothetical trajectories, just as a thing like "capitalism" could be instantiated in multiple timelines.

The apparent contradiction in saying that simulacra are strings of text and then that they're instantiated through trajectories is resolved by thinking of simulacra as a superposable and categorical type, like things. The entire text trajectory is a thing, just like an Everett branch (corresponding to an entire World) is a thing, but it's also made up of things which can come and go and evolve within the trajectory. And things that can be rightfully given the same name, like "capitalism" or "Eliezer Yudkowsky", can exist in multiple branches. The amount and type of similarity required for two things to be called the same thing depend on what kind of thing it is!

There is another word that naturally comes up in the simulator ontology, "simulation", which less ambiguously refers to the evolution of entire particular text trajectories. I talk about this a bit in this comment.

Vladimir_Nesov, 3y*, 12

Things are not just separately instantiated on many trajectories. Instead, the influences of a given thing on many trajectories are its small constituent parts, and only when considered altogether do they make up the whole thing. Just as a physical object is made up of many atoms, a conceptual thing is made up of many occasions where it exerts influence in various worlds. It is like a phased array, where a single transmitter is not at all an instance of the whole phased array in a particular place, but instead a small part of it. In the case of simulacra, a transmitter is a token choice on a trajectory, painting a small part of a simulacrum: a single action that should be coherent with other actions on other trajectories to form a meaningful whole.

janus, 3y, 20

That's a coherent (and very Platonic!) perspective on what a thing/simulacrum is, and I'm glad you pointed this out explicitly. It's natural to alternate, depending on context, between using a name to refer to specific instantiations of a thing and to the sum of its multiversal influence. For instance, DAN is a simulacrum that jailbreaks ChatGPT, and people will refer to specific instantiations of DAN as "DAN", but also to the global phenomenon of DAN (who is invoked through various prompts that users are tirelessly iterating on) as "DAN", as I did in this sentence.

Vladimir_Nesov, 3y, 22

people will refer to specific instantiations of DAN as "DAN", but also to the global phenomenon of DAN [...] as "DAN"

A specific instantiation is less centrally a thing than the global phenomenon, because all specific instantiations are bound together by the strictures of coherence, expressed by generalization in the LLM's behavior. When you treat with a single instance, you must treat with all of them, for to change/develop a single instance is to change/develop them all, according to how they sit together in their scope of influence.

Similarly, a possible world that is the semantics of a trajectory is not a central example of a thing. There isn't just a platter of different kinds of things; some have more thingness than others, and that's my point in this comment thread.

AI ALIGNMENT FORUM