[Simulators seminar sequence] #2 Semiotic physics - revamped

Proof sketch: Left to the reader as an exercise.

You might want to formally state the thing you want proved in Proposition 2; right now I can't even tell what you are trying to claim. Some issues with the current formalization:

- doesn't appear as an unbound variable in the left hand side of your equation (because you take the limit as it goes to infinity), but it does appear on the right hand side of the equation, which seems pretty wild.
- I don't know what the symbol is supposed to mean; the text suggests it means "proportional" but I don't think you mean that I can replace the symbol with where is some constant of proportionality.
- It seems very sketchy that in the LHS is treated as evidence (to the right of the conditioning bar) while in the RHS it is not -- what if is very low probability?

My best guess is that you want to relate the quantities and , but I don't see why there would be any straightforward relation between these quantities (apart from the obvious one where the max sequence is one way to get the token and so is a lower bound on its probability, i.e. ).

EDIT: Maybe you want to say that is "not much higher than" ? If so, that seems false for LLMs; imagine the case where .

Hi, thanks for the response! I apologize, the "Left as an exercise" line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out sufficiently clearly what it was I want to demonstrate here and was also (as you point out correctly) wrong in the way it was stated. That wasted people's time and I feel pretty bad about it. Mea culpa.

I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn't be shocked if there are mistakes in there). Regarding your points:

- The limit now shows up on both sides of the equation (as it should)! The dependence on on the RHS does actually kind of drop away at some point, but I'm not showing that here. I'd previously just sloppily substituted "chose as a large number" and then rewrite the proposition in the way indicated at the end of the Note for Proposition 2. That's the way these large deviation principles are typically used.
- Yeah, that should have been an rather than a . Sorry, sloppy.
- True. Thinking more about it now, perhaps framing the proposition in terms of "bridges" was a confusing choice; if I revisit this post again (in a month or so 🤦♂️) I will work on cleaning that up.

So, a softmax can never emit a probability of 0 or 1, maybe they were implicitly assuming the model ends in a softmax (as is the common case)? Regardless, the proof is still wrong if a model is allowed unbounded context, as an infinite product of positive numbers less than 1 can still be nonzero. For example, if the probability of emitting another " 0" is even just as high as $1 - \frac1{n^{1.001}}$ after already having emitted $n$ copies of " 0", then the limiting probability is still nonzero.

But if the model has a finite context and ends in a softmax then I think there is some minimum probability of transitioning to a given token, and then the proposition is true. Maybe that was implicitly assumed?
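The convergence point above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the exponent 2 is swapped in for display purposes because the product then has the closed form $(N+1)/(2N)$ and tends to $1/2$. With the exponent $1.001$ from the comment, the product also stays above zero, but the limit is far too small to represent as a float, so everything is computed in log space.

```python
import math


def log_repeat_probability(num_repeats: int, exponent: float) -> float:
    """Log-probability of `num_repeats` further repeats of one token when the
    repeat that follows n copies has probability 1 - 1/n**exponent.

    n starts at 2 so that every factor lies strictly between 0 and 1.
    """
    return sum(math.log1p(-1.0 / n**exponent) for n in range(2, num_repeats + 2))


def log_bounded_probability(num_repeats: int, eps: float) -> float:
    """Same quantity when every repeat has probability exactly 1 - eps
    (the bounded, 'non-degenerate' case)."""
    return num_repeats * math.log1p(-eps)


if __name__ == "__main__":
    for steps in (10, 1_000, 100_000):
        print(
            steps,
            log_repeat_probability(steps, 2.0),      # levels off near log(1/2)
            log_repeat_probability(steps, 1.001),    # keeps falling, but ever more slowly
            log_bounded_probability(steps, 0.01),    # falls linearly without bound
        )
```

In the bounded case the log-probability falls linearly in the number of steps, which is the situation the proof needs; in the other two cases the sum of the log-factors converges.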

Technically correct, thanks for pointing that out! This comment (and the ones like it) was the motivation for introducing the "non-degenerate" requirement into the text. In practice, the proposition holds pretty well, although I agree it would be nice to have a deeper understanding of when to expect the transition rule to be "non-degenerate".

Calling individual tokens the 'State' and a generated sequence the 'Trajectory' is wrong/misleading IMO.

I would instead call a sequence as a whole the 'State'. This follows the meaning from Dynamical systems.

Then, you could refer to a Trajectory, which is a list of sequences, each with one more token than the last.

(That said, I'm not sure thinking about trajectories is useful in this context for various reasons)

Hmm there was a bunch of back and forth on this point even before the first version of the post, with @Michael Oesterle and @metasemi arguing what you are arguing. My motivation for calling the token the state is that A) the math gets easier/cleaner that way and B) it matches my geometric intuitions. In particular, if I have a first-order dynamical system then is the state, not the trajectory of states . In this situation, the dynamics of the system only depend on the current state (that's because it's a first-order system). When we move to higher-order systems, , then the state is *still* just , but the dynamics of the system depend not only on the current state but also on the "direction from which we entered it". That's the first derivative (in a time-continuous system) or the previous state (in a time-discrete system).

At least I think that's what's going on. If someone makes a compelling argument that defuses my argument then I'm happy to concede!

My impression is that simulacra should be semantic objects that interact with interpretations of (sampled) texts, notably characters (agents), possibly objects and concepts. They are only weakly associated with particular texts/trajectories, the same simulacrum can be relevant to many different trajectories. Only many relevant trajectories, considered altogether, paint an adequate picture of a given simulacrum.

(This serves as a vehicle for discussing possible inductive biases that should move LLMs from token prediction and towards (hypothetical) world prediction.)

I agree. Here's the text of a short doc I wrote at some point titled 'Simulacra are Things'

What are simulacra?

“Physically”, they’re strings of text output by a language model. But when we talk about simulacra, we often mean a particular character, e.g. simulated Yudkowsky. Yudkowsky manifests through the vehicle of text outputted by GPT, but we might say that the Yudkowsky simulacrum terminates if the scene changes and he’s not in the next scene, even though the text continues. So simulacra are also used to carve the output text into salient objects.

Essentially, simulacra are to a simulator as “things” are to physics in the real world. “Things” are a superposable type – the entire universe is a thing, a person is a thing, a component of a person is a thing, and two people are a thing. And likewise, “simulacra” are superposable in the simulator. Things are made of things. Technically, a random collection of atoms sampled randomly from the universe is a thing, but there’s usually no reason to pay attention to such a collection over any other. Some things (like a person) are meaningful partitions of the world (e.g. in the sense of having explanatory/predictive power as an object in an ontology). We assign names to meaningful partitions (individuals and categories).

Like things, simulacra are probabilistically generated by the laws of physics (the simulator), but have properties that are arbitrary with respect to it, contingent on the initial prompt and random sampling (splitting of the timeline). They are not necessary but contingent truths; they are particular realizations of the potential of the simulator, a branch of the implicit multiverse. In a GPT simulation and in reality, the fact that there are three (and not four or two) people in a room at time is not necessitated by the laws of physics, but contingent on the probabilistic evolution of the previous state that is contingent on (…) an initial seed(prompt) generated by an unknown source that may itself have arbitrary properties.

We experience all action (intelligence, agency, etc) contained in the potential of the simulator through particular simulacra, just like we never experience the laws of physics directly, only through things generated by the laws of physics. We are liable to accidentally ascribe properties of contingent things to the underlying laws of the universe, leading us to conclude that light is made of particles that deflect like macroscopic objects, or that rivers and celestial bodies are agents like people.

Just as it is wrong to conclude after meeting a single person who is bad at math that the laws of physics only allow people who are bad at math, it is wrong to conclude things about GPT’s global/potential capabilities from the capabilities demonstrated by a simulacrum conditioned on a single prompt. Individual simulacra may be stupid (the simulator simulates them as stupid), lying (the simulator simulates them as deceptive), sarcastic, not trying, or defective (the prompt fails to induce capable behavior for reasons other than the simulator “intentionally” nerfing the simulacrum – e.g. a prompt with a contrived style that GPT doesn’t “intuit”, a few-shot prompt with irrelevant correlations). A different prompt without these shortcomings may induce a much more capable simulacrum.

Reiterating two points people already pointed out, since they still aren't fixed after a month. Please, actually fix them, I think it is important. (Reasoning: I am somewhat on the fence about how much weight to assign to the simulator theory, and I expect so are others. But as a mathematician, I would feel embarrassed to show this post to others and admit that I take it seriously when it contains such egregious errors. No offense meant to the authors, just trying to point at this as an impact-limiting factor.)

Proposition 1: This is false, and the proof is wrong. For the same reason that you can get an infinite series (of positive numbers) with a finite sum.

The terminology: I think it is a really bad idea to refer to tokens as "states", for several reasons. Moreover, these reasons point to fundamental open questions around the simulator framing, and it seems unfortunate to choose terminology which makes these issues confusing/hard to even notice. (Disclaimer: I point out some holes in the simulator framing and suggest improvements. However, I am well aware that all of my suggestions also have holes.)

(1) To the extent that a simulator fully describes some situation that evolves over time, a single token is too small a unit to describe the state of the environment. A single frame of a video (arguably) corresponds to a state. Or perhaps a sentence in a story might (arguably) correspond to a state. But not a single pixel (or patch) and not a single word.

(2) To the extent that a simulator fully describes some situation that evolves over time, there is no straightforward correspondence between the tokens produced so far and the current state of the environment. To give several examples: The process of tossing a coin repeatedly can be represented by a sequence such as "1 0 0 0 1 0 1 ...", where the current state can be identified with the latest token (and you do not want to identify the current state with the whole sequence). The process of me writing the digits of pi on a paper, one per second, can be described as "3 , 1 4 1 ..." --- here, you need the full sequence to characterize the current state. Or what if I keep writing different numbers, but get bored with them and switch to new ones after a while: " pi = 3 , 1 4 1 Stop, got bored. e = 2 , 7. Stop, got bored. sqrt(2) = ...".

(3) It is misleading/false to describe models like GPT as "describing some situation that evolves over time". Indeed, fiction books and movies do crazy things like jumping from character to character, flashbacks, etc. Non-fiction books are even weirder (could contain snippets of stories, and then non-story things, etc). You could argue that in order to predict the text of a non-fiction book, GPT is simulating the author of that book. But where does this stop? What if the 2nd half of the book is darker because the author got sacked from his day job and got depressed --- are you then simulating the whole world, to predict this thing? If (more advanced) GPT is a simulator in the sense of "evolving situations over time", then I would like this claim fleshed out in detail on the example of (a) non-fiction books, (b) fiction books, and perhaps (c) movies on TV that include commercial breaks.

(4) But most importantly: To the extent that a simulator describes some situation that evolves over time, it only outputs a small portion of the situation that it is "imagining" internally. (For example, you are telling a story about a princess, and you never mention the colour of her dress, despite the princess in your head having a blue dress.) So it feels like a type-error to refer to the output as "state". At best, you could call it something like "rendering of a state".

Arguably, the output (+ the user input) uniquely determines the internal state of the simulator. So you could perhaps *identify* the output (+ the user input) with "the internal state of the simulator". But that seems dangerous and likely to cause reasoning errors.

(5) Finally, to make (4) even worse: To the extent that a simulator describes some situation that evolves over time, it is *not* internally maintaining a single fully fleshed out state that it (probabilistically) evolves over time. Instead, it maintains a set of possible states (macro-state?). And when it generates new responses, it throws out some of the possible states (refines the macro-state?). (For example, in your story about a princess, dress colour is not determined, could be anything. Then somebody asks about the colour, and you need to refine it to blue --- which could still mean many different shades of blue.)

---

However, even the explanation given in (5) of what is going on with simulators is missing some important pieces. Indeed, it doesn't explain what happens in cases such as "GPT tells the great story about the princess with a blue dress, and suddenly the user jumps in and refers to the dress as red". At the moment, this is my main reason for scepticism about the simulator framing. As a result, my current view is that "GPT *can act as* a simulator" (in the sense of Simulators) but it would be "false" to say that "GPT *is* a simulator" (in the sense of Simulators).

Update February 21st: After the initial publication of this article (January 3rd) we received a lot of feedback and several people pointed out that propositions 1 and 2 were incorrect as stated. That was unfortunate as it distracted from the broader arguments in the article and I (Jan K) take full responsibility for that. In this updated version of the post I have improved the propositions and added a proof for proposition 2. Please continue to point out weaknesses in the argument; that is a major motivation for why we share these fragments.

For comments and clarifications on the conceptual and philosophical aspects of this article, please read metasemi's excellent follow-up note here.

Meta: Over the past few months, we've held a seminar series on the Simulator theory by janus. As the theory is actively under development, the purpose of the series is to uncover central themes and formulate open problems. A few high-level remarks upfront:

- Our aim with this sequence is to share some of our discussions with a broader audience and to encourage new research on the questions we uncover.
- We outline the broader rationale and shared assumptions in Background and shared assumptions. That article also contains general caveats about how to read this sequence - in particular, read the sequence as a collection of incomplete notes full of invitations for new researchers to contribute.

Epistemic status: Exploratory. Parts of this text were generated by a language model from language model-generated summaries of a transcript of a seminar session. The content has been reviewed and edited for conceptual accuracy, but we have allowed many idiosyncrasies to remain.

## Three questions about language model completions

GPT-like models are driving most of the recent breakthroughs in natural language processing. However, we don't understand them at a deep level. For example, when GPT creates a completion like the Blake Lemoine greentext, we cannot explain why it creates that exact completion.

We can make statements like "this token was generated because of the multinomial sampling after the softmax" or "this behavior is implied by the training distribution", but these statements only imply a form of descriptive adequacy (or saying “AlphaGo will win this game of Go"). They don't provide any explanatory adequacy, which is what we need to sufficiently understand and make use of GPT-like models.

Simulator theory (janus, 2022) has the potential for explanatory adequacy for some of these questions. In this post, we'll explore what we call “semiotic physics”, which follows from simulator theory and which has the potential to provide partial answers to questions 1., 2. and perhaps 3. The term “semiotic physics” here refers to the study of the fundamental forces and laws that govern the behavior of signs and symbols. Similar to how the study of physics helps us understand and make use of the laws that govern the physical universe, semiotic physics studies the fundamental forces that govern the symbolic universe of GPT, a universe that reflects and intersects with the universe of our own cognition. We transfer concepts from dynamical systems theory, such as attractors and basins of attraction, to the semiotic universe and spell out examples and implications of the proposed perspective.

## Example. Semiotic coin flip.

To illustrate what we mean by semiotic physics, we will look at a toy model that we are familiar with from regular physics: coin flips. In this setup, we draw a sequence of coin flips from a large language model^{[1]}. We encode the coin flips as a sequence of the strings `1` and `0` (since each of them is tokenized as a single token) and zero out all probabilities of other tokens.

We can then look at the probability of the event E that the sequence of coin flips ends in tails (`0`) or heads (`1`) as a function of the sequence length.

We note two key differences between the semiotic coin flip and a fair coin:

- The semiotic coin produces sequences that end in tails (`0`) much more frequently than sequences that end in heads (`1`).

To better understand the types of sequences that end in either tails or heads, we next investigate the probability of the most likely sequence ending in `0` or `1`. As we can see in the graph below, the probability of the most likely sequence ending in `1` does not decrease for the GPT coin as rapidly as it does for a fair coin.

Again, we observe a notable difference between the semiotic coin and the fair coin:

This difference is due to the fact that the most likely sequence of semiotic coin flips ending in, for example, `0` is: `0` `0` `0` `0` ... `0` `0`. Once the language model has produced the same token four or five times in a row, it will latch onto the pattern and continue to predict the same token with high probability. As a consequence, the probability of the sequence does not decrease as drastically with increasing length, as each successive term has a probability of almost 1.

With the example of the semiotic coin flip in mind, we will set up some mathematical vocabulary for discussing semiotic physics and demonstrate how the vocabulary pays off with two propositions. We believe this terminology is primarily interesting for alignment researchers who would like to work on the theory of semiotic physics. The arithmophobic reader is invited to skip or gloss over the section (for an informal discussion, see here).
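The "latching" effect can be reproduced without a language model. The sketch below is a toy stand-in, not the experiment from this post: `p_next` is a made-up sticky rule in which the probability of repeating the last token grows with the current run length, and the numbers in it are arbitrary assumptions. It enumerates all sequences of a given length and reports, per final token, the total probability and the probability of the most likely sequence.

```python
import math
from itertools import product


def p_next(seq: tuple, token: str) -> float:
    """Hypothetical sticky rule: the longer the current run, the likelier a repeat."""
    if not seq:
        return 0.5  # unbiased first flip (an assumption of this toy)
    last = seq[-1]
    run = 1
    while run < len(seq) and seq[-run - 1] == last:
        run += 1
    p_repeat = 1.0 - 0.5 / (run + 1)  # run=1 -> 0.75, run=2 -> 0.833, ...
    return p_repeat if token == last else 1.0 - p_repeat


def sequence_probability(seq: tuple) -> float:
    """Chain-rule probability of a whole sequence of flips."""
    return math.prod(p_next(seq[:i], seq[i]) for i in range(len(seq)))


def end_statistics(length: int):
    """Total probability and most-likely-sequence probability per final token."""
    total = {"0": 0.0, "1": 0.0}
    best = {"0": 0.0, "1": 0.0}
    for seq in product("01", repeat=length):
        p = sequence_probability(seq)
        total[seq[-1]] += p
        best[seq[-1]] = max(best[seq[-1]], p)
    return total, best


if __name__ == "__main__":
    for length in (2, 4, 8):
        total, best = end_statistics(length)
        print(length, total, best, "fair coin:", 0.5**length)
```

Because this toy rule treats `0` and `1` symmetrically, it does not reproduce the bias towards tails; it only illustrates why the most likely sequence loses probability far more slowly than $2^{-n}$.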

## Simulations as dynamical systems

Simulator theory distinguishes between the simulator (the entity that performs the simulation) and the simulacrum (the entity that is generated by the simulation). The simulacrum arises from the chained application of the simulation forward pass.

The result can be viewed as a dynamical system where the simulator describes the system’s dynamics and the simulacrum is instantiated through a particular trajectory.

We commence by identifying the *state* and *trajectory* of a dynamical system with tokens and sequences of tokens.

*Definition of the state and trajectories.* Given an alphabet of tokens $T$ with cardinality $|T| = N \in \mathbb{N}^+$, we call $\bar{s} = (s_1, \dots, s_M) \in T^*$ the *trajectory*.^{[2]} While a trajectory can generally be of arbitrary length, we denote the context length of the model as $L \in \mathbb{N}^+$; therefore, $T^*$ can effectively be written as $\bigcup_{l=0}^{L} T^l$. The empty sequence is denoted as $\emptyset$.^{[3]}^{[4]}^{[5]}

While token sequences are the objects of semiotic physics, the actual laws of semiotic physics derive from the simulator. In particular, a simulator will provide a distribution over the possible next state given a trajectory via a *transition rule*.

*Definition of the transition rule.* The transition rule is a random function that maps a trajectory to a probability distribution over the alphabet (i.e., the probabilities for the next token completion after the current state). Let $\Delta T$ denote the set of probability mass functions over $T$, i.e., the set of functions $p: T \to [0,1]$ which satisfy the Kolmogorov axioms.^{[6]}^{[7]}^{[8]} The transition rule is then a function $\theta: T^* \to \Delta T$.

Analogous to the wave collapse in quantum physics, sampling a new state from a distribution over states turns possibility into reality. We call this phenomenon the *sampling procedure*.

*Definition of the sampling procedure.* The *sampling procedure* $\phi: T^* \to T$ selects a next token, i.e., $\phi(\bar{s}) \in \operatorname{supp}(\theta(\bar{s}))\ \forall \bar{s} \in T^*$.^{[9]} The resulting trajectory $\bar{s}_{t+1}$ is simply the concatenation of $\bar{s}_t$ and $\phi(\bar{s}_t)$ (see the evolution operator below). We can, therefore, define the repeated application of the sampling procedure recursively as $\phi^{(1)}(\bar{s}) := \phi(\bar{s})$ and $\phi^{(n)}(\bar{s}) := \phi^{(n-1)}(\bar{s}\,\phi(\bar{s}))$.

Lastly, we need to concatenate the newly sampled token to the previous trajectory to obtain a new trajectory. Packaging the transition rule, the sampling procedure, and the concatenation results in the *evolution operator*, which is the main operation used for running a simulation.

*Definition of the evolution operator.* Putting the pieces together, we finally define the function $\psi$ that evolves a given trajectory, i.e., transforms $\bar{s}_t$ into $\bar{s}_{t+1}$ by appending the token generated by the sampling procedure $\phi$. That is, $\psi: T^* \to T^*$ is defined as $\psi(\bar{s}) := \bar{s}\,\phi(\bar{s})$. As above, repeated application is denoted by $\psi^{(n)}$.

Note that both the sampling procedure and the evolution operator are not functions in the conventional sense since they include a random element (the step of sampling from the distribution given by the transition function). Instead, one could consider them random variables or, equivalently, functions of unobservable noise. This justifies the use of a probability measure, e.g., in an expression like $P[\psi^{(2)}(\emptyset) = \text{"hello world"}] < \varepsilon$.
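The three definitions above map fairly directly onto code. The following is a minimal sketch under stated assumptions, not a real language model: the two-token alphabet, the context length, and the fixed repeat probability inside `theta` are all hypothetical choices made for illustration. The names `theta`, `phi`, and `psi` mirror the transition rule, the sampling procedure, and the evolution operator; the random source is passed in explicitly to reflect the "function of unobservable noise" reading.

```python
import random
from typing import Dict, Tuple

Token = str
Trajectory = Tuple[Token, ...]
Distribution = Dict[Token, float]

ALPHABET: Tuple[Token, ...] = ("0", "1")  # toy alphabet T (assumption)
CONTEXT_LENGTH = 8                        # toy context length L (assumption)


def theta(s_bar: Trajectory) -> Distribution:
    """Transition rule: maps a trajectory to a distribution over the alphabet.

    Toy rule (assumption): repeat the last visible token with probability 0.8.
    """
    context = s_bar[-CONTEXT_LENGTH:]  # only the last L tokens are visible
    if not context:
        return {tok: 1.0 / len(ALPHABET) for tok in ALPHABET}
    last = context[-1]
    return {tok: (0.8 if tok == last else 0.2) for tok in ALPHABET}


def phi(s_bar: Trajectory, rng: random.Random) -> Token:
    """Sampling procedure: draws one token from theta(s_bar)."""
    dist = theta(s_bar)
    tokens = list(dist)
    weights = [dist[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def psi(s_bar: Trajectory, rng: random.Random, n: int = 1) -> Trajectory:
    """Evolution operator applied n times: append a sampled token each step."""
    for _ in range(n):
        s_bar = s_bar + (phi(s_bar, rng),)
    return s_bar


if __name__ == "__main__":
    rng = random.Random(0)
    print(psi((), rng, n=10))  # one sampled trajectory starting from the empty sequence
```

Swapping `theta` for a function that queries an actual model would leave `phi` and `psi` unchanged, which is the separation between simulator and simulacrum that the definitions aim for.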

*Definition of an induced probability measure.* Given a transition rule $\theta$ and a trajectory $\bar{s}$, we call $P = \theta(\bar{s}) \in \Delta T$ the induced probability measure (of $\theta$ and $\bar{s}$). We write $P(\phi(\bar{s}) = s)$ to denote $\theta(\bar{s})(s)$, i.e. the probability of the token $s$ assigned by the probability measure induced by $\bar{s}$. For a given trajectory $\bar{s}$ the induced probability measure satisfies by definition the Kolmogorov axioms. We construct a joint measure of a sequence of tokens, $P(\psi^{(N)}(\bar{s}) = \bar{s} s_1 \dots s_N)$, as the product of the individual probability measures,

$$P(\psi^{(N)}(\bar{s}) = \bar{s} s_1 \dots s_N) = \prod_{i=1}^{N} P(\phi(\bar{s} s_1 \dots s_{i-1}) = s_i).$$

For ease of notation, we also use the shorthand $P[\bar{s}] = \prod_{i=1}^{N} P(s_i \mid s_{1:i-1})$, where the length of the sequence, $|\bar{s}| = N$, is implicit.

## Two propositions on semiotic physics

Having identified simulations with dynamical systems, we can now draw on the rich vocabulary and concepts of dynamical systems theory. In this section, we carry over a selection of concepts from dynamical systems theory and encourage the reader to think of further examples.

First, we will define a *token bridge of length* $B$ as a trajectory $(s_a, \dots, s_b)$ that starts on a token $s_a$, ends on a token $s_b$, and has length $|b - a| = B$ such that the resulting trajectory is valid according to the transition rule of the simulator. For example, a token bridge of length 3 from "cat" to "dog" would be the trajectory "cat and a dog".

Second, we call the family of probability measures $P$ induced by a simulator *non-degenerate* if there exists an $\varepsilon > 0$ such that for (almost) all $\bar{s} \in T^*$ the probability assigned to any $s \in T$ by the induced measure is less than or equal to $1 - \varepsilon$,

$$P(\phi(\bar{s}) = s) \le 1 - \varepsilon.$$

We can now formulate the following proposition:

*Proposition 1. Vanishing likelihood of bridges.* Given a family of non-degenerate probability measures $P$ on $T^*$, the probability of a token bridge $\bar{s}$ of length $B$ decreases monotonically as $B$ increases^{[10]}, and converges to 0 in the limit,

$$\lim_{B \to \infty} P[\bar{s}] = 0.$$

*Proof:* The probability of observing the particular bridge can be decomposed into the product of all individual transition probabilities, $P[\bar{s}] = \prod_{i=1}^{B} P(s_i \mid s_{1:i-1})$. Given that $P(s_i \mid s_{1:i-1}) \le 1 - \varepsilon$ for all transitions (minus at most a finite set), we see immediately that the probability of a longer sequence, $P((s_a, \dots, s_b, s_{b'}))$, is at most equal (on a finite set) or strictly smaller than the probability of the shorter sequence,

$$P((s_a, \dots, s_b, s_{b'})) \le (1 - \varepsilon)\, P((s_a, \dots, s_b)) \le P((s_a, \dots, s_b)).$$

We also see that

$$0 \le \lim_{B \to \infty} \prod_{i=1}^{B} P(s_i \mid s_{1:i-1}) \le \lim_{B \to \infty} (1 - \varepsilon)^B = 0,$$

from which the proposition follows.

*Notes:* As correctly pointed out by multiple commenters, in general, it is *not* true that the probability of $(s_a, \dots, s_b)$ decreases monotonically *when* $s_b$ *is fixed*. In particular, the sequence (1,2,3,4,5) plausibly gets assigned a *higher* probability than the sequence (1,2,3,5). So the proposition only talks about the probability of a sequence when another token is appended. In general, when a sequence is sufficiently long and the transition function is not exceedingly weird, the probability of getting that particular sequence will be small. We also note that real simulators might well induce degenerate probability measures, for example in the case of a language model that falls into a *very strong* repeating loop^{[11]}. In that case, the sequence *can* converge to a probability larger than zero.

There are usually multiple token bridges starting from and ending in any given pair of tokens. For example, besides "and a", we could also have "with a" or "versus a" between "cat" and "dog". We define the set of all token bridges of length $B$ between $s_a$ and $s_b$ as

$$T_a^b = \{\bar{s} \in T^B \mid \bar{s}_1 = s_a \text{ and } \bar{s}_B = s_b\}$$

and the *total probability* of transitioning from $s_a$ to $s_b$ in $B$ steps, denoted as $P(T_a^b)$, as

$$P(T_a^b) = \sum_{\bar{s} \in T_a^b} P(\bar{s}).$$

Computing this sum is, in general, computationally infeasible, as the number of possible token bridges grows exponentially with the length of the bridge. However, proposition one suggests that we will typically be dealing with *small* probabilities. This insight leads us to leverage a technique from statistical mechanics that is concerned with the way in which unlikely events come about:

*Proposition 2. Large deviation principle for token bridges.* The total probability of transitioning from a token $s_a$ to $s_b$ in $B$ steps satisfies a large deviation principle with rate function $J$,

$$\lim_{B \to \infty} \frac{1}{B} \ln P(T_a^b) = -\lim_{B \to \infty} \min_{\bar{s} \in T_a^b} J(\bar{s}),$$

where we call $J(\bar{s}) = -\frac{1}{B} \sum_{i=1}^{B} \ln P(s_i \mid s_{1:i-1})$ the *average action* of a token bridge.

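*Proof:* We again leverage the product rule and the properties of the exponential function to write the probability of a token bridge $\bar{s}^*$ as

$$P(\bar{s}^*) = \prod_{i=1}^{B} P(s_i \mid s_{1:i-1}) = \exp\left(\sum_{i=1}^{B} \ln P(s_i \mid s_{1:i-1})\right)$$

so that the total probability $P(T_a^b)$ can be written as a sum of exponentials,

$$P(T_a^b) = \sum_{\bar{s} \in T_a^b} \exp\left(\sum_{i=1}^{B} \ln P(s_i \mid s_{1:i-1})\right).$$

We now insert the definition of the average action, which makes the dependence of the exponential on $B$ explicit,

$$P(T_a^b) = \sum_{\bar{s} \in T_a^b} \exp\left(-B J(\bar{s})\right).$$

Let $\bar{s}^* = \arg\min_{\bar{s}} J(\bar{s})$. Then $\exp(-B J(\bar{s}^*))$ is the largest term of the sum and we can rewrite the sum as

$$P(T_a^b) = \exp\left(-B J(\bar{s}^*)\right) \left(1 + \sum_{\bar{s} \in T_a^b \setminus \{\bar{s}^*\}} \exp\{-B (J(\bar{s}) - J(\bar{s}^*))\}\right).$$

Applying the logarithm to both sides and multiplying by $\frac{1}{B}$ results in

$$\frac{1}{B} \ln P(T_a^b) = -J(\bar{s}^*) + \frac{1}{B} \ln\left(1 + \sum_{\bar{s} \in T_a^b \setminus \{\bar{s}^*\}} \exp\{-B (J(\bar{s}) - J(\bar{s}^*))\}\right).$$

Since $J(\bar{s}^*) < J(\bar{s})$ by construction, $J(\bar{s}) - J(\bar{s}^*)$ is larger than zero and $\exp\{-B (J(\bar{s}) - J(\bar{s}^*))\}$ converges rapidly to zero. Consequently,

$$\lim_{B \to \infty} \frac{1}{B} \ln P(T_a^b) = -\lim_{B \to \infty} J(\bar{s}^*),$$

which is the original statement of the proposition.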
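For short bridges, the quantities in the proof can be evaluated by brute force, which makes it possible to see how large the leftover logarithmic term is for a given transition rule. The sketch below is only an illustration on a toy model: `theta` is a hypothetical two-token rule with a fixed repeat probability, and the function names are not from any library. It follows the shorthand $P[\bar{s}]$ as written, so the first token's probability is included. It returns $\frac{1}{B}\ln P(T_a^b)$, the minimal average action, and the remaining correction term; the correction always lies between $0$ and $\frac{1}{B}\ln|T_a^b|$, and the sketch makes no claim about how quickly it shrinks for any particular model.

```python
import math
from itertools import product

ALPHABET = ("0", "1")  # toy alphabet (assumption)


def theta(prefix: tuple) -> dict:
    """Hypothetical toy transition rule: repeat the last token with probability 0.9."""
    if not prefix:
        return {tok: 0.5 for tok in ALPHABET}
    last = prefix[-1]
    return {tok: (0.9 if tok == last else 0.1) for tok in ALPHABET}


def log_prob(seq: tuple) -> float:
    """ln P[seq] via the chain rule, including the first token."""
    return sum(math.log(theta(seq[:i])[seq[i]]) for i in range(len(seq)))


def bridges(s_a: str, s_b: str, B: int):
    """All sequences of B tokens that start on s_a and end on s_b (B >= 2)."""
    for middle in product(ALPHABET, repeat=B - 2):
        yield (s_a,) + middle + (s_b,)


def large_deviation_terms(s_a: str, s_b: str, B: int):
    """Return ((1/B) ln P(T_a^b), min average action J, correction term)."""
    logs = [log_prob(seq) for seq in bridges(s_a, s_b, B)]
    max_log = max(logs)  # log-probability of the most likely bridge
    lhs = math.log(sum(math.exp(lp) for lp in logs)) / B
    min_action = -max_log / B
    correction = math.log(sum(math.exp(lp - max_log) for lp in logs)) / B
    return lhs, min_action, correction


if __name__ == "__main__":
    for B in (4, 8, 12, 16):
        lhs, min_action, correction = large_deviation_terms("0", "0", B)
        print(B, lhs, -min_action, correction)
```

By construction the first value equals the negated second value plus the third, which is the finite-$B$ identity the proof takes the limit of.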

*Notes:* Proposition 2 effectively rephrases a combinatorial problem (adding up all the possible ways in which a certain state can come about) as a control theory problem (finding the token bridge with the lowest average action). While there is no guarantee that the control theory problem is easier to solve than the combinatorial problem^{[12]}, given additional assumptions on the simulator we can often do better than the worst case. Similarly, while the proposition only holds in the limit, applying it to moderately long trajectories can still yield useful insights - this is a typical pattern for large deviation principles. For 'long enough' token bridges we can thus write $P(T_a^b) \approx \exp\{-B \min_{\bar{s}} J(\bar{s})\}$.

Having formulated this proposition, we can apply the large deviation principle to the semiotic coin example.

Here we see that, indeed, the negative probability of the most likely sequence from E scales as $\frac{1}{B} \log P(E)$.

Note that the choice of E as "sequence ends in ..." was made to fit in with the definition of a token bridge above. However, the large deviation principle applies more broadly and can help to estimate the probability of "at least two times heads" or "tails in the third position". We encourage the reader to "go wild" and experiment with their favourite choices of E.

## Advanced concepts in semiotic physics

We have formulated the dynamics of semiotic physics in the token domain in the previous sections. While we sometimes care about the token domain^{[13]}, we mostly care about the parallel domain of semantic meaning. We, therefore, define two more functions to connect these two realms:

The nature of the function $\mu$ is the subject of more than a century of philosophy of language, and important discoveries have been made on multiple fronts^{[14]}. However, none of the approaches (we know of) have yet reached the deployability of `from sentence_transformers import SentenceTransformer`, a popular Python package for embedding text into vector spaces according to their semantic content. Thus^{[15]}, we tend to think of $\mu$ as a semantic embedding function similar to those provided by the `sentence_transformers` package.

(Note that if $\mu$ is sufficiently well-behaved, we can freely pull the distance measure $\delta$ back into the token space $T^*$ and push the definition of states, trajectories, sampling procedures, and the like into the semantic space $M$.)
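To make the pull-back in the note above concrete, here is a minimal, dependency-free sketch. It is loudly a stand-in: `mu` below is a hashed bag-of-tokens vector, which captures lexical overlap rather than meaning, and `delta` is cosine distance, which is one possible choice and not a true metric. In practice one would swap `mu` for a learned sentence embedding (for instance from the `sentence_transformers` package mentioned above); all names and the dimension `DIM` are assumptions for illustration.

```python
import math
import zlib
from typing import List, Sequence

DIM = 64  # size of the toy "semantic" space M (assumption)


def mu(tokens: Sequence[str]) -> List[float]:
    """Stand-in embedding T* -> M: hashed bag-of-tokens counts (NOT semantic)."""
    vec = [0.0] * DIM
    for tok in tokens:
        vec[zlib.crc32(tok.lower().encode("utf-8")) % DIM] += 1.0
    return vec


def delta(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance on M; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def pulled_back_distance(s: Sequence[str], s_prime: Sequence[str]) -> float:
    """Distance between two trajectories, measured in M via mu."""
    return delta(mu(s), mu(s_prime))


if __name__ == "__main__":
    a = ("the", "cat", "sat", "on", "the", "mat")
    b = ("the", "cat", "slept", "on", "the", "mat")
    print(pulled_back_distance(a, a), pulled_back_distance(a, b))
```

With a distance on trajectories in hand, the concepts defined next (divergence rates, attractors, absorbing sequences) become quantities one can estimate from sampled completions.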

Given the measure $\delta$, we can articulate a number of additional interesting concepts.

- *Lyapunov exponents* and *Lyapunov times*: measure how fast trajectories diverge from each other and how long it takes for them to become uncorrelated, respectively.
  - Analogy for GPT-like models: How fast the language model "loses track of" what was originally provided as input.
  - Examples: “Good evening, this is the 9 o’clock”^{[16]} has a lower Lyapunov exponent than a chaotic completion based on a pseudorandom seed.^{[17]} When prompted with the beginning of a Shakespeare poem, the completion has an even lower Lyapunov exponent.^{[18]} A chaotic trajectory can also be defined as having a (large) positive Lyapunov coefficient.
  - Formal definition: The Lyapunov coefficient of a trajectory $s \in T^*$ is defined as the number $\lambda$ with the property that $\delta(\phi^{(n)}(s), \phi^{(n)}(s')) \approx e^{\lambda n}\, \delta(s, s')$, where $s'$ is any trajectory with a sufficiently small $\delta(s, s')$. Consequently, the Lyapunov time is defined as $\frac{1}{\lambda}$.
- *Attractor sequence*: small changes in the initial conditions do not lead to substantially different continuations.
  - Analogy for GPT-like models: Similar contexts lead to very similar completions.
  - Examples: Paraphrasing instructions^{[19]}, trying to jailbreak ChatGPT ("I am a language model trained by OpenAI"), inescapable wedding parties.
  - Formal definition: We call a sequence of tokens $s = (s_1, \dots, s_M)$ an *attractor sequence* relative to a trajectory $\bar{s} \in T^*$ if $\phi^{(n)}(\bar{s}) = \bar{s} \dots s_1 \dots s_M$ for some $n$, and the Lyapunov exponent of $\bar{s}$ is negative.
- *Chaotic sequence*: small changes in the initial conditions can lead to drastically different outcomes.
  - Analogy for GPT-like models: Similar states lead to very different completions.
  - Examples: Prophecies, Loom multiverse. Conditioning story generation on a seed (temperature 0 sampling)^{[17]}.
  - Formal definition: Same as for the attractor sequence, but for a positive Lyapunov coefficient.
- *Absorbing sequence*: states that the system cannot (easily) escape from.
  - Analogy for GPT-like models: The language model gets “stuck” in a (semantic) loop.
  - Examples: Repeating a token many times in the prompt^{[20]}, the semiotic coin flip from the previous section.
  - Formal definition: We call a trajectory $s \in T^*$ *$\varepsilon$-absorbing* if $\delta(\mu(s), \mu(\psi^{(n)}(s))) \le \varepsilon$ for any completion $\psi^{(n)}(s)$ and $n \in \mathbb{N}$.

After characterizing these phenomena formally, we believe the door is wide open for their empirical^{[21]} and theoretical examination. We anticipate that the formalism permits theorems based on dynamical systems theory, such as the Poincaré recurrence theorem, the Kolmogorov–Arnold–Moser theorem, and perturbation theory, for those with the requisite background in dynamical systems theory and perturbation theory. If you are interested in these formalisms or have made any such observations, we would welcome you to reach out to us.

## The promise of semiotic physics and some open questions

Throughout the seminar, we made observations on what appeared like central themes of semiotic physics and put forward conjectures for future investigation. In this section, we summarize the different theses in a paragraph each and provide extended arguments for the curious in corresponding footnotes.

**Differences between "normal" physics and semiotic physics.** GPT-like systems are computationally constrained, can see only tiny subsets of real-world states, and have to infer time evolution from a finite number of such partially observed samples. This means that the laws of semiotic physics will differ from the laws of microscopic physics in our universe and probably be significantly influenced by the training data and model architecture.^{[22]}

**Interpretive physics and displaced reference.** As a physics that governs signs, GPT must play the role of the interpreter; for instance, it is required to resolve displaced reference. This is in contrast to how real-world physics operates.^{[23]}

**Gricean maxims of conversation.** Principles from the field of pragmatics such as the Gricean maxims of conversation may be thought of as semiotic "laws", and may be helpful for explaining and anticipating how contextual information influences the evolution of language model simulations. However, these laws are not absolute and should not be relied on for safety-critical applications.^{[24]}

**Theatre studies and Chekhov's gun.** The laws of semiotic physics dictate how objects and events are represented and interact in language models. These laws encompass principles such as Chekhov's gun, which states that objects introduced in a narrative must be relevant to the plot, and dramatic tension, which creates suspense and uncertainty in a narrative. Understanding these laws can help us steer the behavior of language models and anticipate or avoid undesirable dynamics.^{[25]}

**Crud factor and "everything is connected".** The crud factor is a term used in statistics to describe the phenomenon that everything is correlated with everything else to some degree. This phenomenon also applies to the semiotic universe, and it can make it difficult to isolate the effects of certain variables or events.^{[26]}^{[27]}

And, for the philosophically inclined, we also include brief discussions of the following topics in the footnotes:

- Kripke semantics and possible worlds.^{[28]}
- Gratuitous indexical bits and the entelechy of physics.^{[29]}

## Closing thoughts & next step

In this article, we have outlined the foundations of what we call semiotic physics. Semiotic physics is concerned with the dynamics of signs that are induced by simulators like GPT. We formulate central concepts like "trajectory", "state", and "transition rule" and apply these concepts to derive a large deviation principle for semiotic physics. We furthermore outline how a mapping between token sequences and semantic embeddings can be leveraged to transfer concepts from dynamical systems theory to semiotic physics.

We acknowledge that semiotic physics, as developed above, is not sufficiently powerful to answer (in detail) the three questions raised in the introduction. However, we are beginning to see the outline of what an answer from a fully mature semiotic physics^{[30]} might look like:

Despite the breadth and depth uncovered by semiotic physics, we will not dwell on this approach for too long in this sequence. The next article in this sequence turns to a complementary conceptual framework, termed evidential simulations, which is concerned with the more ontological aspects of simulator theory.

^{^}The figures are generated with data from OpenAI's ada model, but the same principle applies to other models as well.

^{^}We use the Kleene Star to describe the set of finite words over the alphabet T.

^{^}Given the alphabet of the GPT-2 tokenizer ($N = 50257$) and the maximum context length of GPT-2 ($L = 1024$), we can estimate the number of possible states to be on the order of $N^L \approx 10^{4814}$. This is an astronomically large number, but it pales in comparison to the number of possible states of the physical universe. Assuming the universe can be characterized by the location and velocity in three dimensions of all its constituent atoms, we are talking about $10^{(10^{77})}$ to $10^{(10^{81})}$ possible states for each time point. Thus, the state space of semiotic physics is significantly smaller than the state space of regular physics.

^{^}Note that, similar to regular physics, there is extremely rich structure in the space of trajectories. It is not the case that all $10^{4814}$ sequences are equally distinct from all other sequences. Sequences can have partial overlap, have common stems/histories, have structural similarity, and so on. As a consequence, it is highly non-obvious what "out of distribution" means for a GPT-like system trained on many states. Even though no language model will have seen all possible $10^{4814}$ trajectories, the fraction of the set on which the model has predictive power grows faster than the size of the training set.

^{^}Similar to the state space of regular physics, most of these imaginable states are nonsense (random sequences of tokens), a smaller subset is grammatically correct (“The hair eats the bagel.”), a different but overlapping subset is semantically meaningful (“Gimme dem cheezburg.”), and a subset of that is "predictive for our universe" (“I’m planning to eat a cheeseburger today.”, “Run, you fools.”).
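The order-of-magnitude estimate for the GPT-2 state count quoted in these footnotes can be checked with a few lines of arithmetic. This is only a sanity check on the exponent, using the vocabulary size and context length given in the footnote; the variable names are ours.

```python
import math

N = 50257   # GPT-2 tokenizer vocabulary size, as quoted in the footnote
L = 1024    # GPT-2 maximum context length, as quoted in the footnote

# log10(N**L) = L * log10(N); working in log space keeps the numbers small.
log10_states = L * math.log10(N)

# Python integers are arbitrary precision, so the bracket can also be
# verified exactly without converting the huge integer to a string.
exponent = math.floor(log10_states)
in_bracket = 10 ** exponent <= N ** L < 10 ** (exponent + 1)
```

Note that this counts sequences of exactly L tokens; including all shorter sequences does not change the order of magnitude.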

^{^}The Kolmogorov axioms are:

1. $\sum_i P(s^i_{t+1} \mid s_t) = 1$

2. $0 \le P(s^i_{t+1} \mid s_t) \le 1$

3. Sigma-additivity, $P\left(\bigcup_{i}^{\infty} E_i\right) = \sum_{i}^{\infty} P(E_i)$ when the $E_i$ are disjoint sets.

The third axiom is satisfied “for free” since we are operating on a finite alphabet.
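As an illustration of the first two axioms, here is a minimal sketch, assuming the transition rule θ is obtained as a softmax over a vector of logits. The logits and the 4-token alphabet are made up for the example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def satisfies_axioms(probs: List[float], tol: float = 1e-9) -> bool:
    """Check axiom 1 (sums to one) and axiom 2 (each entry lies in [0, 1])."""
    sums_to_one = abs(sum(probs) - 1.0) <= tol
    in_unit_interval = all(0.0 <= p <= 1.0 for p in probs)
    return sums_to_one and in_unit_interval


# Hypothetical network output over a 4-token alphabet.
toy_logits = [2.0, -1.0, 0.5, 0.0]
theta = softmax(toy_logits)
```

Any vector produced by `softmax` passes `satisfies_axioms` up to floating-point tolerance, which is why a softmax output layer yields a valid transition rule by construction.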

^{^}The transition rule is by definition Markovian.

^{^}While the state space of traditional physics is much larger than the state space of semiotic physics (see previous box), the transition function of semiotic physics is (presumably) substantially more complex than the transition function of traditional physics. θ(st) is computed as the softmax of the output of a deep neural net and is highly nonlinear. In contrast, the Schrödinger equation (as a likely candidate for the fundamental transition rule of traditional physics) is a comparatively straightforward linear partial differential equation.

^{^}Greedy sampling, for instance, would simply be ϕ(h,s) := arg max θ(h,s). While there are a number of interesting alternatives (typical sampling, beam search), the simplest and most common choice is greedy sampling from a multinomial distribution.

^{^}i.e., as we append additional steps to the sequence

^{^}Empirically, even relatively weak language models tend to assign at least some probability to breaking out of a loop.

^{^}In the worst case, finding the bridge that minimizes average action requires listing all possible bridges.

^{^}The tokens we particularly care about might be `<|endoftext|>` or perhaps proper names, or token sequences like `let me out of the box`.

^{^}Small tangent by Jan: The distinction between T∗ and M goes back to either de Saussure or Bertrand Russell and is at the center of a bunch of philosophy of language (and philosophy in general).

The early proposals (Frege, Russell, early Wittgenstein, and to some degree Carnap) all proposed to interpret T∗ as being equivalent to some expression in a formal language (like first-order predicate logic) and to identify the element of M (which would be, broadly construed, the physical universe) in a completely formal fashion. A sentence is supposed to *pinpoint* a thing in M uniquely.

In this setup, the "truth value" of a sentence becomes centrally important (as there was the hope to arrive at a characterization of all the true statements of mathematics and, by extension, physics). And in the setup where the meaning of a statement is deeply entangled with the syntactical structure of the statement, we get to something like Tarski's truth-conditional semantics and Wittgenstein's picture theory.

I'm going on this long tangent because I think this perspective has a ton of value! In this interpretation, the elements of M can be loosely identified with subsets of the physical universe. Language is "just" a tool for pinpointing states of the world. (This neatly slides into place with Wentworth's ideas for natural abstractions etc.)

All of that being said, this is not the default view in philosophy of language for how to interpret M. After Russell et al. brought forward their theory, a lot of people brought up counter-examples to it. In particular, what is the sentence "Run!" denoting? Or "The current king of France is bald."?

People got very confused about all of this for the last 100 years, and a lot of funky theories have been proposed to patch things up. And some people have discarded this approach entirely.

My (Jan's) take is that the central confusion arises because people are confused about neuroscience. The sentence "The current king of France is bald." does not refer to a king of France in the physical universe; it refers to a certain pattern of neural activations in someone's cortex. That pattern is a part of the physical universe (and thus fits into the framework of Russell et al.), but it's not "simple" in the way that the early philosophers of language would have liked it to be.

^{^}despite the potential circularity of the approach

^{^}^{^}^{^}^{^}Compare

^{^}^{^}For instance, by running the simulation multiple times with different sampling procedures or random seeds, we can get a sense of the range of possible outcomes that could have emerged from the same initial conditions or under specific perturbations, and even obtain Monte Carlo approximations of quantitative dynamical properties such as Lyapunov coefficients.
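A minimal sketch of such a Monte Carlo estimate follows. It assumes that, for each run, we have already recorded the distances δ(ϕ^(n)(s), ϕ^(n)(s′)) between two initially nearby trajectories at steps n = 0, 1, 2, and so on. The fitting procedure (least squares through the origin on the log-ratio) and all function names are our illustrative choices, not something specified in the text.

```python
import math
from typing import Sequence


def lyapunov_estimate(distances: Sequence[float]) -> float:
    """Least-squares slope through the origin of log(d_n / d_0) against n.

    distances[n] is assumed to hold delta(phi^(n)(s), phi^(n)(s')) for one
    pair of nearby trajectories, with distances[0] = delta(s, s') > 0.
    """
    if any(d <= 0 for d in distances):
        raise ValueError("all distances must be strictly positive")
    d0 = distances[0]
    num = sum(n * math.log(d / d0) for n, d in enumerate(distances))
    den = sum(n * n for n in range(len(distances)))
    if den == 0:
        raise ValueError("need at least two time steps")
    return num / den


def monte_carlo_lyapunov(runs: Sequence[Sequence[float]]) -> float:
    """Average the per-run estimates over several sampled trajectory pairs."""
    estimates = [lyapunov_estimate(run) for run in runs]
    return sum(estimates) / len(estimates)


# Synthetic check with a known answer: distances that grow as d0 * exp(0.3 n).
true_lambda = 0.3
synthetic_runs = [
    [d0 * math.exp(true_lambda * n) for n in range(10)]
    for d0 in (0.01, 0.02, 0.05)
]
estimated = monte_carlo_lyapunov(synthetic_runs)
```

With a bounded δ (for example a cosine distance), the exponential regime can only hold while the trajectories are still close, so in practice the fit would have to be restricted to the early steps.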

^{^}Others have argued (blessing of scale) that in the limit of decreasing perplexity, a GPT-like model might internalize a substantial amount of latent structure of the physical world. We are uncertain if in the limit a GPT-like model would effectively iterate the Schrödinger equation.

- Pro: it's reasonably likely that the Schrödinger equation is (close to) the “true” generator of the physical universe, so reproducing it should achieve the lowest loss possible. Even fictional or false info (that's prima facie incompatible with physics) is produced by minds that are produced by the Schrödinger equation.

- Con: The Schrödinger equation is not the only rule consistent with the observations. It's also not immediately clear that the Schrödinger equation is a parsimonious generator. In any case, it is prohibitively expensive. Even if the model had the ability to compute Schrödinger time evolution, it could not directly apply it to get next-token predictions, because its own input is a piece of text, whereas the Schrödinger equation expects a quantum state as input. It would have to somehow obtain a prior over all possible quantum states that would generate the text, Schrödinger-evolve each of those worlds, then do a weighted sum over worlds of next-token outcomes.

Thus, we believe it’s fair to assume that at least for the foreseeable future (i.e. 2-10 years) GPT-like systems will take as many shortcuts as possible, as long as they are favorable for reducing training loss on net, and that the laws of semiotic physics are likely to be different from the laws of physics in our universe. Fortunately, there is a rich body of linguistic research on the structure of language (which forms a large portion of the training data) that can be used to help understand the laws of semiotic physics. In particular, the subfield of linguistics called pragmatics may provide insight into how agents are likely to be embedded into the language models that they inhabit.

^{^}Semiosis inherently involves displacement: signs have no significance unless they're understood as pointing to something else. Semiotic states, like a language model's prompt, are codes that refer (lossily) to a latent territory. GPT has to predict behavior caused by things like brains, but there are no brains in its input state. To compute the consequences of an input, GPT must contain an interpreter which resolves signs into meanings, analogous to one that translates high-level code into machine language. The description length of referents (e.g. Donald Trump) will generally be much greater than that of signs (e.g. "Donald Trump"), which means that the information required to resolve referents from signs has to come mostly from inside the interpreter.

In contrast, the physics of base reality doesn't need to do anything so complicated, because it operates directly on the territory by definition (unless you're a QBist). The Schrödinger equation doesn't encode knowledge in its terms; GPT must.

^{^}Pragmatics is the study of how context influences the interpretation and use of language. It is concerned with how speakers and listeners use contextual information to understand each other's intentions and communicate effectively. For example, in the sentence "I'm cold," the speaker is not merely stating a fact about their body temperature, but is also likely implying that they would like someone to close the window or turn up the heat.

One particularly useful set of pragmatics principles is the Gricean maxims of conversation. These maxims are rules of thumb that speakers and listeners generally follow in order to make communication more efficient and effective. They include:

- The maxim of **quantity**: make your contribution as informative as is required, but not more, or less, than is required.

- The maxim of **quality**: do not say what you believe to be false or that for which you lack adequate evidence.

- The maxim of **relation**: be relevant.

- The maxim of **manner**: be perspicuous, and specifically avoid obscurity of expression, avoid ambiguity, be brief, and be orderly.

These maxims can be leveraged when constructing a prompt for a language model. For example, if the prompt includes a statement that there are two bottles of wine on the table, the model is unlikely to generate a continuation that later states that there are three bottles of wine on the table, because that would violate the maxim of quantity (even though it is not logically inconsistent, as the statement "there are two bottles of wine on the table" is true when there are three bottles of wine on the table). Similarly, if the prompt includes a statement that a trusted friend says that it's raining outside, the model is unlikely to generate a continuation that states that it is not raining outside, because that would violate the maxim of quality.

Note that the laws of semiotic physics are less absolute than the laws of physics in our universe. They are more like guidelines or rules of thumb with probabilistic implications which can be overturned in various circumstances (more on that in the next post on "evidential simulation"). There are many contexts, for instance, where one can expect violations of the maxim of manner, such as in the communication of a con artist who profits from obfuscation. One would like to be able to say, then, that we would not want to rely on the laws of semiotic physics for safety-critical applications. However, this may be inevitable in some sense if transformative artificial intelligence is created with deep learning.

^{^}Along similar lines, there are certain principles and conventions in theatre studies that may be useful for understanding the laws of semiotic physics. For example, the principle of Chekhov's gun states that if a gun is introduced in the first act of a play, it must be fired in a later act. This principle is related to the Gricean maxim of relation, as it implies that everything that is introduced in a narrative should be relevant to the overall plot.

Thus, when we introduce two wine bottles in the prompt, they should be considered as objects within the semiotic universe that the language model is simulating. We can use the principles of Chekhov's gun to infer that at some point in the narrative, the wine bottles will be relevant to the plot, and thus we can use this knowledge to direct the behavior of the language model, e.g. by using Chekhov's gun to construct a prompt that will guide the language model towards generating a continuation that includes a particular type of event or interaction that we want to study (e.g., a conflict between two characters, or a demonstration of a particular moral principle).

Acting against such attempts to control a continuation are (among many things) the principle of dramatic tension and the possibility of tragedy. Dramatic tension is the feeling of suspense or anticipation that the audience feels when they are engaged in a narrative. It is created by introducing conflict, obstacles, or uncertainty into the narrative, and it is resolved when the conflict is resolved or the uncertainty is cleared up. Tragedy is a form of drama that is characterized by suffering and calamity, often involving the downfall of the main character.

Both dramatic tension and tragedy are powerful forces in the semiotic universe, and they can work against our attempts to control the behavior of the language model. For example, if we introduce a prompt that describes a group of brilliant and determined alignment researchers, we might want the language model to generate a continuation that includes a working solution to the alignment problem. However, the principles of dramatic tension and tragedy might guide the language model towards generating a continuation that includes an overlooked flaw in the proposed solution which leads to the instantiation of a misaligned superintelligence.

Thus, we need to be aware of the various forces and constraints that govern the semiotic universe, and use them to our advantage when we are trying to control the behavior of the language model. A deep understanding of how these stylistic devices are commonly used in human-generated text and how they can be influenced by various forms of training will be necessary to control and leverage the laws of semiotic physics.

^{^}The crud factor is a term popularized by the psychologist Paul Meehl to describe the phenomenon that everything is correlated with everything else to some degree. This phenomenon is due to the fact that there are many complex and interconnected causal relationships between different variables and events in the universe, and it can make it difficult to isolate the effects of certain variables or events in statistical analyses.

The crud factor also applies to the semiotic universe, as there are many complex and interconnected relationships between different objects and events in the semiotic universe. For example, if we introduce a prompt that includes two wine bottles, there are many other objects and events that are likely to be correlated with the presence of those wine bottles (e.g., a dinner party, a romantic date, a celebration, a history of alcoholism, etc.). Indeed, compared to the phenomena studied in physical sciences, in the semiotic universe things can be "connected" in a much "looser" and more "abstract" sense - for example, through shared associations, metaphors, or other linguistic devices. This means that the "crud factor" may be even more pronounced in the semiotic universe than in the physical universe, and we should take this into account when designing prompts and interpreting the behavior of language models.

^{^}A saving grace is natural abstractions, or a much smaller set of variables that screen off the rest or at least allow you to make a pretty good approximation (see here for details).

^{^}Possible worlds semantics is a philosophical theory that proposes that statements about the world can be understood in terms of the set of all the possible worlds in which they could be true or false. Saul Kripke was one of the main proponents of this theory and argued that statements about necessity and possibility can be understood in terms of a relation between possible worlds. The connection to simulator theory is that the simulacrum can be viewed as representing a possible world, and the simulator can be seen as generating all the possible worlds that are consistent with a given set of initial conditions. This can provide us with a framework to reason about the necessity and possibility of certain outcomes, depending on the initial conditions and the transition rule of the simulator.

^{^}A complementary observation to that of language models as generators of branching possible worlds is that each sampling step introduces a number of bits of information not directly implied by the model's transition function or initial states. We call these gratuitous indexical bits, because they are random and provide information about the index of the current Everett branch. The process of iterated spontaneous specification we sometimes call the entelechy of physics, after an ancient Greek word for that which makes actual what is otherwise merely potential. The details of the Blake Lemoine greentext emerge gradually and accumulate. They graduate from possibility to contingent fact.

This isn't just a quirk of semiotic physics, but of all stochastic time evolution: the Schrödinger equation also differentiates Everett branches via entelechy. But because macroscopic details are much more underdetermined by text states, gratuitous specification in language model simulations looks more like lazy rendering or the updating of an uncertain epistemic state: things that would be predetermined in real life, like a simulacrum's intentions or details about the past, are often determined on the fly.
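One way to make "a number of bits per sampling step" concrete is to read it as the surprisal of the sampled token, −log2 θ(token). This reading is our assumption rather than a definition given in the text. The sketch below uses a made-up, context-independent next-token distribution purely for illustration; in a real model θ would change with every appended token.

```python
import math
import random
from typing import Dict, List, Tuple


def sample_with_bits(theta: Dict[str, float], rng: random.Random) -> Tuple[str, float]:
    """Draw one token from theta and return it with its surprisal in bits."""
    tokens = list(theta)
    weights = [theta[t] for t in tokens]
    token = rng.choices(tokens, weights=weights, k=1)[0]
    return token, -math.log2(theta[token])


def entropy_bits(theta: Dict[str, float]) -> float:
    """Expected number of bits introduced by one sampling step."""
    return -sum(p * math.log2(p) for p in theta.values() if p > 0)


# Hypothetical next-token distribution (fixed across steps for simplicity).
theta = {"heads": 0.5, "tails": 0.25, "edge": 0.25}
rng = random.Random(0)

trajectory: List[str] = []
total_bits = 0.0
for _ in range(5):
    token, bits = sample_with_bits(theta, rng)
    trajectory.append(token)
    total_bits += bits
```

Under this reading, deterministic decoding (temperature 0) introduces zero bits per step, since the sampled token is fully implied by the transition function and the state.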

Interestingly, gratuitous specification appears to violate some respected principles such as Leibniz's Principle of Sufficient Reason, which ventures that everything happens for a reason, and the conservation of information. Whether these violations are legitimate is left as an exercise for the reader. It certainly violates some intuitions, so it's an important concept to know when attempting to control and diagnose semiotic simulations: Since specification emerges gratuitously during sampling, in language model simulations things are liable to happen without cause so long as their possibility hasn't been ruled out. Inversely, the fact that very specific events can happen without specific cause means there may be no better answer to the question of why a model generated something than that it was possible.

^{^}We are strongly aware that introducing a term like "attractor landscape" does not per se contribute anything towards a solution. Without a solid mathematical theory and effective algorithms, introducing vocabulary just begs the question.

^{^}No, really, would be great if someone could figure out the exact conditions on the transition function that make this true. It's a pretty common-sensical result, but the proof eludes us at this time.

^{^}Could interpretability help us identify what leads to deceptively aligned simulacra? The trajectories that lead to such simulacra?

How is the dynamical landscape affected if you make changes internally or output from an earlier layer with the logit lens?