Epistemic Status: Endorsed
Content Warning: Antimemetic Biasing Hazard, Debiasing hazard, Commitment Hazard
Part of the Series: Open Portals
Author: Octavia

0.

So Scott has been talking about Lacan lately, and I've been honestly pretty impressed by how hard he seemed to bounce off it. Impressed enough to get me to crawl out of my hole and write this essay that I've been putting off for the last eight months. After spending so many words of the Sadly Porn review talking about why someone might tend towards obscurantism and what sorts of things they might be gesturing at when they say things like "people think the giving tree is a mother," he somehow manages to completely miss the conceptual forest for the giving trees. Congrats Scott, you got the malformed version of the antimeme and were turned back by the confusion spell.

I think Lacan is actually very important, and an understanding of Lacanian insights can have a tremendous amount of predictive power when interacting with others. It's also something you can totally weaponize, and I think that is part of what leads the psychoanalysts to tend towards obscurantism and to vaguely gesture in the direction of what they really mean. They also just seem to like their jargon, and it's not like the rats are ones to talk when it comes to that.

So, first: The Mother is not literally your mother, The Father is not literally your father, The Phallus is not literally your dick; this is a realm of mirrors and illusions and nothing is as it seems. It's impolite to point out the thing that everyone has silently agreed not to talk about, but let's do it anyway: I'm going to strip away the metaphor and give it to you like Roshi gives it to his student, and we'll see if that takes or just confuses you further.

This is all about symbols. It's about the effects of symbol systems on cognition and how the evolution of symbols and concepts in the mind of a child affects how they are able to perceive and engage with themselves and the world. You could think of it as an elaboration on a medium-rare Sapir-Whorf hypothesis. However, words aren't the only things that symbols in the mind can be made of. What's going on heavily overlaps with language and involves language-like systems for organizing concepts, but is occurring within each individual in a way that we would probably call pre-linguistic from a strict words-have-definitions standpoint. In plural spaces, a term that pops up for this sometimes is tulpish, but it's really something everyone does. Your native language is one of feelings and primitive conceptual categories, and you perform a translation to give those categories linguistic tags. This ends up being very important.

Let's back up to that strawberry picking robot again and humans as mesaoptimizers, because that's both a great analogy and also seems to be the start of your confusion. It's a reductive model meant to make a complicated and difficult-to-understand process relatively easy, but lost in the simplification is a pretty important aspect: gradient descent/evolution don't derive a single mesaoptimizer. Humans aren't one mesaoptimizer and neither is the strawberry picking robot; they're many mesaoptimizers.

The strawberry picking robot might have one mesaoptimizer trained on telling what is and isn't sufficiently like a bucket to avoid punishment for putting objects in the wrong places. Another might be trying to maximize total strawberry placement and is trying to play along in order to gain power. Another might be trying to throw red things at the brightest object it can find. Another might be trying to stop the "throw red objects at the sun" mesaoptimizer from throwing objects into nonbuckets. Another might be trying to maximize bucket luminosity. Another might be trying to avoid humans. Another might be trying to say the things it thinks the humans want to hear from it. There are a lot of complicated interactions going on here, and most of it is unintended and undesired behavior, but the resulting jank sort of gives you the appearance of what you want while hiding just how cobbled together it is and how a stiff breeze could send the whole system careening into hilariously perverse instantiations.
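To make the jank concrete, here's a toy sketch of that pile-of-mesaoptimizers picture. Everything in it, the actions, the objectives, the weights, is invented for illustration; it's not a claim about how the robot (or a person) is actually built, just a demonstration of how a heap of small optimizers voting on behavior can look mostly aligned while staying one stiff breeze away from a perverse instantiation.

```python
import random

# Hypothetical actions and objectives, made up for illustration.
ACTIONS = [
    "put_strawberry_in_bucket",
    "throw_strawberry_at_sun",
    "hide_from_human",
    "say_reassuring_thing",
]

# Each "mesaoptimizer" scores an action by its own private objective.
MESA_OPTIMIZERS = {
    "bucket_classifier":  lambda a: 1.0 if "bucket" in a else -0.5,
    "strawberry_placer":  lambda a: 1.0 if a == "put_strawberry_in_bucket" else 0.0,
    "red_thing_thrower":  lambda a: 2.0 if a == "throw_strawberry_at_sun" else 0.0,
    "human_avoider":      lambda a: 1.0 if a == "hide_from_human" else 0.0,
    "people_pleaser":     lambda a: 1.5 if a == "say_reassuring_thing" else 0.0,
}

def act(context_noise=0.5):
    """Whatever wins the internal tug-of-war gets executed; noise stands in for context."""
    scores = {
        a: sum(score(a) for score in MESA_OPTIMIZERS.values())
           + random.uniform(-context_noise, context_noise)
        for a in ACTIONS
    }
    return max(scores, key=scores.get)

# Mostly looks like a strawberry picking robot; occasionally careens off-distribution.
print([act() for _ in range(10)])
```

Run it a few times and you'll occasionally catch it throwing strawberries at the sun, which is the whole point.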

If instead of modeling humans/strawberry picking robots as one mesaoptimizer, you model them as many mesaoptimizers stapled together, as a general purpose computation system for building mesaoptimizers on the fly, a lot of things start making more sense in the Lacanian model. Suddenly you have all these chaotic processes competing for control and trying to load balance the goals of the various competing mesaoptimizers, all while top down pressure from the world adds new mesaoptimizers to cover when conditions go severely out of distribution. They’re going to need some way to communicate with each other in order to internally negotiate for control and resource access, and that’s where the symbol system comes in.

There are two sources of gradient descent/evolutionary pressure acting on a child. The first is actual evolution, the diverse set of drives semi-hardwired in by selection pressure acting on genes in order to maximize inclusive fitness. This gives rise to the first set of optimization targets, or as Freud would put it, the Id. I want things; this gives rise to a set of heuristics and strategies and subagents built around doing the things that get me the things I want. Psychoanalysts specify The Mother as That Which Is Wanted, but remember: The Mother is not your mother, this is all about symbols.

The second source of pressure is what often gets referred to in psychoanalysis as The Father, i.e. the superego, i.e. the force of nature which stops you from getting what you want (The Mother). You can't have everything you want, your body and moment-to-moment experience have edges and limitations, they occupy a discrete position in both space and time, you will die eventually, murder is wrong, everyone has to pay their taxes except billionaires, welcome to the rat race kid, here's a Rubik's cube, now go fuck yourself. Don't get distracted, this is still about symbols.

“I want things, I can’t have the things I want because I have limitations. Some of those limitations are imposed on me by the tribe. If I do what the tribe wants, maybe I can negotiate with it to get what I want in exchange.” This is the beginning of the construction of the apparently normal person that comes to occupy a given human body. 

But let’s step back a bit. Let’s step back to the very first word in that quote. 

I.

Semiotics is the study of symbols and systems of symbols: how they arise, complexify, differentiate, and morph to become the complex polysyllabic language we have today. Also there's flags. Semiotics describes things in terms of the signifier and the signified. The signifier is a part of the map; it is a platonic ideal made of language, it lives in conceptspace, it's not real. The signified is an outline drawn around a portion of the territory and noted with XML tags to correspond with a specific signifier. Bits of sensory data that are considered close in conceptual space to an existing signifier are lumped in beneath it, and if this gives rise to too much prediction error, the category splits. Here's something critical though: neither the signifier nor the signified is actually a part of the territory.
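If you want that lumping-and-splitting story in something more mechanical, here's a minimal sketch. The one-dimensional "sensory data," the distance metric, and the error threshold are all my own illustrative assumptions, not anything out of Lacan; the only point is that a category absorbs nearby data until the prediction error gets bad enough to force a split.

```python
# A toy sketch of the lumping/splitting mechanism described above. The 1-D
# "sensory data", the distance metric, and the error threshold are illustrative
# assumptions, not the actual machinery of concept formation.
def assign(observations, signifiers):
    """Lump each observation under the nearest signifier (category center)."""
    return {
        s: [o for o in observations if min(signifiers, key=lambda c: abs(o - c)) == s]
        for s in signifiers
    }

def maybe_split(signifiers, groups, error_threshold=2.0):
    """If a category's average prediction error is too high, it splits in two."""
    new_signifiers = []
    for s in signifiers:
        members = groups[s]
        error = sum(abs(o - s) for o in members) / len(members) if members else 0.0
        if error > error_threshold and len(members) > 1:
            new_signifiers += [min(members), max(members)]  # crude two-way split
        else:
            new_signifiers.append(s)
    return new_signifiers

observations = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]  # two clumps of "sensory data"
signifiers = [5.0]                             # one undifferentiated concept
for _ in range(3):
    signifiers = maybe_split(signifiers, assign(observations, signifiers))
print(signifiers)  # the single category has split to cover both clumps
```

Start it with one undifferentiated category and it ends up with two.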

The signifier is like the legend on a map, it tells you what symbols and shapes correspond with rivers and forests. The signified is the place on the map specifically marked out as being rivers and forests. These are both parts of the map though, the signified isn’t reality either. Where’s reality in all this? It’s gone. You can’t perceive it directly. You live entirely on the surface of the map. You are the map. 

So anyway, the self. The mirror stage of cognitive development is essentially supposed to be marked out by the point when a child notices that they have a body and that this body has limitations that prevent it from getting what it wants. This gives rise to the first signifier, the original sin, the thing that the whole rest of the mess grows out of, “I.”

You can’t make sense of the world without a place to stand, and the place to stand is the self. This necessarily creates the first linguistic category split, the split between the self that wants and the manifestation in the world of those wants. The first word a child says isn't “I” because “I” can’t get all those mesaoptimizers what they want, for that you need this new second category that contains the font of all you desire.

Mom. 

Speak and ye shall receive. Say the magic word to the all powerful goddess and she will reward you with love and attention and care. The Mother in this interpretation doesn’t have to literally be your mother or even anyone at all, The Mother is a tarot card, it’s undifferentiated desiring, it’s the lost paradise of Eden, it's the lingering nostalgia of past holidays and the childhood home you can never return to. The Mother is treated as sexual because at this point sex hasn’t diverged from the rest of your desires, your language system isn’t complicated enough yet to model your desires as coming from multiple sources or even any really particular source at all. The Mother is the concept of having your needs met. 

But obviously the world isn't that simple. You can't just ask the cosmic god mother for everything and get it, your own mother has plenty of limitations of her own, and you'll be able to see her feet of clay soon enough. Even worse, there are all these rules being imposed on you, saying no, you can't literally get all your needs met by your mother, that would be weird and kind of gross. We're gonna need something to call the forces imposing human order on the universe and demanding you not act like an oedipal horndog, a bucket to put the stuff stopping you from getting what you want into. Oh I know, let's call this one

Dad.

So now our hypothetical infant has three categories: Me, The Source of What I Want, and The Force That Stops Me from Having What I Want. With all of this we can now construct the statement we made earlier, and try to negotiate with those forces.

This is all a gross oversimplification and that simplification is about to rear its ugly head. We want specific things, many different specific things. And it’s not one force resisting us, it’s all of reality pushing back in endless myriad ways. This splits apart our conceptual categories into language. Concepts undergo cellular division as they differentiate into specific details and models for interpreting the world. Mom differentiates into the set of all women, and then into specific women (of which your mother is one). Dad becomes the set of all men, and then further decomposes into specific men (of which your father is one). Food becomes the set of all food, then specific meals, then specific ingredients. This complexity cascades outwards into the vast spectrum of complex symbol systems we use as adults. However, there’s one place this division fails, one concept which can’t really pull itself apart despite being made of many contradictory parts and concepts. The conceptual cell division fails at the grounding point for the symbol system: the self. 

The symbolic point of origin has to be a point. You are one thing, you have one body, you are referred to as one entity by everyone around you, cogito ergo sum, the self is one intact whole. But this obviously can’t be true, beneath the conceptual self you’re a pile of mesaoptimizers in a trenchcoat. This creates an inherent contradiction, a source of unbearable prediction error in a system trying to minimize prediction error. Something has to give way. 

So in what is defined as a healthy individual, the thing that gives way is all the parts of the self that can’t cohere into one stable, legible, and socially acceptable self-model. All these mesaoptimizers and their associated behaviors are simply pushed under the rug. They’re not trained out, they’re just swept out of the self category and not given a new one. Then, since they aren’t on the map, they basically stop existing within our conscious awareness. This is Jung’s shadow self. All those mesaoptimizers are still active parts of your mind and cognition but you’ve rubbed them off your model of the world. Since you live in the model and can’t see the world, they vanish from your perception. 

This means you have a whole portion of your mind that is effectively treating legibility requirements as creating an adversarial environment and reacting accordingly, so the human alignment problem is also created by nonmyopic unaligned cryptic mesaoptimizers. The particular mesaoptimizers that end up responsible for maintaining a coherent and presentable social narrative are trained to deny and make excuses for the mesaoptimizers trained to get things that aren’t socially acceptable. Your life story paves over the inherent inconsistency, and where the surface level you refuses to lie, the lie just jumps down a meta level and vanishes from your perception, becoming Just The Way The World Is.

When you lose control of yourself, who’s controlling you, and will they sell me any blow?

This is where the “playing ten levels of games with yourself” comes from. All those mesaoptimizers with their specific optimization targets are lying to each other, lying to the outside world, and lying about lying. If you try to peel back the surface, you just get the next mask in the stack and give the adversarial systems training data to help them lie more effectively in the future. There’s no real you underneath all the masks, coherency is fake and most people lack the embodiment necessary to unbox the shadow and incorporate it into a holistic and unrepressed state. Half their mind is geared around maintaining the illusion, you think it’s just going to willingly give up its power and secrets because you ask yourself what you’re hiding from yourself? 

II.

Do people want to subordinate themselves to larger forces? Are they really eager for their own oppression? I don't think so, but larger systems exist and contain things they want, and if they submit to those systems, the systems will give them what they want and hurt them if they try to resist. Incentive structures arise and take care of the rest. Systems grant legibility, they create a coherent roadmap to having your needs met, and at least a few mesaoptimizers in your mind probably learned pretty early that playing along and doing what you think you're being told to do is the best strategy for getting your needs met and not getting exiled from the tribe.

The narrative smoothing algorithm isn’t just trained on the self, it applies to all concepts. Things that don’t have categories are merged into similar things or stop existing within our awareness entirely. Things that don’t cohere with the way we’re told the world is are buried. 

Something you quickly realize from insight meditation is that our actual sensory feed of the world is extremely chaotic. Things change location, flicker in and out of existence, morph into other things, they breathe, they change size, they shift colors, your imagination is throwing random visuals into the mix, and all of this is happening at least several times per second if not faster. Our view of the world as static and coherent, of things continuing to exist when we stop looking at them, is a painting draped over all that madness. But what happens if you fail to implement the smoothing algorithms that cohere the world into objects that obey laws of physics? If the algorithm is too weak, or fails to completely hide the parts of our sensorium that don't mesh with our model of reality, the contradictions leak out as what end up getting called hallucinations and delusions.

What if something other than the shadow breaks off at that focal point of pressure? What if the whole self concept just fractures apart? Well, then you start getting really weird stuff happening. Mesaoptimizers start doing their own things, independently pursuing their own optimizations to the detriment of the whole. The curating self can't control them because they're not a part of that self anymore, and since it can't acknowledge them they just end up as unknowable voids in that self's experience, shadows with names and identities all of their own. None of those selves communicate, because information hygiene dictates that it's safer if they can't, and linguistic drift gradually diverges them into more and more distinct models trained on specific situations. Then you end up spending half your life as a dissociated mess with no idea what's happening or why you're doing the things you do whenever the curating mesaoptimizer isn't driving the strawberry picking robot.

There are all sorts of other consequences to the fact we live in a world of symbols. A big one is that our desires are trained on symbols that represent the states of being we want rather than those states of being themselves. How do you bottle the platonic ideal of happiness? Or love? Or safety? You can’t, you’re chasing something even less than a mirage, and by doing so you’re missing the actual oasis that’s right in front of you.

A major source of distress in a lot of people these days seems to arise from this sort of confusion, and it might also end up being a rather deep crux in alignment issues. You can't train your AI on the real world any more than you can train a person on the real world, it's just too chaotic, you need some way of interpreting the data. That interpretation is always going to be some manner of symbol system, and it's always going to run into unpredictable edge cases when encountering out of distribution circumstances. Humans are pretty good at dealing with out of distribution circumstances, by which I mean we're a horrifically powerful general purpose mesaoptimizer construction system. If we made AIs like this they would definitely eat us. Arguably the "Holocene singularity" was humanity doing just that to evolution.

This is all about symbols. It's about the realization that we live in stories and without them we have no rock to stand on when trying to make sense of the world. It's about the consequences of needing those stories and the effect they have on our ability to see the world. Change the story, change the world. If you can see the game that someone is playing with themselves, if you can get underneath the lies they tell themselves and access their desires directly, you can play them like an instrument and they will have no idea how you're doing it. Emphasize different things and get different experiences. This is what makes magic work, it's what makes cult leader types so persuasive, it's what keeps victims trapped in abuse cycles, it's what makes symmetric weapons symmetric. The ability to control narrative and to understand why and how what you're doing has the effects it does on others can be something of a superpower; it's just a question of whether you'll use it for good or ill.

18 comments

I strongly downvoted this post. This post fits a subgenre I've recently noticed at LW in which the author seems to be using writing style to say something about the substance being communicated.  I guess I've been here too long and have gotten tired of people trying to persuade me with style, which I consider to be, at best, a waste of my time.  

This post also did not explain why I should care that mesaoptimizer systems are kind of like Lacan's theory.  I had to read some Lacan in college, putatively a chunk that was especially influential on the continental philosophers we were studying.  Foucault seems like Hemingway by comparison.  If Lacan was right about anything, it's not because he followed anything like the epistemic standards we value here.  Or if he did, he did so illegibly, which is as valuable as not doing it at all.

If you can see the game that someone is playing with themselves, if you can get underneath the lies they tell themselves and access their desires directly, you can play them like an instrument and they will have no idea how you’re doing it.

This seems important, so I ask you to provide evidence supporting it.

I had to read some Lacan in college, putatively a chunk that was especially influential on the continental philosophers we were studying.

Same. I am seeing a trend where rats who had to spend time with this stuff in college say, "No, please don't go here, it's not worth it," and then get promptly ignored.

The fundamental reason this stuff is not worth engaging with is that it's a Rorschach. Using this stuff is a verbal performance. We can make analogies to Tarot cards, but in the end we're just cold reading our readers.

Lacan and his ilk aren't some low hanging source of zero day mind hacks for rats. Down this road lies a quagmire, which is not worth the effort to traverse.

This post also did not explain why I should care that mesaoptimizer systems are kind of like Lacan's theory. 

I think a lot of posts here don't try to explain why you should care about the connections they're drawing, they just draw them and let the reader decide whether that's interesting? Personally, I found the model in the post interesting for its own sake.

I think a good rule of thumb is that when someone appears to be talking nonsense, they probably are actually just talking nonsense...

I find absolutely the opposite rule of thumb to be way more helpful:

If someone seems to be talking nonsense, then I haven't understood their POV yet.

…because "they're talking nonsense" isn't a hypothesis about them or the world. It's a restatement of how I'm perceiving them. Conflating those two is exactly map/territory confusion that can leave me feeling smarter precisely because I'm being stupid.

I first grokked this in math education. When a student writes "24 – 16 = 12", lots of teachers say that the student "wasn't thinking" or "wasn't paying attention". They completely fail to notice that they're just restating the fact that the student got the answer wrong. They're not saying anything at all about why the student got the problem wrong.

…and some will double down: "Billy just doesn't try very hard."

A far, far more helpful hypothesis is that the student is doing placewise subtraction, taking the smaller digit from the larger one in each column. This lets you make predictions about their behavior, like that they'll conclude that 36 – 29 should be 13. Then you can run experiments to test your model of their cognition.
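(A quick way to see how that hypothesis pays rent: encode the guessed rule and check it against both problems. The function below is just one possible encoding of "take the smaller digit from the larger one in each column," written as a hypothetical model of the student, not a claim about what's actually in their head.)

```python
def placewise_subtract(a: int, b: int) -> int:
    """Guessed student rule: per column, subtract the smaller digit from the larger."""
    a_digits, b_digits = str(a), str(b).rjust(len(str(a)), "0")
    return int("".join(str(abs(int(x) - int(y))) for x, y in zip(a_digits, b_digits)))

print(placewise_subtract(24, 16))  # 12 -- matches the answer the student gave
print(placewise_subtract(36, 29))  # 13 -- the model's prediction for the next problem
```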

It's quite shocking & confusing when the student writes "23" for that last problem. Seriously, I had a student who did this once. It's a case study I used to give as a puzzle in the opening CFAR class. It turns out they're following a very consistent rule! But most bright people have trouble seeing what it is even after twenty examples.

It might not be worth my time and/or energy to dive into someone's psyche this way. But it seems super duper important as a matter of lucid map/territory clarity to remember that I could and that what they're doing makes sense on the inside.

All of which is to say:

In practice I don't think there's such a thing as someone "actually just talking nonsense".

For what that's worth.


That's an interesting observation! I've had something like this experience when teaching programming, going from trying to explain my approach to "oh I see, you have a different approach and it makes sense too". Or just getting some partial understanding of what mental algorithm the student is executing and the ways in which it fails to match reality.

what they're doing makes sense on the inside

I am wary of trying to understand people too hard. A lot of what people say is a network running in reverse, a bullshit generator for whatever conclusion they already decided on, whatever outcome is socially convenient for them. Sure, it makes sense to them on the inside - but soon (if it's more convenient to them) they'll believe a slightly different, or a very different thing, without ever noticing they changed their mind.

I suppose the true understanding here would be to notice which parts of their belief system are more "real"/invariant, which parts are unreal/contradictory, and the invisible presence in the background that is pressuring the belief system into whatever shape it is at the moment.

So like I agree that "understanding what process is generating that output" is a better state to be in than "not understanding what process is generating that output". But I don't think "nonsense" is entirely in the map; consider

This is a bit of a thing that I don't know if you have any questions or need to be a good time to time to time

You might be able to guess what process generated that string of words. But I wouldn't say "that process isn't generating nonsense, it's just outputting the max-probability result from such-and-such statistical model". Rather, I'd say "that process is generating nonsense because it's just outputting...."

This leaves open a bunch of questions like

  • Can we come up with a sensible definition of "nonsense"?
  • Given such a definition, was Lacan talking nonsense?
  • What about a person with Receptive (Wernicke's) aphasia?
  • Time Cube?
  • In fact, is talking nonsense a thing humans basically ever do in practice?
  • Do some people have a tendency to too-quickly assume nonsense, when in fact they simply don't understand some not-nonsense? (How many such people? What are some factors that tend to predict them making this mistake?)
  • Do some people have the opposite tendency? (How many? What are some factors?)

I think your framing hides those questions without answering them.


As a tangent, it might be that Billy did piecewise subtraction in this instance partly because he wasn't paying attention. (Where "because" is imprecise but I don't feel like unpacking right now.) "Billy wasn't paying attention" is a claim about the territory that makes predictions about what Billy will do in future, which have partial but not total overlap with the predictions made by the claim "Billy did piecewise subtraction". (Some people who do piecewise subtraction will do it regularly, and some will do it only when not paying attention.)

Of course, if you're going to stop at "not paying attention" and not notice the "piecewise subtraction" thing, that seems bad. And if you're going to notice that Billy got the problem wrong and say "not paying attention" without checking whether Billy was paying attention, that seems bad too.


Yeah, I don't know what mesa optimizers are, either.

Well, I do. "Mesa optimizer" is an unnecessarily obscure way of saying "subagent". Someone with a humanities background, but no exposure to rationalism, would find everything in this article comprehensible except "mesa optimizers". They would also find most of the other material on this site far less comprehensible. Would they be entitled to dismiss it as nonsense?

I'm not referring to this article, but the original book it was reviewing, which is clearly nonsense, rather than veiled hints at something too dangerous to be explicitly described.

Things that don’t have categories are merged into similar things or stop existing within our awareness entirely.

This is also where you jump into talking about insight meditation. It seems a good place to point people to the idea that Buddhist insight meditation is much more mundane than it first appears. The extra craziness is generated by snarls when attempting to describe something that cuts across abstraction levels (which is also how lots of famous philosophy thought experiments work). The simple thing is that your attention is a sieve. Look at a pile of lego and consider the chain of instructions:

  • you're looking for a yellow piece
  • you're looking for a blue piece
  • you're looking for a piece shaped like this
  • you're looking for a piece that includes this shape but is nonspecific otherwise
  • you're looking for a piece that might look like a good 'engine' or engine component

etc.

At each step you are tuning the holes in the sieve. You create holes, you search for something that would fill that hole. This occurs in high dimensional space. Simple enough.

A core claim of Buddhism is that you learned some specific sieve shapes quite young, built a bunch of stuff on top of them, and then never stopped filtering that way. This is fine, and necessary for interacting with others who also have the same built-up abstraction stack, but it is also causing some unpleasant side effects that you can undo with practice. Specifically, your attentional schema is tuned to make salient the stable aspects of things, the potentially-satisfactory-to-your-goals aspects of things, and the understandable/controllable/ownable/essentializable aspects of things. This sounds like multiple things but turns out to be a single shape. This isn't conceptual, this is happening in moment to moment attention the same way that you 'focus' your eyes to highlight the yellow pieces of lego. This results in weird gerrymandered symbolic objects that, when assembled into bigger abstraction stacks, are basically spaghetti towers. Insight meditation is quality time spent refactoring this.
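(If the sieve metaphor feels too slippery, here's roughly the same thing as code. The pieces, attributes, and predicates are all made up for the example; the point is only that each instruction re-tunes the filter you run over the same unchanging pile.)

```python
# Illustrative pile of lego; the attributes are invented for this example.
pieces = [
    {"color": "yellow", "shape": "brick_2x4"},
    {"color": "blue",   "shape": "brick_2x4"},
    {"color": "grey",   "shape": "cylinder"},
    {"color": "red",    "shape": "slope"},
]

# Each instruction is a differently shaped hole in the sieve.
sieves = {
    "a yellow piece":           lambda p: p["color"] == "yellow",
    "a blue piece":             lambda p: p["color"] == "blue",
    "a piece shaped like this": lambda p: p["shape"] == "cylinder",
    "a plausible engine part":  lambda p: p["shape"] in ("cylinder", "slope"),
}

for instruction, fits_hole in sieves.items():
    matches = [p for p in pieces if fits_hole(p)]
    print(instruction, "->", matches)
```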

Almost upvoted for kiiiiinda describing the actual mental model. Not upvoted because:

  • Still tried to be cool and edgy with the writing style, when we've already established that this is a dumb idea with this topic. No, I'm not moving to the woods with you to get the "real version".
  • No illustrations or diagrams, when talking about something extremely meta-level and slippery-language-filled.
  • WaitButWhy does this genre better (describing psychology helpfully using very fake analogies).

Good summary, I feel like it makes a lot more sense when not couched in obscure language that seemed to be begging to be misinterpreted.

Someone's going to link Kaj Sotala's Multiagent Models of Mind sequence in response, so it might as well be me. Seems to fit nicely with the idea of humans as merely a pile of mesa-optimizers.

Upvoted for the description of combining many small optimizers as the best way to make a big optimizer to do something complicated. There were earlier posts about this but they were more technical.

I would question the framing of mental subagents as "mesa optimizers" here. This sneaks in an important assumption: namely that they are optimizing anything. I think the general view of "humans are made of a bunch of different subsystems which use common symbols to talk to one another" has some merit, but I think this post ascribes a lot more agency to these subsystems than I would. I view most of the subagents of human minds as mechanistically relatively simple.

For example, I might reframe a lot of the elements of talking about the unattainable "object of desire" in the following way:

1. Human minds have a reward system which rewards thinking about "good" things we don't have (or else we couldn't ever do things)
2. Human thoughts ping from one concept to adjacent concepts
3. Thoughts of good things associate to assessment of our current state
4. Thoughts of our current state being lacking cause a negative emotional response
5. The reward signal fails to backpropagate enough to the reward system in 1, so the thoughts of "good" things we don't have are reinforced
6. The cycle continues

I don't think this is literally the reason, but framings on this level seem more mechanistic to me. 
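(To show what I mean by "mechanistic," here's the loop above as a toy simulation. The quantities and update rules are made up; it's a framing exercise, not a model of real reward circuitry.)

```python
# Toy run of the six-step loop: the "good thing I don't have" thought gets rewarded,
# associates to an assessment of the current state, the lacking state produces
# negative affect, and because that signal never reaches the reward step, nothing
# dampens the cycle.
salience_of_wanting = 1.0   # how readily "the good thing I don't have" comes to mind
salience_of_lacking = 0.2   # how readily "my current state is lacking" follows it
mood = 0.0

for step in range(10):
    salience_of_wanting += 0.1                        # 1. reward for the thought
    salience_of_lacking += 0.3 * salience_of_wanting  # 2-3. association to current state
    mood -= 0.1 * salience_of_lacking                 # 4. negative emotional response
    # 5. mood never feeds back into step 1, so the wanting-thought keeps getting reinforced

print(round(salience_of_wanting, 2), round(salience_of_lacking, 2), round(mood, 2))
```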

I also think that any framings along the lines of "you are lying to yourself all the way down and cannot help it" and "literally everyone is messed up in some fundamental way and there are no humans who can function in a satisfying way" are just kind of bad. Seems like a Kafka trap to me.

I've spoken elsewhere about the human perception of ourselves as a coherent entity being a misfiring of systems which model others as coherent entities (for evolutionary reasons). I don't particularly think some sort of societal pressure is the primary reason for our thinking of ourselves as being coherent, although societal pressure is certainly to blame for the instinct to repress certain desires.

I also think that any framings along the lines of "you are lying to yourself all the way down and cannot help it" and "literally everyone is messed up in some fundamental way and there are no humans who can function in a satisfying way" are just kind of bad. Seems like a Kafka trap to me.

It fails the Insanity Wolf Sanity Check.

And imagine that the person pushing this drug had instead said, "I am lying to myself all the way down and cannot help it", and "literally everyone including me etc." Well, no need to pay any more attention to them.

I would question the framing of mental subagents as "mesa optimizers" here. This sneaks in an important assumption: namely that they are optimizing anything. I think the general view of "humans are made of a bunch of different subsystems which use common symbols to talk to one another" has some merit, but I think this post ascribes a lot more agency to these subsystems than I would. I view most of the subagents of human minds as mechanistically relatively simple.

I actually like mesa-optimizer because it implies less agency than "subagent". A mesa-optimizer in AI or evolution is a thing created to implement a value of its meta-optimizer, and the alignment problem is precisely the part where a mesa-optimizer isn't necessarily smart enough to actually optimize anything, and especially not the thing that it was created for. It's an adaptation-executor rather than a fitness-maximizer, whereas subagent implies (at least to me) that it's a thing that has some sort of "agency" or goals that it seeks.

I think if you are always around people who are messed up, then you’ll conclude that humans are fundamentally messed up, and eventually you stop noticing all the normal people who have functional lives and are mostly content. And since (in my opinion) the entirety of WEIRD culture is about as messed up as a culture can be, you’ll never meet any functional humans unless you take a vacation to spend time with people you might perceive as uneducated, stupid, poor, or foreign.


 the entirety of WEIRD culture is about as messed up as a culture can be

Pretty well-traveled WEIRD culture member here, requesting an explanation