Alignment researcher Paul F. Christiano has written several posts on what he refers to as Iterated Distillation and Amplification (IDA). In this post, I will argue that IDA is a general method of adaptation and that it can be found in various different guises in a wide range of contexts. Its inherent flexibility mirrors its inherent lack of controllability.

Summary: The point of IDA is to train an AI the same way society trains individuals. What we are hoping to do is to make an AI extract the scaffold of human morality and internalize human moral boundaries. This project might, unfortunately, be doomed to failure: for an AI to be able to adapt to moral norms over time, it must also be capable of breaking them. If it is able to break moral norms, alignment cannot be guaranteed. Therefore, IDA might not be a viable path to AI alignment.

Apollonians versus Dionysians

"In science the Apollonian tends to develop established lines to perfection, while the Dionysian rather relies on intuition and is more likely to open new, unexpected alleys for research."

So writes Hungarian biochemist Albert Szent-Györgyi in a 1972 letter to Science magazine[1]. 35 years earlier, he had won the Nobel Prize in Physiology or Medicine. In his eyes, he had his Dionysian nature to thank: Dionysians are the only ones capable of finding something new.

It's fairly obvious when you think about it. If you are spending your time developing what is already known, you're not out there exploring the vast unknown. You are doing something safe and sensible instead; cleaning up the mess left behind by past Dionysians.

What the Apollonian is doing can be considered distillation or compression. He exploits conventional knowledge. What the Dionysian is doing can be thought of as amplification or learning. He attempts to go beyond conventional knowledge.

Christiano's dual-phase model is, in fact, very similar in nature to a whole bunch of other ones. And if we take a look at these we might get a better sense of what IDA is really all about (filtered, of course, through my limited understanding).

But first:

Edgework and Nodework

What's the deal with edgy behavior, really? The wording implies the answer: it's about edges. More specifically, it's about finding them. To the people who already know where the edges (of acceptable behavior) are, edgy people are incredibly annoying. They break rules and conventions and thus throw everything into chaos. In short: they make a mess and decent people will be the ones who have to clean it all up.

Borderline Personality Disorder (BPD) is edgy. That's basically the description of it: people with BPD are edgy. And it seems like they can't help themselves. They take a bunch of unnecessary risks, and parents, psychiatrists, psychologists, and every other person with a vaguely Apollonian role who has to deal with them agree that they pretty much suck. They'll help them, of course, but dealing with them is really a pain in the ass.

In literary circles, they have a euphemism for edginess: transgression. Transgressive fiction is all about characters breaking the rules and conventions that society has deemed proper. Fight Club is a notable work of transgressive fiction. All decent and respectable people agree that transgressive fiction is trash. After all, it encourages edgy people and anything that encourages edgy people is trashy.

Anthropologists have been thinking about edginess for a long time. The term liminality is used to refer to a transient state of edginess that accompanies rites of passage. It's a sort of collectively agreed-upon state of anarchy where you can be as edgy as you'd like because rules and conventions don't apply. Of course, you'll have to be decent and proper afterwards, but in the liminal phase you are free to go crazy.

Psychedelics make you edgy. According to a paper by Robin Carhart-Harris and Karl Friston, psychedelics induce a liminal phase in your brain where normal rules no longer apply[2]. They refer to this as belief relaxation. Your brain, normally a sober Apollonian, suddenly turns into a wild Dionysian. Different parts of your brain start chatting. Normally, they operate in neat cliques or modules. You can compare it to a sort of organization. Different teams keep to themselves but once in a while there are parties and they talk to each other in a non-clique, non-module manner. Rules that are tightly upheld in normal times, the status quo, don't apply at parties. This is a liminal phase. You can be edgy.

Robin Carhart-Harris has suggested, as have many other scientists working on the topic of psychedelics, that psychedelic substances were banned primarily because they made people too edgy. Support for the Vietnam war was a societal exercise in norm compliance, and psychedelics seemed to make the youth break rank. Richard Nixon went on the defensive to protect the fabric of society: its moral scaffolding. The subtitle of Carhart-Harris and Friston's joint paper was "the anarchic brain". And what is anarchism, if not the flagrant display of Dionysian tendencies?

We often think about liminal, edgy phases as opportunities to blow off steam. To relax. Stress builds up when we are being sober. We need to be able to release tensions that have built up during this time. And it's convenient to do it together. Because it would be wildly annoying if people kept being edgy while we're being decent. Society therefore organizes itself such that we cycle through Apollonian and Dionysian modes of being in sync. At least that's what many anthropologists seem to be saying.

The internet has sort of ruined edginess. The point of a party is that people act edgy together and then they pretend it never happened. The internet doesn't forget. Traces of edginess are detected and broadcast for all to see by sober-minded Apollonians.

The rise of authoritarianism in the world might actually be a consequence of social media. I'm sure this is a controversial opinion. But we're all aware of this rise and there seems to be a general consensus that social media is somehow to blame. People are being exposed to more edginess from out-group members than they can handle. They cope with this edginess by isolating themselves in echo chambers. They become infatuated with leaders who tell their in-group subjects that they are going to deal with those edgy out-group members. They form movements, organizations, and occasionally cult-like tribes where in-group traits are considered Basic Human Decency and out-group traits are considered Abhorrent Moral Failure.

All because the internet doesn't forget.

In ant colonies, perceived viral threats result in the formation of clusters. Modularization is a pretty good strategy. If one subgroup gets infected, that doesn't mean the colony is doomed. Psychologist Mark Schaller has proposed the notion of a behavioral immune system that works the same in people as in ants[3]. Individuals who seem to be infected (they look or act strangely) are isolated and quarantined. They are socially shunned, blacklisted, and cancelled. All to prevent the spread of infections, ideological or otherwise. Of course, there will be some false positives. Perhaps that guy over there is just different in a way that doesn't pose an existential threat to the social group at large. Well, it's better to be safe than sorry. They'll have to settle for the comforting fact that the social group at large stays fit and healthy.

Getting the word out that Bob is strange in a way that isn't scary and dangerous takes time and energy, which is why social change tends to come about quite slowly. Some, like literary theorist Terry Eagleton, would argue that popular entertainment serves as a tool to provide software (wetware?) patches to the masses[4]. The trendsetters have been talking and they've figured out that Bob isn't all that bad--just look at how sympathetic he appears in this story. As a Marxist, Eagleton would further add that low-brow entertainment is how the lower classes are indoctrinated by the middle classes, and high-brow entertainment (and education) is how the middle classes are indoctrinated by the higher classes.

Scott Alexander has written about classes and fashion in a way I feel accords well with the above.[5] [6]

Moving on:

Edgework is performed to explore one's environment for hidden information. The point is to leave behind rules and conventions in order to see whether the environment has changed in a way that matters. It feels good to perform edgework when you are bored.

Nodework is performed to exploit known information in one's environment. The point is to follow rules and conventions in order to accomplish predetermined tasks. It feels good to perform nodework when you are anxious.

Listening to simple music you've heard a bunch of times before? Watching a television show you've seen several times? That's nodework.

Taking a random trip to a place you've never been before? Talking to a stranger? Grabbing a random book from the library? That's edgework.

The term edgework, originally a term from sociology to describe risk-taking behavior[7], was recently used to describe differences in how people update their knowledge networks[8]. Networks consist of nodes (units) and edges (links). It was found that there are two primary strategies. The hunter (equivalent to our Apollonian) develops a tight-knit cluster of nodes. The busybody (our Dionysian) makes seemingly random leaps across networks.

They determined that the strategy of the hunter, nodework, was associated with high deprivation sensitivity: a strong tendency/desire to eliminate uncertainty or surprises. The busybody, on the other hand, showcased a low deprivation sensitivity. In other words: their strategies were associated with the extent to which they could tolerate ambiguity.

We could say that nodework is like running a program. Edgework is like checking for updates. This analogy partly explains why I believe IDA can't ensure AI alignment. An AI that doesn't perform edgework will be susceptible to vulnerabilities. If nodework is all you do, you're going to be incredibly predictable. And if you're incredibly predictable, you can be easily exploited. An AI would have to perform edgework in order for it not to get exploited by a competing AI. But this process of edgework could also result in the AI abandoning the program we are trying to make sure it runs: our moral scaffolding.

Scott Alexander has written on the possibility that part of the reason why our brains are so complicated and messy is because of the need to prevent parasite manipulation[9]. It's one big arms race.

That's the double-edged sword of robustness. In order to prevent a system from getting exploited, we would seemingly have to surrender our ability to control it. We would have to provide it with the means to update its beliefs via edgework, which means it could very well drift off from the zone of alignment.

If you want an informed opinion on the matter, ask any parent. Can you install a stable "moral scaffolding" in your children and expect them to remain in alignment with your values as they mature?

Simulated Annealing

Simulated Annealing (SA) is an optimization algorithm[10]. It works in a way that is remarkably similar to rites of passage. It involves a process known as thermalization: you induce chaos and let the system relax and cool down into a stable state. The stable state, as it turns out, tends to be optimal. Why?

We can think of any optimization problem as a landscape with hills and/or valleys. You're trying to find the highest hill (maximization) or the deepest valley (minimization). The SA algorithm frames it as the latter. Imagine that you have a magic marble that has a heat parameter corresponding to its "edginess". You begin with a very edgy marble. It searches the landscape far and wide. Gradually, you decrease the temperature. What happens? Statistically, the marble will tend to end up in the deepest valley.
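The marble picture can be sketched in a few lines of Python. This is only an illustration: the landscape function, the starting point, and the cooling schedule are all made-up assumptions, chosen so that there is a shallow valley near x = 2 and a deeper one near x = -2.

```python
import math
import random

def landscape(x):
    """Hypothetical terrain: shallow valley near x = 2, deeper valley near x = -2."""
    return 0.1 * x**4 - 0.8 * x**2 + 0.3 * x

def simulated_annealing(energy, start, temp=5.0, cooling=0.995, steps=5000, seed=0):
    """Return the lowest point the marble visited while cooling down."""
    rng = random.Random(seed)
    x = best = start
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 1.0)      # the marble tries a random jump
        delta = energy(candidate) - energy(x)
        # Downhill jumps are always accepted. Uphill jumps are accepted with
        # probability exp(-delta / temp), so a hot (edgy) marble roams freely
        # while a cold one settles into whatever valley it is in.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
        if energy(x) < energy(best):
            best = x
        temp *= cooling                          # gradually lower the temperature
    return best
```

Calling `simulated_annealing(landscape, start=2.0)` starts the marble in the shallow valley. The hot early phase is what gives it a chance to cross the ridge and find the deeper one.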

Hopefully, you will notice the similarity of this process to IDA. If not, we'll have a look at it together.

In SA, temperature is like edginess. And edginess is like amplification. And they're all sort of the thing we refer to as noise, stochasticity, randomness, and chance. When we turn up the heat parameter, we shuffle and shake our model to make it nice and edgy. Then we slowly cool it down. We then freeze it and it's all ready for nodework.

Sometimes SA fails. The model gets stuck in what is referred to as a local optimum: it ended up in a valley, but the wrong one. We want it to end up in the global optimum, the deepest valley, which is objectively speaking the best one.

This happens to people as well. Anthropologists have described failed rites of passage where participants come out a bit wonky. Different cultures have come up with different strategies to deal with individuals who end up wonky. They could become monks or nuns, for instance. For a brief (and very dark) period in the US they were lobotomized. One psychiatrist drove around in what became known as the lobotomobile and performed approximately 4,000 lobotomies, including on the sister of John F. Kennedy.

With SA, you just give the algorithm another go. If Carhart-Harris and Friston are right about psychedelics, they could conceivably be used in an analogous fashion.

It has been argued before that psychoanalysis and meditation are both slow and potentially painful SA procedures in a very real sense. Our language seems to already contain this insight. We know that people are more malleable once they warm up to you and we know that cool people aren't easily influenced by others. We can sense it when things start to heat up and we know that we tend to come to our senses when we get a chance to cool down. We don't expect sense from a hot-headed person but we do from a cool-headed one. Someone who interrupts a liminal phase (like a party) is a downer and someone who wants to enter one might take an upper.

Our language already implies that there's a temperature parameter of sorts within us that can be tuned according to demand.

Compression Progress Drive

Jürgen Schmidhuber is a machine-learning legend and occasional gadfly. In the early 90s he proposed a model for artificial curiosity based on what he calls compression progress drive (CPD)[11].

CPD relies on adversarial collaboration. You have one agent, which we can call Apollo. Apollo extracts patterns. Gists. Compresses information. Distills knowledge. You have another, which we can call Dionysus. Dionysus has a hunger for knowledge and directs Apollo toward the unknown. It discovers potential. It tells Apollo: "Hey maybe you should try to compress this stuff, I think it's going to be really useful!"

Dionysus compels the system toward chaos. Apollo compels the system toward order. Together, they ensure robustness.

Schmidhuber refers to it as a formal theory of fun, which sounds oxymoronic. Dissecting a joke tends to kill it. When Apollo looks at Dionysus' edgy humor, it doesn't seem like a laughing matter. This is also, of course, the sort of stuff Freud and Jung talked about.

Healthy people can't tickle themselves. People with schizophrenia can. People don't normally burst into laughter spontaneously at jokes they tell themselves. People with schizophrenia do. Why?

Until recently, this was usually explained in terms of "efference copies" and "corollary discharge"[12]. Both terms refer to the nervous system sending out a signal in advance to alert itself about what it is going to do next. This is how it can correct for self-generated behavior. When you move your eyeballs about (as one does), it doesn't appear to you as if the world is moving. But what happens if you paralyze the muscles that are supposed to move your eyeballs? The efference copies and corollary discharge still get sent, so you'd expect that perceptual areas of the brain would still make corrections. And amazingly, the result is that your brain convinces you that the world is moving.

In people with schizophrenia, these signals get lost in the mail. Which means that their own behavior comes as a surprise to themselves. That's why they often become convinced that someone is controlling their thoughts: it's the best explanation their brains can come up with based on available evidence.

Other people can tickle us because they can surprise us in a way we can't. They can make us laugh for the very same reason.

The current view of this is that it's a consequence of predictive processing: our brains are constantly making predictions and trying to eliminate pesky surprises[13]. We need occasional Dionysian interventions in order to update our Bayesian models of the world.

Philosopher Henri Bergson suggested that humor always results from perceiving "something mechanical encrusted on the living". In his view, humor is simply the Dionysian deriving satisfaction from the mistakes of the Apollonian. And he suggested further that humor is used as a sort of lubricant to prevent us from becoming too rigid, fixed, and mechanical.

Hence Compression Progress Drive: we are obsessed with patterns because if we weren't we wouldn't be able to cope with change.

Hemispheric Lateralization as IDA

If you ask your average neuroscientist, there's no point in talking about the differences between the two halves of our brains. They work together in unison and the only reason why they are different is because asymmetry is useful for information processing. There's no fundamental difference between them.

In my opinion, neuroscientists collectively agreed to talk about the hemispheres this way because of an overcorrection. After Gazzaniga and Sperry's work on split-brain patients[14] [15], talk about left-brained people and right-brained people became a fashionable topic in popular science. There was an assumption that in every person either the left or the right hemisphere is dominant and this determines whether you're an artsy-fartsy right-brainer or a logical left-brainer. Neuroscientists were right to be appalled: this cartoon-version of hemispheric lateralization is a dumb stereotype.

However, an overcorrection ensued. Suddenly, the fashionable thing to do was ridicule the cartoon image rather than to develop it. Because while the cartoon image is dumb, it's not really wrong. Hemispheric dominance (the idea that your personality depends on which hemisphere is the "boss") is wrong. The two hemispheres being fundamentally different is not.

The novelty-routine model developed by Elkhonon Goldberg touches on the fundamental difference: the right hemisphere is more concerned with novelty while the left hemisphere is more concerned with routine[16].

We could also say that the right hemisphere is concerned with edgework while the left hemisphere is concerned with nodework. Some observations from research on split-brain patients can be helpful here.

One patient found that her hands, controlled by different hemispheres, tried to carry out different actions simultaneously. The left hand, guided by the edgy right hemisphere, wanted her to wear a sexy summer dress. The right hand, guided by the sensible left hemisphere, wanted her to wear a coat more appropriate for the weather.

One young patient was asked what he wanted to be when he grew up. The left hemisphere could answer simply by talking: a draftsman. The right hemisphere managed to communicate by misspelling 'race-car driver' with Scrabble blocks.

It should be noted, however, that most split-brain patients were able to carry on with day-to-day life with few obvious problems. Friends and family rarely noted any differences in their behavior.

Support for this distinction (routine vs. novelty) can be found in research on brain asymmetries in a range of different animals. See Divided Brains for details[17].

Michael Gazzaniga refers to the left hemisphere as "the interpreter"[18]. It takes the information that we have available and runs it through the program to perform some nodework on it. Given that it can't entertain the notion that it might be wrong, it can go to great lengths to convince itself that it's right even in the face of massive evidence to the contrary. Right-hemisphere lesions sometimes result in patients behaving oddly. Their left hand, normally controlled by the right hemisphere, is paralyzed. So what happens if you ask them to move it? Interesting things. They might reply that they can't because it's not their hand. Or they might say that they of course could, but they don't feel like it at the moment. Without the ability to perform important edgework, they haven't gotten the message that circumstances have changed in a way that matters. And so they haven't updated their program. They're still running the good old one, and it has an answer for everything.

The Dan Harmon Story Circle

Showrunner Dan Harmon has discussed what he calls the story circle[19]:

  1. A character is in a zone of comfort,
  2. But they want something.
  3. They enter an unfamiliar situation,
  4. Adapt to it,
  5. Get what they wanted,
  6. Pay a heavy price for it,
  7. Then return to their familiar situation,
  8. Having changed.

This is, of course, a version of the Hero's Journey made famous in The Hero with a Thousand Faces by Joseph Campbell[20]. All mythology seems to include a story eerily similar to this. The hero starts off as an Apollonian, enters a liminal phase to become a Dionysian, and returns as a different sort of Apollonian.

Geneticist Sewall Wright proposed an evolutionary version of this old tale: genetic drift[21]. You have a population in boring old normal-land. Then the environment changes somehow. A sub-population drifts off and mutates. Then it returns to the main population. With important updates.

As evolutionary theory got compressed and condensed and solidified in the middle of the 20th century, there wasn't much room for genetic drift. An Apollonian consensus on evolution had been reached, and only a crazy Dionysian would argue in favor of noise being important.

I'm well aware that I am committing the sin of articulating what is likely to be considered an out-group view in this community. Which means that you, dear reader, are likely to feel your behavioral immune-system senses tingling. Genetic drift is, of course, mentioned in the same breath as Stephen Jay Gould, and lending credence to heretical ideas such as punctuated equilibrium marks me as an outsider and a heathen.

I still think most can agree, however, that there are two "flavors" of ideology when it comes to evolution:

  1. Evolution as solitary nodework. Emphasizes solidified adaptations in individuals.
  2. Evolution as gregarious edgework. Emphasizes multi-level selection in groups.

All I am saying here is that evolution is both and that community-specific mainstream ideas such as IDA already contain this notion (albeit in a completely different arena). Exploitation works wonders when the environment is regular (tends to stay the same), but you need exploration when it's irregular (tends to change). This principle of adaptation is seen even in bacterial chemotaxis[22].
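The trade-off between exploitation and exploration can be illustrated with a toy two-option world. Everything here is an assumption made for the sake of the sketch: the payoffs, the moment the environment flips, and the rule of trying the apparently worse option every few steps. It is not a model taken from any of the cited work.

```python
def run(explore_every, steps=1000, switch_at=500):
    """Total payoff of an agent choosing between two options.

    explore_every=0 means pure nodework (always exploit the current belief).
    Otherwise the agent does a bit of edgework every `explore_every` steps
    by trying the option it currently believes to be worse.
    """
    estimates = [0.0, 0.0]   # last payoff observed for each option
    total = 0.0
    for t in range(steps):
        # Hypothetical environment: the good and bad options swap at `switch_at`.
        payoffs = [1.0, 0.2] if t < switch_at else [0.2, 1.0]
        believed_best = 0 if estimates[0] >= estimates[1] else 1
        if explore_every and t % explore_every == 0:
            choice = 1 - believed_best   # edgework: check the other option
        else:
            choice = believed_best       # nodework: stick with what is known
        estimates[choice] = payoffs[choice]
        total += payoffs[choice]
    return total
```

With `switch_at=1000` the environment never changes and `run(0, switch_at=1000)` beats `run(10, switch_at=1000)`, because every exploratory step is wasted. With the default flip halfway through, `run(10)` beats `run(0)`, because the pure exploiter never notices that the world has changed.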

Sequences and Other Bibles

The point of the sequences on this site is to have its members internalize them. They serve the same purpose as Biblical stories: they are meant to provide moral scaffolding that in-group members are expected to extract. An in-group member is an in-group member because they conform to group conventions and adopt group norms.

Rival groups tend to argue about whose moral scaffolding should be widely adopted. Culture wars are attempts to disintegrate and consume rival groups. This is probably the root of most conflict.

Some groups invent a metaphysical space where one's life is judged according to group norms. The religious constructs of heaven and hell were invented to serve this purpose.

In the rationalist community, heaven and hell have been reinvented in the form of simulation arguments. The purpose is the same: it's a metaphysical space where punishment and reward are delivered in an eternal afterlife based on one's behavior. The function of this idea is to enforce normative behavior by alluding to future consequences. Personally, I think the community should work to move past this ancient strategy.

Harry Potter and the Methods of Rationality is meant to serve as moral scaffolding for in-group members (rationalists). That's the right way to go about it. Storytelling is a tried and true method of indoctrination. It tells lessons and sets examples. We are supposed to read it and extract its teachings, which will allow us to perform community-specific nodework. Indoctrination here isn't meant to be a disparaging term. It simply describes a function. Textbooks on linear algebra or postmodernism can also be considered indoctrination. All it means is that it's used to train you to think in specific ways. Cognitive-behavioral therapy? The same thing. CBT is meant to help you update your model of the world so that it doesn't undermine you. It's meant to become part of your repertoire of nodework.

Inhibition-Excitation Balance

The brain must strike a balance between two extremes: inhibition and excitation. Too much inhibition results in coma. Too much excitation results in seizures. Inhibition-excitation (IE) balance is achieved via attempts to satisfy mutually-exclusive constraints.

Hebbian learning (cells that fire together, wire together) must be balanced by synaptic homeostasis to avoid runaway excitation[23].

Spike-timing-dependent plasticity (STDP) serves as an overarching Hebbian principle[24]: connections between neurons are shaped according to temporally correlated activity. This is in fact a principle of predictive processing: sculpting a system according to activity patterns is analogous to constructing a statistical model[25].

Synaptic homeostasis can be construed as a process of statistical smoothing (model compression). STDP results in the formation of patterns. By averaging stored activity patterns (via synaptic homeostasis) neural models are made more efficient. The cost of maintaining them, then, is lowered. Also, you avoid seizures.
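Why Hebbian growth needs a homeostatic counterweight can be seen in a minimal sketch of a single linear neuron. The learning rate, the random inputs, and the rule of rescaling the weights to a fixed total "budget" are assumptions chosen for illustration; real synaptic scaling is considerably more involved.

```python
import random

def train(samples, homeostasis, rate=0.05, budget=1.0):
    """Train one linear neuron and return its final synaptic weights."""
    n = len(samples[0])
    weights = [budget / n] * n
    for x in samples:
        # Postsynaptic activity is the weighted sum of the inputs.
        activity = sum(w * xi for w, xi in zip(weights, x))
        # Hebbian step: inputs that are active together with the output get stronger.
        weights = [w + rate * activity * xi for w, xi in zip(weights, x)]
        if homeostasis:
            # Homeostatic step: rescale so the total synaptic weight stays fixed.
            total = sum(weights)
            weights = [w * budget / total for w in weights]
    return weights

rng = random.Random(0)
samples = [[rng.random() for _ in range(3)] for _ in range(200)]
runaway = train(samples, homeostasis=False)   # weights keep growing
balanced = train(samples, homeostasis=True)   # weights stay within the budget
```

Without the rescaling step every update can only increase the weights, which is the toy version of runaway excitation. With it, learning only redistributes a fixed budget among the synapses.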

This process can be understood as navigating what is known in statistics as the bias-variance trade off. Erring on the side of bias means you have gone too cheap: a more complicated (less smooth/rougher) model would have performed better. Erring on the side of variance means you have gone too expensive: a more simple (less rough/smoother) model would have performed better.

Loosely speaking, an error in the direction of variance will tend to result in false positives aka type I errors (you see patterns in noise). An error in the direction of bias will tend to result in false negatives aka type II errors (you see noise in patterns).

It is important to note that the optimal model complexity is always relative to the problem it is intended to solve. A smoke detector is supposed to err on the side of false positives because the cost of a false negative (neglecting to sound an alarm in the case of an actual fire) far outweighs the cost of occasional false positives (sounding an alarm for no reason).
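The smoke-detector reasoning can be made concrete with a small expected-cost calculation. The function name and the cost figures below are hypothetical; the point is only that lopsided costs push the alarm threshold toward sounding early.

```python
def alarm_threshold(cost_false_alarm, cost_missed_fire):
    """Probability of fire above which sounding the alarm is the cheaper choice.

    If the probability of fire is p, then:
      expected cost of sounding the alarm = (1 - p) * cost_false_alarm
      expected cost of staying silent     = p * cost_missed_fire
    Sounding is cheaper when p > cost_false_alarm / (cost_false_alarm + cost_missed_fire).
    """
    return cost_false_alarm / (cost_false_alarm + cost_missed_fire)

symmetric = alarm_threshold(1, 1)      # both mistakes equally bad
lopsided = alarm_threshold(1, 999)     # a missed fire is far worse than a false alarm
```

With equal costs the detector should wait until a fire is more likely than not. When a missed fire is assumed to be 999 times worse than a false alarm, it should go off at a probability of one in a thousand, which is why a well-designed detector produces plenty of false alarms.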

STDP can be compared to amplification and synaptic homeostasis to distillation. STDP results from model prediction errors. If errors are seen to be important in the context of higher goals (that is, that they are biologically relevant), they get amplified via neuromodulatory activity and protein synthesis is initiated. This process has been called behavioral tagging[26].

An interesting hypothesis is that sleep is a consequence of the need for synaptic homeostasis[27]. A crude analogy is that you spend the period of wakefulness on "eating" information and sleep on "digesting" information.

Ants use an analogous strategy as they forage for resources. If they discover, say, a sugar cube they will leave behind a chemical trail as they return to the colony. Other ants will follow this trail and reinforce it until the source has been depleted. Because their trails gradually decay in signal strength, the trail back and forth from the source to the colony will become optimized. This smoothing process is analogous to synaptic homeostasis.
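A rough sketch of how decaying trails select the shorter route is given below. It is a deliberately simplified, averaged model (no individual ants), and the path lengths, evaporation rate, and deposit rule are all assumptions made for illustration.

```python
def forage(lengths, rounds=200, evaporation=0.1, ants=100):
    """Return the share of trail strength on each path after foraging."""
    pheromone = [1.0] * len(lengths)
    for _ in range(rounds):
        total = sum(pheromone)
        for i, length in enumerate(lengths):
            share = pheromone[i] / total       # fraction of ants following trail i
            deposit = ants * share / length    # shorter round trips reinforce more often
            # The old trail decays a little, then fresh pheromone is added.
            pheromone[i] = (1 - evaporation) * pheromone[i] + deposit
    total = sum(pheromone)
    return [p / total for p in pheromone]

shares = forage([1.0, 2.0])   # one path is twice as long as the other
```

Because the longer path is reinforced less per round while both trails decay at the same rate, its share shrinks every round and the colony ends up concentrated on the shorter route.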

More Dual Phase Models

Dual coding theory [28]

Phase I: distributed/incompressible phase

Phase II: redundant/compressible phase

This is a model of neural decision making. In the first phase, a cluster of neurons slowly aggregates information and jointly represents potential decisions. Information in individual neurons is distributed. In the second phase, the cluster quickly reaches a consensus. Activity in individual neurons is redundant.

The neurons comprising the cluster can be thought of first as individuals, then as a functional unit. Imagine a group of people with different opinions discussing a topic. As they come to an agreement, they sort of become copies of one another. If you've met one of them, you've met them all. They have become synchronized in a sense.

Bianconi-Barabasi model [29]

Richer-get-richer phase

Winner-takes-all phase

This is a model of the development of complex networks. It's remarkably similar to the one above even though it's based on the very weird phenomenon of Bose-Einstein condensation. A BE condensate is often described as a super-atom. If you cool a low-density gas of bosons down to close to 0K, there's a critical phase transition where the atoms all of a sudden lose their individuality. And this process, believe it or not, fits very well with data on how complex networks such as the internet evolve over time.
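The growth rule behind this model can be sketched as follows: each new node links to existing nodes with probability proportional to their fitness times their current number of links. The fitness range, the network size, and the small fully connected starting core are assumptions chosen to keep the example short.

```python
import random

def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a network by fitness-weighted preferential attachment.

    Returns (fitness, degree), one entry per node.
    """
    rng = random.Random(seed)
    # Assumed fitness range; kept above zero so every node can attract links.
    fitness = [0.1 + 0.9 * rng.random() for _ in range(n_nodes)]
    degree = [0] * n_nodes
    core = links_per_node + 1
    for i in range(core):
        degree[i] = core - 1                   # start from a small fully connected core
    for new in range(core, n_nodes):
        # Attachment weight of each existing node: fitness times current degree.
        weights = [fitness[j] * degree[j] for j in range(new)]
        targets = set()
        while len(targets) < links_per_node:   # pick distinct targets
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in targets:
            degree[t] += 1
        degree[new] = links_per_node
    return fitness, degree

fitness, degree = grow_network(200)
```

When fitness values are similar, early nodes keep accumulating links (richer-get-richer). When one node is much fitter than the rest, the same rule lets it capture a large fraction of all links, which corresponds to the winner-takes-all phase.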

Dual-phase evolution [30]

Local phase

Global phase

This is meant to be a model of adaptation in complex adaptive systems. It consists of a local phase where individuals interact in a small group and a global phase where they interact with individuals from completely different groups.

You can think of it in terms of socializing at home or work versus socializing at a party or a conference.

Plasticity-rigidity cycles [31]

Another model of adaptation in complex adaptive systems. It consists of a plastic state and a rigid state. Protein folding is an interesting example. Rigidity is needed to maintain protein structure, but plasticity is also needed to keep folding towards its native state.

I'm sure you've recognized the pattern by now. Nodework and edgework can be thought of as different phases, both of which are crucial for adaptation. For reasons that will soon become obvious, I want to describe nodework as solitary and edgework as gregarious. You can imagine that there's a larger mission being carried out. Sub-units do their parts in a solitary fashion, but must occasionally check to see whether they are synchronized with what the rest are doing.

In IDA, we want to make sure that our AI is carrying out the right sort of nodework. But we also have to have a way to update its priorities through some sort of edgework. I'm not convinced that it would be able to do so competitively if human beings have to take turns doing the edgework on its behalf. A rival AI without this limitation would likely be able to exploit it and thus render the whole exercise pointless.

There's also the embarrassing problem of what to do with different social groups training different AIs to be aligned according to different moral scaffolds. That's the problem with in-group bias: we can't help but believe that our in-group is morally superior to out-groups.

I will end with a discussion of swarm formation in desert locusts.

As you are surely aware, desert locusts can morph between two phenotypes: a solitary form and a gregarious form. In their solitary form, they compete for resources just like the rest. They keep to themselves and perform what we might call nodework. However, when there are many locusts in the same location rubbing their feet against one another, a phase transition occurs at a critical point. It has been shown that this transition is wholly dependent on levels of serotonin[32]. Low levels? Solitary. High levels? Gregarious.

The solitary desert locusts enter a liminal phase where they start "radicalizing" each other through a sort of social domino effect. Suddenly, they are no longer individuals. They are, instead, a swarm. The swarm sweeps across the land, laying waste to every field it comes across.

This is just what happens when the circumstances are ripe. When they can't help getting on each other's nerves. When everywhere they go, they keep stepping on each other's toes.
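For readers who prefer code to allegory, here is a toy sketch of the kind of threshold behavior described above. It is not the model from the locust study[32]. Every name and number in it (`simulate_swarm`, `gain`, `decay`, `feedback`, `threshold`) is an assumption made up for illustration. A single "serotonin" variable rises with crowding and decays over time. Once it crosses a threshold, the gregarious state feeds back on itself, which is a crude stand-in for the social domino effect.

```python
def simulate_swarm(density, steps=50, gain=1.0, decay=0.5,
                   feedback=1.0, threshold=1.0):
    """Toy mean-field model of locust gregarization (illustrative only).

    density   -- how crowded the locusts are (arbitrary units, assumed)
    gain      -- how strongly crowding raises the "serotonin" level (assumed)
    decay     -- fraction of the level lost per step (assumed)
    feedback  -- extra stimulation once the group is gregarious (assumed)
    threshold -- level at which the phase flips to gregarious (assumed)
    """
    serotonin = 0.0
    for _ in range(steps):
        gregarious = serotonin >= threshold
        # Crowding stimulates every locust; a gregarious group stimulates more.
        stimulus = gain * density * (1.0 + (feedback if gregarious else 0.0))
        # The level relaxes towards a balance between stimulus and decay.
        serotonin += stimulus - decay * serotonin
    phase = "gregarious" if serotonin >= threshold else "solitary"
    return serotonin, phase


if __name__ == "__main__":
    for density in (0.2, 0.4, 0.6, 0.8):
        level, phase = simulate_swarm(density)
        print(f"density={density:.1f}  level={level:.2f}  phase={phase}")
```

With these made-up parameters, the level settles at `gain * density / decay` as long as the group stays solitary, so the flip happens once the density exceeds 0.5. Below that point nothing much happens. Above it, the feedback kicks in and the level jumps well past the threshold. The numbers mean nothing; the point is only that a small change in crowding near the critical point produces a qualitative change in the outcome.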

Closing Thoughts

This has been a long post. I thank you for making the time to read it. If you wildly (or even mildly) disagree with me, please let me know. I am prone to changing my mind.

My case against IDA as a path toward AI alignment can pretty accurately be described as poetic rather than logical. Feel free to scoff at the rampant use of allegory, analogy, and metaphorical language. You are also free to roll your eyes at my inability to keep the discussion narrow (seriously, did I have to bring Dan Harmon into all of this?).

The only original term I've used here is "nodework". I felt I needed a complementary term for "edgework" and that's what I went with. The term has been used in completely different contexts before; I don't mean to refer to any of those uses.

This post is based on my limited understanding of the ideas of others. Errors are, of course, mine.

References:


  1. Szent-Györgyi, A. (1972). Dionysians and Apollonians. Science, 176, 966.

  2. Carhart-Harris, R. L., & Friston, K. J. (2019). REBUS and the Anarchic Brain: Toward a Unified Model of the Brain Action of Psychedelics. Pharmacological Reviews, 71(3), 316-344.

  3. Schaller, M., & Park, J. H. (2011). The Behavioral Immune System (and Why it Matters). Current Directions in Psychological Science, 20(2), 99-103.

  4. Eagleton, T. (2011). Literary theory: An introduction. John Wiley & Sons.

  5. Slate Star Codex - Staying Classy

  6. Astral Codex Ten - Book Review: Fussell On Class

  7. Lyng, S. (Ed.). (2004). Edgework: The Sociology of Risk-Taking. Routledge.

  8. Zurn, P., Zhou, D., Lydon-Staley, D. M., & Bassett, D. S. (2021). Edgework: Viewing Curiosity as Fundamentally Relational.

  9. Slate Star Codex - Maybe your Zoloft stopped working because a liver fluke tried to turn your n-th-great-grandmother into a zombie

  10. Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.

  11. Schmidhuber, J. (2008, June). Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. In Workshop on anticipatory behavior in adaptive learning systems (pp. 48-76). Springer, Berlin, Heidelberg.

  12. Pynn, L. K., & DeSouza, J. F. (2013). The function of efference copy signals: implications for symptoms of schizophrenia. Vision Research, 76, 124-133.

  13. Corlett, P. R., Frith, C. D., & Fletcher, P. C. (2009). From drugs to deprivation: a Bayesian framework for understanding models of psychosis. Psychopharmacology, 206(4), 515-530.

  14. Gazzaniga, M. S. (2005). Forty-five years of split-brain research and still going strong. Nature Reviews Neuroscience, 6(8), 653-659.

  15. Gazzaniga, M. S. (2015). Tales from both sides of the brain: A life in neuroscience. Ecco/HarperCollins Publishers.

  16. Goldberg, E. (1994). Cognitive novelty. Neurosciences, 6, 371-378.

  17. Rogers, L. J., Vallortigara, G., & Andrew, R. J. (2013). Divided Brains: The Biology and Behaviour of Brain Asymmetries. Cambridge University Press.

  18. Roser, M. E., & Gazzaniga, M. S. (2006). The interpreter in human psychology. The Evolution of Primate Nervous Systems. Oxford.

  19. Dan Harmon - Story Structure 101

  20. Campbell, J. (1949). The Hero With a Thousand Faces.

  21. Masel, J. (2011). Genetic drift. Current Biology, 21(20), R837-R838.

  22. Celani, A., & Vergassola, M. (2010). Bacterial strategies for chemotaxis response. Proceedings of the National Academy of Sciences, 107(4), 1391–1396. https://doi.org/10.1073/pnas.0909673107

  23. Turrigiano, G. G. (2017). The Dialectic of Hebb and Homeostasis. Philosophical Transactions of the Royal Society B: Biological Sciences. https://royalsocietypublishing.org/doi/full/10.1098/rstb.2016.0258

  24. Caporale, N., & Dan, Y. (2008). Spike timing–dependent plasticity: a Hebbian learning rule. Annu. Rev. Neurosci., 31, 25-46.

  25. Rao, R. P., & Sejnowski, T. J. (2001). Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Computation, 13(10), 2221-2237.

  26. Ballarini, F., Moncada, D., Martinez, M. C., Alen, N., & Viola, H. (2009). Behavioral Tagging is a General Mechanism of Long-Term Memory Formation. Proceedings of the National Academy of Sciences, 106(34), 14599-14604.

  27. Tononi, G., & Cirelli, C. (2006). Sleep function and synaptic homeostasis. Sleep Medicine Reviews, 10(1), 49-62.

  28. Daniels, B. C., Flack, J. C., & Krakauer, D. C. (2017). Dual Coding Theory Explains Biphasic Collective Computation in Neural Decision-Making. Frontiers in Neuroscience, 11. https://doi.org/10.3389/fnins.2017.00313

  29. Bianconi, G., & Barabási, A.-L. (2001). Bose-Einstein Condensation in Complex Networks. Physical Review Letters, 86(24), 5632–5635. https://doi.org/10.1103/physrevlett.86.5632

  30. Paperin, G., Green, D. G., & Sadedin, S. (2010). Dual-phase evolution in complex adaptive systems. Journal of the Royal Society Interface, 8(58), 609–629. https://doi.org/10.1098/rsif.2010.0719

  31. Csermely, P. (2015). Plasticity-rigidity cycles: A general adaptation mechanism. ArXiv.org. https://arxiv.org/abs/1511.01239

  32. Anstey, M. L., Rogers, S. M., Ott, S. R., Burrows, M., & Simpson, S. J. (2009). Serotonin Mediates Behavioral Gregarization Underlying Swarm Formation in Desert Locusts. Science, 323(5914), 627-630.

2 comments

That's what we need for LW: BibTeX support :P

I'm curious if I'm an outlier here - have you really never tried to relate some joke or funny story and then cracked up before you could finish it? I can't tickle myself, but I can easily make myself laugh.

Anyhow, I think this is to some extent a low-dimensional analogy for a high-dimensional world. When the world is complicated, trying to do something new can result in finding connections to something you already know about, and studying something familiar can uncover the surprising. This is for exactly the same reason that a 1D patch on a string is close to fewer neighbors than a 3D section of space, which in turn has fewer neighbors than a drug molecule has in the space of possible chemicals. In high dimensional problems, both connections and surprises are so common as to be unavoidable. But if we're just walking around on the 2D surface of the earth, we'll probably run into connections and surprises at about the rate we'd expect from our stories.

I do indeed make myself laugh at times. I think it has something to do with depth. The consequence of a line of thinking can be surprising, and that's probably relevant.

That's an interesting way of looking at it. Feynman had a hunch on the topic, which he shared in his Nobel Prize speech: nature is simple in some sense. We can describe things in many different ways without knowing that we're describing the same thing. Which, he said, is a sort of simplicity.