Princeton neuroscientist Michael Graziano wrote the book Rethinking Consciousness (2019) to explain his "Attention Schema" theory of consciousness (endorsed by Dan Dennett![1]). If you don't want to read the whole book, you can get the short version in this 2015 article.

I'm particularly interested in this topic because, if we build AGIs, we ought to figure out whether they are conscious, and/or whether that question matters morally. (As if we didn't already have our hands full thinking about the human impacts of AGI!) This book is nice and concrete and computational, and I think it at least offers a start to answering the first part of that question.

What is attention schema theory?

There are two ingredients.

For the first ingredient, you should read Kaj Sotala's excellent review of Consciousness and the Brain by Stanislas Dehaene (or read the actual book!). To summarize, there is a process in the brain whereby certain information gets promoted up to a "Global Neuronal Workspace" (GNW), a special richly-connected high-level subnetwork of the brain. Only information in the GNW can be remembered and described—i.e., this is the information of which we are "aware". For example, if something flashes in our field of view too quickly for us to "notice", it doesn't enter the GNW. It does get processed to some extent, and can cause local brain activity that persists for a couple of seconds, but it will not cascade into a large, widespread signal with long-lasting effects.

Every second of every day, information is getting promoted to the GNW, and the GNW is processing it and pushing information into other parts of the brain. This process does not constitute all of cognition, but it's an important part. Graziano calls this process attention.[2]
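To make the architecture concrete, here's a toy sketch in the spirit of the GNW story (my own illustration, not code from Dehaene or Graziano; all names are made up): local modules compete, the most salient signal above a threshold gets broadcast globally, and only broadcast signals leave a reportable trace.

```python
import random

class Module:
    """A local processor; its activity fades quickly unless broadcast."""
    def __init__(self, name):
        self.name = name

    def propose(self):
        # Each module offers a candidate signal with some salience.
        return {"source": self.name, "content": f"{self.name}-signal",
                "salience": random.random()}

class GlobalWorkspace:
    """Toy GNW: winner-take-all competition, then global broadcast."""
    THRESHOLD = 0.6  # sub-threshold activity stays local and fades

    def __init__(self, modules):
        self.modules = modules
        self.memory = []  # only broadcast content is reportable later

    def step(self):
        candidates = [m.propose() for m in self.modules]
        winner = max(candidates, key=lambda c: c["salience"])
        if winner["salience"] > self.THRESHOLD:
            # "Ignition": the winning signal is broadcast everywhere
            # and leaves a lasting, reportable trace.
            self.memory.append(winner["content"])
            return winner
        return None  # processed locally, but never "noticed"

gw = GlobalWorkspace([Module("vision"), Module("hearing"), Module("touch")])
for _ in range(10):
    gw.step()
print("reportable (what we were 'aware' of):", gw.memory)
```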

The second ingredient is that the brain likes to build predictive models of things—Graziano calls them "schemas" or "internal models". If you know what an apple is, your brain has an "apple model" that describes apples' properties, behavior, affordances, etc. Likewise, we all have a "body schema", a deeply-rooted model that tracks where our body is, what it's doing, and how it works. If you have a phantom limb, that means your body schema has a limb where your actual body does not. As the phantom limb example illustrates, these schemas are deeply rooted, and not particularly subject to deliberate control.

Now put the two together, and you get an "attention schema", an internal model of attention (i.e., of the activity of the GNW). The attention schema is supposedly key to the mystery of consciousness.
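Continuing the toy sketch above (this reuses the `gw` object), an attention schema would be a second, much-simplified model of the workspace's own activity: it tracks what attention is doing, but in a vocabulary that contains no saliences, thresholds, or modules. Again, this is my own illustrative guess at the shape of the idea, not anything from the book:

```python
class AttentionSchema:
    """A simplified self-model of the workspace: it records outcomes
    ("I am attending to X") in terms that omit the mechanism entirely."""
    def __init__(self):
        self.description = {"attending_to": None, "mechanism": "none apparent"}

    def update(self, broadcast):
        # The schema only sees the result of the competition, never the
        # competition itself -- a cartoon of the underlying process.
        if broadcast is not None:
            self.description["attending_to"] = broadcast["content"]

schema = AttentionSchema()
for _ in range(10):
    schema.update(gw.step())
print(schema.description)  # what introspection reports: the model, not the mechanism
```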

Why does the brain build an attention schema? Graziano offers two reasons, and I'll add a third.

  • First, it's important that we control attention (it being central to cognition), and control theory says it's impossible to properly control something unless you're modeling it (see the sketch after this list). Graziano offers an example of trying to ignore a distraction: experiments show that, other things being equal, this is easier if we are aware of the distraction. That's counterintuitive, and it supports his claim.

  • Second, the attention schema can also be used to model other people's attention, which is helpful for interacting with them, understanding them, deceiving them, etc.

  • Third (I would add), the brain is a thing that by default builds internal models of everything it encounters. The workings of the GNW obviously have a giant impact on the signals going everywhere in the brain, so of course the brain is going to try to build a predictive model of it! I mention this partly because of my blank-slate-ish sympathies, but I think it's an important possibility to keep in mind, because it would mean that even if we desperately want to build a human-cognition-like AGI without an attention schema (if we want AGIs to be unconscious for ethical reasons; more on which below), it might be essentially impossible.
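On the first point, a standard engineering illustration of "you can't control what you don't model" is adaptive noise cancellation: a disturbance can only be subtracted out to the extent that the controller maintains a running estimate of it. Here's a minimal sketch (my analogy, not Graziano's; assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
signal = np.sin(2 * np.pi * 0.003 * np.arange(n))  # the stream we care about
distractor = rng.standard_normal(n)                # the distraction source
observed = signal + 0.8 * distractor               # what actually arrives

# Without a model of the distractor, there is nothing to subtract.
# With a model: keep a running estimate (w) of how the distractor leaks
# into the observation, and cancel the predicted leakage (LMS filtering).
w, mu = 0.0, 0.01                                  # model parameter, learning rate
cleaned = np.empty(n)
for i in range(n):
    predicted_leak = w * distractor[i]             # the model's prediction
    cleaned[i] = observed[i] - predicted_leak      # suppress what is predicted
    w += mu * cleaned[i] * distractor[i]           # LMS update: refine the model

print(f"residual distraction, no model:   {np.mean((observed - signal) ** 2):.3f}")
print(f"residual distraction, with model: {np.mean((cleaned[n//2:] - signal[n//2:]) ** 2):.3f}")
```

The second number comes out far smaller than the first, but only because the system carries an explicit model (`w`) of the very thing it is trying to ignore.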

To be clear, if GNW is "consciousness" (as Dehaene describes it), then the attention schema is "how we think about consciousness". So this seems to be at the wrong level! This is a book about consciousness; shouldn't we be talking directly about the nature of consciousness itself?? I was confused about this for a while. But it turns out, he wants to be one level up! He thinks that's where the answers are: in the "meta-problem of consciousness". See below.

When people talk about consciousness, they're introspecting about their attention schema

Let's go through some examples.

Naive description: I have a consciousness, and I can be aware of things, like right now I'm aware of this apple.

...and corresponding sophisticated description: One of my internal models is an attention schema. According to that schema, attention has a particular behavior wherein attention kinda "takes possession" of a different internal model, e.g. a model of a particular apple. Objectively, we would say that this happens when the apple model becomes active in the GNW.

Naive description: My consciousness is not a physical thing with color, shape, texture. So it's sorta metaphysical, although I guess it's roughly located in my head.

...and corresponding sophisticated description: Just as my internal model of "multiplication" has no property of "saltiness", by the same token, my attention schema describes attention as having no color, shape, or texture.

Naive description: I have special access to my own consciousness. I alone can truly experience my experiences.

...and corresponding sophisticated description: The real GNW does not directly interact with other people; it only interacts with the world by affecting my own actions. Reflecting that fact, my attention schema describes attention as a thing to which I have privileged access.

Naive description: An intimate part of my consciousness is its tie to long-term memory. If you show me a video of me going scuba diving this morning, and I have absolutely no memory whatsoever of it, and you can prove that the video is real, well I mean, I don't know what to say, I must have been unconscious or something!

...and corresponding sophisticated description: Essentially everything that enters the GNW leaves at least a slight trace in long-term memory. Thus, one aspect of my attention schema is that it describes attention and memory as inextricably linked. According to my internal models, when attention "takes possession" of some piece of information, it leaves a trace in long-term memory, and conversely, nothing can get into long-term memory unless attention first takes possession of it.

Naive description: Hey, hey, what are you going on about "internal models" and "attention schema"? I don't know anything about that. I know what my consciousness is, I can feel it. It's not a model, it's not a computation, it's not a physical thing. (And don't call me naive!)

...and corresponding sophisticated description: All my internal models are simplified entities, capturing their targets' essential behavior and properties, but not usually the nuts-and-bolts of how they work in the real world. (In a programming analogy, you could say that we're modeling the GNW's API & documentation, not its implementation.) Thus, my attention schema does not involve neurons or synapses or GNWs or anything like that, even if, in reality, that's what it's modeling.
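To push on that programming analogy, here's a minimal sketch of the interface/implementation split (my own illustration; all names hypothetical): introspection gets something like the Protocol, while the brain actually runs something like the class.

```python
from typing import Optional, Protocol

class Attention(Protocol):
    """The 'API & documentation' the schema models: behavior only."""
    def take_possession_of(self, thing: str) -> None:
        """Attention seizes an item; the item becomes vivid and reportable."""
    def currently_held(self) -> Optional[str]:
        """There is always some fact about what is being attended to."""

class GlobalNeuronalWorkspace:
    """The implementation. Nothing here appears in the interface above,
    and (on Graziano's account) nothing here appears in introspection."""
    def __init__(self) -> None:
        self._synaptic_state = {}      # stand-in for neurons, spikes, thresholds
        self._winner: Optional[str] = None

    def take_possession_of(self, thing: str) -> None:
        self._winner = thing           # really: a winner-take-all competition

    def currently_held(self) -> Optional[str]:
        return self._winner
```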

The meta-problem of consciousness

The "hard problem of consciousness" is "why is there an experience of consciousness; why does information processing feel like anything at all?"

The "meta-problem of consciousness" is "why do people believe that there's a hard problem of consciousness?"

The meta-problem has the advantage of having obvious and non-confusing methods of attack: the belief that there's a hard problem of consciousness is an observable output of the brain, and can be studied by normal cognitive neuroscience.

But the real head-scratcher is: If we have a complete explanation of the meta-problem, is there anything left to explain regarding the hard problem? Graziano's answer seems to be a resounding "No!", and we end up with conversations like these:

Normal Person: What about qualia?

Person Who Has Solved The Meta-Problem Of Consciousness: Let me explain why the brain, as an information processing system, would ask the question "What about qualia"...

NP: What about subjective experience?

PWHSTMPOC: Let me explain why the brain, as an information processing system, would ask the question "What about subjective experience"...

NP: You're not answering my questions!

PWHSTMPOC: Let me explain why the brain, as an information processing system, would say "You're not answering my questions"...

...

The book goes through this type of discussion several times. I feel a bit torn. One side of me says: obviously Graziano's answers are correct, and obviously no other answer is possible. The other side of me says: No no no, he did not actually answer these questions!!

On reflection, I have to side with "Obviously Graziano's answers are correct, and no other answer is possible." But I still find it annoying and deeply unsatisfying.

(Update: A commenter points me to Appendix F of Luke Muehlhauser's report on consciousness for ideas and further reading. Having read a bit more, I still find this line of thought counterintuitive, but less so.) (Update 2: Ditto Joe Carlsmith's blog.)

Illusionism

Graziano says that his theory is within the philosophical school of thought called "Illusionism". But he thinks that term is misleading. He says it's not "illusion as in mirage", but "illusion as in mental construction", like how everything we see is an "illusion" rather than raw perceptual data.

Edited to add: Graziano makes illusionism sound very straightforward and unobjectionable. Maybe he has a way to think about it such that it really is straightforward and unobjectionable. ...Or maybe he's dancing around the counterintuitive or controversial aspects of his theory, to make it more palatable to a broad audience. I'm inclined to think it's the latter. There's another example of this elsewhere in the book: his discussion of Integrated Information Theory. He could have just said "IIT is baloney" and I would have been totally on board. I think IIT is fundamentally wrong; it was an interesting idea to look into, but let's now put it in the garbage and move on. And that's exactly what Graziano's theory implies. But Graziano doesn't say that. Instead, as I recall, he has a scrupulously non-confrontational discussion of how the GNW stuff he talks about involves a lot of integration of information in a way that the IIT "Φ" calculation would endorse as conscious. So, I think he wants to pick his battles, and that's why he dances around how weird and unintuitive illusionism really is. I could be wrong.

Emulations

He has a fun chapter on brain uploading, which is not particularly related to the rest of the book. He discusses some fascinating neuroscience aspects of brain-scanning, like the mystery of whether glial cells do computations, but spends most of the time speculating about the bizarre implications for society.

Implications for AGI safety

He suggests that, since humans are generally pro-social, and part of that comes from modeling each other using attention schemas, perhaps the cause of AGI Safety could be advanced by deliberately building conscious AGIs with attention schemas (and, I presume, other human-like emotions). Now, he's not a particular expert on AGI Safety, but I think this is not an unreasonable idea; in fact it's one that I'm very interested in myself. (We don't have to blindly copy human emotions ... we can turn off jealousy etc.)

Implications for morality

One issue on which Graziano is largely silent is the implications for moral philosophy.

For example, someday we'll have to decide: When we build AGIs, should we assign them moral weight? Is it OK to turn them off? Are our AGIs suffering? How would we know? Should we care? If humans go extinct but conscious AGIs have rich experiences as they colonize the universe, do we think of them as our children/successors? Or as our hated conquerors in a now-empty clockwork universe?

I definitely share the common intuition that we should care about the suffering of things that are conscious (and/or sentient; I'm not sure what the difference is). However, in attention schema theory, there does not seem to be a sharp dividing line between "things with an attention schema" and "things without an attention schema", especially in the wide space of all possible computations. There are (presumably) computations that arguably involve something like an "attention schema" but with radically alien properties. There doesn't seem to be any good reason that, out of all the possible computational processes in the universe, we should care only and exactly about computations involving an attention schema. Instead, the picture I get is more like we're taking an ad-hoc abstract internal model and thoughtlessly reifying it. It's like if somebody worshipped the concept of pure whiteness, and went searching the universe for things that match that template, only to discover that white is a mixture of colors, and thus pure whiteness—when taken to be a literal description of a real-world phenomenon—simply doesn't exist. What then?

It's a mess.

So, as usual when I start thinking too hard about philosophy, I wind up back at Dentin's Prayer of the Altruistic Nihilist:

Why do I exist? Because the universe happens to be set up this way. Why do I care (about anything or everything)? Simply because my genetics, atoms, molecules, and processing architecture are set up in a way that happens to care.

So, where does that leave us? Well, I definitely care about people. If I met an AGI that was pretty much exactly like a nice person, inside and out, I would care about it too (for direct emotional reasons), and I would feel that caring about it is the right thing to do (for intellectual consistency reasons). For AGIs running more alien types of algorithms—man, I just have no idea.

(thanks Tan Zhi Xuan for comments on a draft.)


  1. More specifically, I went to a seminar where Graziano explained his theory, and then Dan Dennett spoke and said that he had essentially nothing to disagree with concerning what Graziano had said. I consider that more-or-less an "endorsement", but I may be putting words in his mouth. ↩︎

  2. I found his discussion of "attention" vs "awareness" confusing. I'm rounding to the nearest theory that makes sense to me, which might or might not be exactly what he was trying to describe. ↩︎

Comments

I think representationalism helps much more than the illusionist stance for loading the right intuitions. If you get close to a screen you see that really it's a bunch of RGB pixels. Is it helpful to call the projected image you see from farther away an illusion? Better to say we have multiple representations at different levels of abstraction and seem to get different sorts of computational advantages out of crunching on these different representations.

Semi-relatedly, I'm getting frustrated with the term "illusionist". People seem to use it in different ways. Within the last few weeks I listened to the 80k podcast with David Chalmers and the Rationally Speaking podcast with "illusionist" Keith Frankish.

Chalmers seemed to use the term to mean that consciousness was an illusion, such that it means we don't really have consciousness. Which seems very dubious.

Frankish seemed to use the term to mean that many of the properties that other philosophers think our consciousness has are illusory, but that of course we are conscious.

From listening to the latter interview, it's not clear to me that Frankish (who, according to Wikipedia, is "known for his 'illusionist' stance in the theory of consciousness") believes anything different from the view described in this post (which I assume you're classing as "representationalism").

Maybe I'm just confused. But it seems like leading philosophers of today still haven't absorbed the lesson of Wittgenstein and are still talking past each other with confusing words.

I guess my request of philosophers (and the rest of us) is this: when you are using an everyday term like "free will" or "consciousness", please don't define it to mean one very specific thing that bakes in a bunch of philosophical assumptions. Because then anyone who questions some of those assumptions ends up arguing whether the thing exists, rather than just saying it's a little different than we thought before.

It'd be like if we couldn't talk about "space" or "time" anymore after Einstein. Or if half of us started calling ourselves "illusionists" w.r.t. space or time. They're not illusions! They exist! They're just a little different than we thought before.

(See also this comment, and remember that all abstractions are leaky!)

"Illusionist" is in principle a one place predicate like "realist" and "sceptic". You can be a realist about X and a sceptic about Y. In practice, it tends to mean illusionism about qualia.

Can you suggest a reference which you found helpful for "loading the right intuitions" about consciousness?

Unfortunately I don't know of a good overview. Chalmers might have one. Lukeprog's post on consciousness has some pointers.

Thanks! I just read Luke's report Appendix F on illusionism and it's definitely pointing me in fruitful directions.

[anonymous]

On reflection, I have to side with "Obviously Graziano's answers are correct, and no other answer is possible." But I still find it annoying and deeply unsatisfying.

Sounds like you are noticing you are still confused, having received a mysterious answer to a mysterious question. That seems a clear indication that your intuition is correct and the hard problem of consciousness has not been resolved by this theory.

Ha! Maybe!

Or maybe it's like the times I've read poorly-written math textbooks, and there's a complicated proof of Theorem X, and I'm able to check that every step of the proof is correct, but all the steps seem random, and then out of nowhere, the last step says "Therefore, Theorem X is true". OK, well, I guess Theorem X is true then.

...But if I had previously found Theorem X to be unintuitive ("it seems like it shouldn't be true"), I'm now obligated to fix my faulty intuitions and construct new better ones to replace them, and doing so can be extremely challenging. In that sense, reading and verifying the confusing proof of Theorem X is "annoying and deeply unsatisfying".

(The really good math books offer both a rigorous proof of Theorem X and an intuitive way to think about things such that Theorem X is obviously true once those intuitions are internalized. That saves readers the work of searching out those intuitions for themselves from scratch.)

So, I'm not saying that Graziano's argument is poorly-written per se, but having read the book, I find myself more-or-less without any intuitions about consciousness that I can endorse upon reflection, and this is an annoying and unsatisfying situation. Hopefully I'll construct new better intuitions sooner or later. Or—less likely I think—I'll decide that Graziano's argument is baloney after all :-)

[anonymous]

Sorry, but you can be better than that. You should not be trusting textbook authors when they say that Theorem X is true. If you don't follow the chain of reasoning and see for yourself why it works, then you shouldn't take it at face value. You can do better.

This is an unpopular opinion because people don't like doing the work. But if you've read the memoirs of anyone who has achieved greatness through originality in their work, like Richard Feynman for example, there is a consistent lesson: don't trust what you don't understand yourself.

In a community where the explicit goal is to be less wrong, then I cannot think of a stronger mandate than to not trust authority and to develop your own intuitive understanding of everything. Anyone who says this isn't possible hasn't really tried.

develop your own intuitive understanding of everything

I agree 100%!! That's the goal. And I'm not there yet with consciousness. That's why I used the word "annoying and unsatisfying" to describe my attempts to understand consciousness thus far. :-P

You should not be trusting textbook authors when they say that Theorem X is true

I'm not sure you quite followed what I wrote here.

I am saying that it's possible to understand a math proof well enough to have 100% confidence—on solely one's own authority—that the proof is mathematically correct, but still not understand it well enough to intuitively grok it. This typically happens when you can confirm that each step of the proof, taken on its own, is mathematically correct.

If you haven't lived this experience, maybe imagine that I give you a proof of the Riemann hypothesis in the form of 500 pages of equations kinda like this, with no English-language prose or variable names whatsoever. Then you spend 6 months checking rigorously that every line follows from the previous line (or program a computer to do that for you). OK, you have now verified on solely your own authority that the Riemann hypothesis is true. But if I now ask you why it's true, you can't give any answer better than "It's true because this 500-page argument shows it to be true".

So, that's a bit like where I'm at on consciousness. My "proof" is not 500 pages, it's just 4 steps, but that's still too much for me to hold the whole thing in my head and feel satisfied that I intuitively grok it.

  1. I am strongly disinclined to believe (as I think David Chalmers has suggested) that there's a notion of p-zombies, in which an unconscious system could have exactly the same thoughts and behaviors as a conscious one, even including writing books about the philosophy of consciousness, for reasons described here and elsewhere.

  2. If I believe (1), it seems to follow that I should endorse the claim "if we have a complete explanation of the meta-problem of consciousness, then there is nothing left to explain regarding the hard problem of consciousness". The argument more specifically is: Either the behavior in which a philosopher writes a book about consciousness has some causal relation to the nature of consciousness itself (in which case, solving the meta-problem requires understanding the nature of consciousness), or it doesn't (in which case, unconscious p-zombies should (bizarrely) be equally capable of writing philosophy books about consciousness).

  3. I think that Attention Schema Theory offers a complete and correct answer to every aspect of the meta-problem of consciousness, at least every aspect that I can think of.

  4. ...Therefore, I conclude that there is nothing to consciousness beyond the processes discussed in Attention Schema Theory.

I keep going through these steps and they all seem pretty solid, and so I feel somewhat obligated to accept the conclusion in step 4. But I find that conclusion highly unintuitive, I think for the same reason most people do—sorta like, why should any information processing feel like anything at all?

So, I need to either drag my intuitions into line with 1-4, or else crystallize my intuitions into a specific error in one of the steps 1-4. That's where I'm at right now. I appreciate you and others in this comment thread pointing me to helpful and interesting resources! :-)

I am strongly disinclined to believe (as I think David Chalmers has suggested) that there’s a notion of p-zombies, in which an unconscious system could have exactly the same thoughts and behaviors as a conscious one, even including writing books about the philosophy of consciousness, for reasons described here and elsewhere.

Again: Chalmers doesn't think p-zombies are actually possible.

If I believe (1), it seems to follow that I should endorse the claim “if we have a complete explanation of the meta-problem of consciousness, then there is nothing left to explain regarding the hard problem of consciousness”.

That doesn't follow from (1). It would follow from the claim that everyone is a zombie, because then there would be nothing to consciousness except false claims to be conscious. However, if you take the view that reports of consciousness are caused by consciousness per se, then consciousness per se exists and needs to be explained separately from reports and behaviour.

Hmm. I do take the view that reports of consciousness are (at least in part) caused by consciousness (whatever that is!). (Does anyone disagree with that?) I think a complete explanation of reports of consciousness necessarily includes any upstream cause of those reports. By analogy, I report that I am wearing a watch. If you want a "complete and correct explanation" of that report, you need to bring up the fact that I am in fact wearing a watch, and to describe what a watch is. Any explanation omitting the existence of my actual watch would not match the data. Thus, if reports of consciousness are partly caused by consciousness, then it will not be possible to correctly explain those reports unless, somewhere buried within the explanation of the report of consciousness, there is an explanation of consciousness itself. Do you see where I'm coming from?

If explaining reports of consciousness involves solving the hard problem, then no one has explained reports of consciousness, since no one has solved the HP.

Of course, some people (e.g. Dennett) think that reports of consciousness can be explained ... and don't accept that there is an HP.

And the HP isn't about consciousness in general, it is about qualia or phenomenal consciousness, the very thing that illusionism denies.

Edit: the basic problem with what you are saying is that there are disagreements about what explanation is, and about what needs to be explained. The Dennett side says that once you have explained all the objective phenomena objectively, you have explained everything. The Chalmers side thinks that leaves out the most important stuff.

This reminds me of Eliezer's classic post Dissolving the Question.

From your post:

The "hard problem of consciousness" is "why is there an experience of consciousness; why does information processing feel like anything at all?"
The "meta-problem of consciousness" is "why do people believe that there's a hard problem of consciousness?"

From Eliezer's post:

Your assignment is not to argue about whether people have free will, or not.
Your assignment is not to argue that free will is compatible with determinism, or not.
Your assignment is not to argue that the question is ill-posed, or that the concept is self-contradictory, or that it has no testable consequences.
You are not asked to invent an evolutionary explanation of how people who believed in free will would have reproduced; nor an account of how the concept of free will seems suspiciously congruent with bias X.  Such are mere attempts to explain why people believe in "free will", not explain how.
Your homework assignment is to write a stack trace of the internal algorithms of the human mind as they produce the intuitions that power the whole damn philosophical argument.

Is there anything else to the book you review beyond what Eliezer captured back 12 years ago?

And even simpler summary in a follow-up post Righting a Wrong Question:

When you are faced with an unanswerable question—a question to which it seems impossible to even imagine an answer—there is a simple trick which can turn the question solvable.
Compare:
  • "Why do I have free will?"
  • "Why do I think I have free will?"
[anonymous]

That approach doesn’t work in this case, however. It works great for free will, where uncovering the way in which we made decisions feels like “free will” from the inside. It is a problem that dissolves entirely upon answering the meta question.

But there are other problems which do not get dissolved by answering the meta question. “Why do I think reality exists?” for example. You could conceivably convince me that we are living inside the matrix and that what I think is immutable reality is actually manipulatable data in a running computer program. But what you can never convince me of is that there is NO reality, that I do not exist.

For the exact same reasons, you cannot convince me that I am "not conscious," or expect that explaining why the mostly deterministic computational process which is my brain asks questions about consciousness is a suitable answer for why I, or anything, subjectively feel conscious. "I think therefore I am" is not dissolved by knowing how thinking works.

Free will is an artifact of how a decision process feels from the inside. The hard problem of consciousness is why ANY process “feels” anything at all, which cannot be resolved in the same way.

I am really puzzled as to why people think the question of consciousness can be resolved in this way. The best I can come up with is that this is a form of belief in belief. People have seen the meta question resolve similar sounding problems before, so far without exception. Dennett goes to great lengths in his books to explain that asking “why” must ALWAYS be transformed into asking “how.” So they assume it must work the same for consciousness. But the hard problem of consciousness is one of the unique exceptions because it deals with subjective experience, specifically why we have subjective experience at all. (It is, in fact, a variant of the first-cause problem.)

“Why do I think reality exists?”

Is already answerable. You can list a number of reasons why you hold this belief. You are not supposed to dissolve the new question, only reformulate the original one so that it becomes answerable.

why ANY process “feels” anything at all

Is harder because we do not have a good handle on what physical process creates feelings or, in Dennett's approach, how feelings form. But at least we know what kind of research needs to be conducted in order to make progress in that area. In that way the question is answerable, at least in principle; we are just lacking a good understanding of how the human brain works. So the question is ultimately about the neuroscience and the algorithms.

But the hard problem of consciousness is one of the unique exceptions because it deals with subjective experience, specifically why we have subjective experience at all. (It is, in fact, a variant of the first-cause problem.)

That's the "dangling unit" (my grade 8 self says "lol!" at the term) Eliezer was talking about. There are no "unique exceptions", we are algorithms, and some of the artifacts of running our algorithms is to generate "feelings" or "qualia" or "subjective experiences". If this leaves you saying "but... but... but...", then the next quote from Eliezer already anticipates that:

This dangling unit feels like an unresolved question, even after every answerable query is answered.  No matter how much anyone proves to you that no difference of anticipated experience depends on the question, you're left wondering:  "But does the falling tree really make a sound, or not?"

[anonymous]

I agree with this post. However once you take this line of thinking to its conclusion, the result is panpsychism (which Tegmark professes) rather than the "explain away" belief of Dennett et al.

I am not sure how this leads to panpsychism. What are the logical steps there?

[anonymous]

1. I exist. (Cogito, ergo sum). I'm a thinking, conscious entity that experiences existence at this specific point in time in the multiverse.

2. Our understanding of physics is that there is no fundamental thing that we can reduce conscious experience down to. We're all just quarks and leptons interacting.

These appear to be in conflict. Taking (2) to its logical conclusion seems to imply that we live in a deterministic block universe, or at least we can frame our physical understanding as if we do. But if that's true, and if the universe is big enough (it is a big place!), then somewhere out there in space or time is a computational process that resembles the me-of-right-now. Maybe a Boltzmann brain, or maybe a simulation of me in the future, or maybe just the split off Everett branches of alternate histories. Since there are multiple instances of me out there, how come I'm stuck in the "now"?

Any fundamental theory of physics must explain ALL the evidence we have available to us. This includes both highly precise quantum measurements, and the fact that I'm a thinking, conscious entity that experiences existence at this specific point in time in the multiverse. One of the chief problems here is that physics, so far as we can tell, is entirely local. We expect future physical laws to also be local. But our best guesses at understanding consciousness is that it is information processing, and is only really described at a much higher level than quarks and leptons. So our prior is that we need a physical theory that explains consciousness at the level of quarks and leptons, but that seems irreconcilable with our current understanding of biological consciousness. I'd accept an alternative theory if it led to testable predictions, but I'm not willing to bite the bullet of non-local physical theories of consciousness without experimental evidence. The prior for locality in fundamental physics is simply too high to realistically consider alternatives otherwise.

The jump to panpsychism is not an inference from evidence but rather a deduction from a reasonable prior: a local theory of consciousness would imply that (1) a single lepton interacting with a field (an electron emitting or absorbing a photon) has some epsilon experience of consciousness; and (2) conscious experience locally aggregates. So two electrons exchanging a photon is a single consciousness event of, say, 2*epsilon magnitude (although the relationship need not be linear). Higher order structure further aggregates this singular experience of consciousness, in a progression from quarks and leptons to atoms, to molecules, to organelles, to cell structures, to tissue, to organs, to entire organisms. However at some point the system interacts with a non-factorable stochastic boundary, the environment, which prevents further aggregation. The singularly conscious entity interacts with the environment, but each interaction is isolated and either peels off or adds epsilon consciousness stochastically, like the steady-state boundary between a liquid and a gas.

This so-far qualitatively descriptive theory explains why information-processing systems like our brains (or AI) have singular experiences of consciousness, without having to invoke theories like epiphenomenalism with a questionable physical basis. My evidence for it is simply an Occam prior: it's the simplest theory with local physics which explains the evidence. But as you expect of any local theory, what's true of one part of the universe is true of another. If we have subjective experiences (and I'm not willing to bite the bullet of rejecting Descartes' Cogito), then so does a rock. And the ocean. And every little thing in the universe. Indeed the universe itself is conscious, to whatever degree that makes sense in an inflationary universe with local physics, and we are just factorable complex interactions within that universal consciousness that experience our own subjective sense of self. When we "die", our experience doesn't stop... but it does stop being interesting from a human standpoint, as we return to the stochastic random noise of the environment in which we live. [*]

This is the basis of a physical theory of consciousness I thought up almost two decades ago when I first encountered the quantum teleportation thought experiment in a philosophy class, but it is also basically the same as Max Tegmark's panpsychic theory of consciousness, so I'll just point you to his articles for more detail.

[*] Aside: if this is true, being cremated might be the worst possible outcome after death. Being worm food is better than being split up into perfectly stochastic entities (gas molecules) and dispersed in the environment... It would also mean that cryonics works, however, but destructive mind uploading is a kill-and-copy operation.

Not surprisingly, I have a few issues with your chain of reasoning.

1. I exist. (Cogito, ergo sum). I'm a thinking, conscious entity that experiences existence at this specific point in time in the multiverse.

Cogito is an observation. I am not arguing with that one. Ergo sum is an assumption, a model. The "multiverse" thing is a speculation.

Our understanding of physics is that there is no fundamental thing that we can reduce conscious experience down to. We're all just quarks and leptons interacting.

This is very much simplified. Sure, we can do reduction, but that doesn't mean we can do synthesis. There is no guarantee that it is even possible to do synthesis. In fact, there are mathematical examples where synthesis might not be possible, simply because the relevant equations cannot be solved uniquely. I made a related point here. Here is an example. Consciousness can potentially be reduced to atoms, but it may also be reduced to bits, a rather different substrate. Maybe there are other reductions possible.

And it is also possible that constructing consciousness out of quarks and leptons is impossible because of "hard emergence". Of the sorites kind. There is no atom of water. A handful of H2O molecules cannot be described as a solid, liquid or gas. A snowflake requires trillions of trillions of H2O molecules together. There is no "snowflakiness" in a single molecule. Just like there is no consciousness in an elementary particle. There is no evidence for panpsychism, and plenty against it.

[anonymous]

Postulating hard emergence requires a non-local postulate. I’m not willing to accept that without testable predictions.

I don't really see how "ergo sum" is an assumption. If anything, it is a direct inference, not an assumption. Something exists that is perceiving. Any theory that says otherwise must be incorrect.

Postulating hard emergence requires a non-local postulate.

That is not obvious.

[anonymous]

If consciousness only "emerges" when an information processing system is constructed at a higher level, then that implies that the whole is something different than the aggregate of its many individual interactions. This is unlike shminux's description of liquid water emerging from H2O interactions, which confuses map and territory. If a physical description stated that an interaction is conscious if and only if it is part of an information processing system, that is something that cannot be determined with local information at the exact time and place of the individual interactions.

I'm biting the bullet of QM (the standard model, or whatever quantum gravity formulation wins out) being all there is. If that is true, then explaining subjective experience requires a local postulate, not an added rule, which results in panpsychism.

Taking (2) to its logical conclusion seems to imply that we live in a deterministic block universe,

That was not implied by (2) as stated, and isn't implied by physics in general. Both the block universe and determinism are open questions (and not equivalent to each other).

One of the chief problems here is that physics, so far as we can tell, is entirely local.

[emph. added]

Nope. What is specifically ruled out by tests of Bell's inequalities is the conjunction of (local, deterministic). The one thing we know is that the two things you just asserted are not both true. What we don't know is which is false.

Actually the superdeterminism models allow for both to be true. There is a different assumption that breaks.

What is specifically ruled out by tests of Bell's inequalities is the conjunction of (local, deterministic). The one thing we know is that the two things you just asserted are not both true. What we don't know is which is false.

I think you're nitpicking here. While we don't know the fundamental laws of the universe with 100% confidence, I would suggest that based on what we do know, they are extremely likely to be local and non-deterministic (as opposed to nonlocal hidden variables). Quantum field theory (QFT) is in that category, and adding general relativity doesn't change anything except in unusual extreme circumstances (e.g. microscopic black holes, or the Big Bang—where the two can't be sensibly combined). String theory doesn't really have a meaningful notion of locality at very small scales (Planck length, Planck time), but at larger scales in normal circumstances it approaches QFT + classical general relativity, which again is local and non-deterministic. (So yes, probably our everyday human interactions have nonlocality at a part-per-googolplex level or whatever, related to quantum fluctuations of the geometry of space itself, but it's hard to imagine that this would matter for anything.)

(By non-deterministic I just mean that the Born rule involves true randomness. In Copenhagen interpretation you say that collapse is a random process. In many-worlds you would say that the laws of physics are deterministic but the quasi-anthropic question "what branch of the wavefunction will I happen to find myself in?" has a truly random answer. Either way is fine; it doesn't matter for this comment.)

Well, I wasn't nitpicking you. Friedenbach was asserting locality+determinism. You are asserting locality+nondeterminism, which is OK.

[anonymous]

FWIW I was asserting this:

In many-worlds you would say that the laws of physics are deterministic

The only thing non-deterministic in QM is the Born rule, which isn’t part of a MWI block universe formulation. (You need a source of randomness to specify where “you” will end up in the future evolution of the universe, but not to specify all paths you might end up in.)

Interesting!

We also need (I would think) for the experience of consciousness to somehow cause your brain to instruct your hands to type "cogito ergo sum". From what you wrote, I'm sorta imagining physical laws plus experience glued to it ... and that physical laws without experience glued to it would still lead to the same nerve firing pattern, right? Or maybe you'll say physical laws without experience is logically impossible? Or what?

[anonymous]

I don't find the question relevant. That's a physicist's application of Occam's razor: extra postulates about consciousness don't affect physical calculations, so we should ignore them--just like MWI vs CI doesn't affect experimental predictions, so a physicist shouldn't care what interpretation is used.

But my concern is the intersection of physics and philosophy: what moral weight should I give in my utilitarian assessment of possible future outcomes? Whether a life form is conscious or not doesn't matter much from a physicist's perspective because it doesn't affect the biochemical calculations, but it does matter to the question "should I protect this life?"

There is a division in the transhumanist community between whether one should identify with the instance of a computation, or the description of a computation. This has practical, real-world consequences: should I sign up for cryonics (with the possibility of revival, but you suffer some damage) or brain preservation (less damage, but only destructive uploading options)?

If the panpsychic consciousness-in-every-interaction postulate I stated is true, then morality and personal utility come down on the side of the instance-of-computation camp, not the description-of-computation camp. This means cryonics (long sleep) is favored over brain preservation (kill-and-copy), and weird stuff like quantum suicide is also ruled out as an option.

“Why do I think reality exists?” Is already answerable. You can list a number of reasons why you hold this belief.

There are also reasons for believing in non-illusory forms of free will and consciousness. If that argument is sufficient to establish realism in some cases, it is sufficient in all cases.

You are not supposed to dissolve the new question, only reformulate the original one in a way that is becomes answerable.

Supposed by whom? EY gives some instructions in the imperative voice, but that's not how logic works.

His argument is that if free will is possibly an illusion then it is an illusion. If valid, this argument would also show that consciousness and material reality are definitely illusions.

So it disproves too much.

But there is a valid form of the argument where you argue against the reality of X in addition to arguing for the possible illusory nature of X.

There are no “unique exceptions”, we are algorithms,

That's much more conjectural than most of the claims made here.

Yep! I agree with you: Rethinking Consciousness and those two Eliezer posts are coming from a similar place.

(Just to be clear, the phrase "meta-problem of consciousness" comes from David Chalmers, not Graziano. More generally, I don't know exactly which aspects of really anything here are original Graziano inventions, versus Graziano synthesizing ideas from the literature. I'm not familiar with the consciousness literature, and also I listened to the audio book which omits footnotes and references.)


Except that EY is not an illusionist about consciousness! When considering free will, he assumes right off the bat that it can't possibly be real, and has to be explained away instead. But in the generalised anti zombie principle, he goes in the opposite direction, insisting that reports of consciousness are always caused by consciousness. [*]

So there is no unique candidate for being an illusion. Anything can be. Some people think consciousness is all, and matter is an illusion.

Leading to the anti-Aumann principle: two parties will never agree if they are allowed to dismiss each others evidence out of hand.

[*] Make no mistake, asserting illusionism about consciousness is asserting you yourself are a zombie.

If you say that free will and consciousness are by definition non-physical, then of course naturalist explanations explain them away. But you can also choose to define the terms to encompass what you think is really going on. This is called "compatibilism" for free will, and this is Graziano's position on consciousness. I'm definitely signed up for compatibilism on free will and have been for many years, but I don't yet feel 100% comfortable calling Graziano's ideas "consciousness" (as he does), or if I do call it that, I'm not sure which of my intuitions and associations about "consciousness" are still applicable.

If you say that free will and consciousness are by definition non-physical, then of course naturalist explanations explain them away.

Object level reply: I don't. Most contemporary philosophers don't. If you see that sort of thing it is almost certainly a straw man.

meta level reply: And naturally idealists reject any notion of matter except as a bundle of sensation. Just because something is normal and natural, does not mean it is normatively correct. It is normal and natural to be tribal, biased and otherwise irrational. Immunity to evidence is a Bad Thing from the point of view of rationality.

But you can also choose to define the terms to encompass what you think is really going on

You can if you really know, but confusing assumptions and knowledge is another Bad Thing. We know that atoms can be split, so redefining an atom to be a divisible unit of matter is backed by knowledge rather than assumption.

I’m definitely signed up for compatibilism on free will and have been for many years

Explaining compatibilist free will is automatically explaining away libertarian free will. So what is the case against libertarian free will? It isn't false because of naturalism, since it isn't supernatural by definition -- and because naturalism needs to be defeasible to mean anything. EY dismisses libertarian free will out of hand. That is not knowledge.

but I don’t yet feel 100% comfortable calling Graziano’s ideas “consciousness” (as he does), or if I do call it that, I’m not sure which of my intuitions and associations about “consciousness” are still applicable.

What would it take for it to be false? If the answer is "nothing", then you are looking at suppression of evidence.

Sorry for being sloppy; you can ignore what I said about "non-physical". I really just meant the more general point that "consciousness doesn't exist (if consciousness is defined as X)" is the same statement as "consciousness does not mean X, but rather Y", and I shouldn't have said "non-physical" at all. You sorta responded to that more general point, although I'm interested in whether you can say more about how exactly you define consciousness such that illusionism is not consciousness. (As I mentioned, I'm not sure I'll disagree with your definition!)

What would it take for it to be false?

I think that if attention schema theory can explain every thought and feeling I have about consciousness (as in my silly example conversation in the "meta-problem of consciousness" section), then there's nothing left to explain. I don't see any way around that. I would be looking for (1) some observable thought / behavior that AST cannot explain, (2) some reason to think those explanations are wrong, or (3) a good argument that true philosophical zombies are sensible, i.e. that you can have two systems whose every observable thought / behavior is identical but exactly one of them is conscious, or (4) some broader framework of thinking that accepts the AST story as far as it goes, and offers a different way to think about it intuitively and contextualize it.

I really just meant the more general point that “consciousness doesn’t exist (if consciousness is defined as X)” is the same statement as “consciousness does not mean X, but rather Y”

If you stipulate that consciousness means Y consciousness, not X consciousness, you haven't proven anything about X consciousness.

If I stipulate that when I say "duck", I mean mallards, I imply nothing about the existential status of muscovys or teals. In order to figure out what is real, you have to look, not juggle definitions.

If you have an infallible way of establishing what really exists, that in some way bypasses language, and a normative rule that every term must have a real-world referent, then you might be in a place where you can say what a word really means.

Otherwise, language is just custom.

I’m interested in whether you can say more about how exactly you define consciousness such that illusionism is not consciousness. (As I mentioned, I’m not sure I’ll disagree with your definition!)

Illusionism is not consciousness because it is a theory of consciousness.

Illusionism explicitly does not explain consciousness as typically defined, but instead switches the topic to third person reports of consciousness.

Edit1:

I think that if attention schema theory can explain every thought and feeling I have about consciousness (as in my silly example conversation in the “meta-problem of consciousness” section), then there’s nothing left to explain

Explaining consciousness as part of the hard problem of consciousness is different to explaining-away consciousness (or explaining reports of consciousness) as part of the meta problem of consciousness.

Edit2:

There are two ways of not knowing the correct explanation of something: the way where no one has any idea, and the way where everyone has an idea... but no one knows which explanation is right because they are explaining different things in different ways.

Having an explanation is only useful in the first situation. Otherwise, the whole problem is the difference between "an explanation" and "the explanation".

Explaining consciousness as part of the hard problem of consciousness is different to explaining-away consciousness (or explaining reports of consciousness) as part of the meta problem of consciousness.

I commented here why I think that it shouldn't be possible to fully explain reports of consciousness without also fully explaining the hard problem of consciousness in the process of doing so. I take it you disagree (correct?) but do you see where I'm coming from? Can you be more specific about how you think about that?

Now put the two together, and you get an "attention schema", an internal model of the activity of the GNW, which he calls attention.

To clarify, he calls "the activity of the GNW" attention, or he calls "an internal model of the activity of the GNW" attention?

My best guess interpretation of what you're saying is that it's the former, and when you add "an internal model of" on the front, that makes it a schema. Am I reading that right?

Yes! I have edited to make that clearer, thanks.

Now put the two together, and you get an "attention schema", an internal model of attention (i.e., of the activity of the GNW). The attention schema is supposedly key to the mystery of consciousness.

The idea of an attention schema helps make sense of a thing talked about in meditation. In zen we talk sometimes about it via the metaphor of the mind like a mirror, such that it sees itself reflecting in itself. In The Mind Illuminated it's referred to as metacognitive awareness. The point is that the process by which the mind operates can be observed by itself even as it operates, and perhaps the attention schema is an important part of what it means to do that, specifically causing the attention schema to be able to model itself.

When I read

To be clear, if GNW is "consciousness" (as Dehaene describes it), then the attention schema is "how we think about consciousness". So this seems to be at the wrong level! [...] But it turns out, he wants to be one level up!

I thought, thank goodness, Graziano (and steve2152) gets it. But in the moral implications section, you immediately start talking about attention schemas rather than simply attention. Attention schemas aren't necessary for consciousness or sentience; they're necessary for meta-consciousness. I don't mean to deny that meta-consciousness is also morally important, but it strikes me as a bad move to skip right over simple consciousness.

This may make little difference to your main points. I agree that "There are (presumably) computations that arguably involve something like an 'attention schema' but with radically alien properties." And I doubt that I could see any value in an attention schema with sufficiently alien properties, nor would I expect it to see value in my attentional system.

I guess it was too nice that I tend to agree with everything you say about the brain, so there had to be an exception.

Normal Person: What about qualia?

Person Who Has Solved The Meta-Problem Of Consciousness: Let me explain why the brain, as an information processing system, would ask the question "What about qualia"...

NP: What about subjective experience?

PWHSTMPOC: Let me explain why the brain, as an information processing system, would ask the question "What about subjective experience"...

NP: You're not answering my questions!

PWHSTMPOC: Let me explain why the brain, as an information processing system, would say "You're not answering my questions"...

It seems to me like PWHSTMPOC is being chicken here. The real answer is "there is no qualia" followed by "however, I can explain why your brain outputs the question about qualia". Right?

If so, well, I know that there's qualia because I experience it, and I genuinely don't understand why that's not the end of the conversation. It's also true that a brain like mine could say this if it weren't true, but this doesn't change anything about the fact that I experience qualia. (Unless the claim isn't that there's no qualia, in which case I don't understand illusionism.)

I'm also not following your part on morality. If consciousness isn't real, why doesn't that just immediately imply nihilism? (This isn't an argument for it being real, or course.) Anyway, please feel free to ignore this paragraph if the answer is too complicated.

It's also true that a brain like mine could say this if it weren't true

This is the p-zombie thing, but I think there's a simpler way to think about it. You wrote down "I know that there's qualia because I experience it". There was some chain of causation that led to you writing down that statement. Here's a claim:

Claim: Your experience of qualia played no role whatsoever in the chain of causation in your brain that led to you writing down the statement "I know that there's qualia because I experience it".

This is a pretty weird claim, right? I mean, you remember writing down the statement. Would you agree with that claim? No way, right?

Well, if we reject that claim, then we're kinda stuck saying that if there are qualia, they are somewhere to be found within that chain of causation. And if there's nothing to be found in the chain of causation that looks like qualia, then either there are no qualia, or else qualia are not what they look like.
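
That last step is just a case split on whether qualia exist. Here's a minimal Lean sketch of it (the proposition names are hypothetical, chosen purely for illustration):

```lean
-- The dilemma, spelled out: if qualia (should they exist) must sit in
-- the causal chain, but nothing in the chain looks like qualia, then
-- either there are no qualia, or qualia are not what they look like.
-- (Proposition names are illustrative, not from the original post.)
variable (Exist InChain LooksLikeQualia NotWhatTheyLookLike : Prop)

example
    (h1 : Exist → InChain)   -- rejecting the Claim: qualia are in the chain
    (h2 : InChain → ¬LooksLikeQualia → NotWhatTheyLookLike)
    (h3 : ¬LooksLikeQualia)  -- nothing in the chain looks like qualia
    : ¬Exist ∨ NotWhatTheyLookLike :=
  (Classical.em Exist).elim
    (fun hE => Or.inr (h2 (h1 hE) h3))
    (fun hnE => Or.inl hnE)
```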

(Unless the claim isn't that there's no qualia, in which case I don't understand illusionism.)

I can't say I understand it very well either, and see also Luke's report Appendix F and Joe's blog post. From where I'm at right now, there's a set of phenomena that people describe using words like "consciousness" and "qualia", and nothing we say will make those phenomena magically disappear. However, it's possible that those phenomena are not what they appear to be.

We all perceive that we have qualia. You can think of statements like "I perceive X" as living on a continuum, like a horizontal line. On the left extreme of the line, we perceive things because those things are out there in the world and our senses are accurately and objectively conveying them to us. On the right extreme of the line, we perceive things because of quirks of our perceptual systems.

So those motion illusions are closer to the right end, and "I see a rock" is closer to the left end. But as Graziano points out, there is nothing that's all the way at the left end—even "I see a rock" is very much a construction built by our brain that has some imperfect correlation with the configuration of atoms in the world. (I'm talking about when you're actually looking at a rock, in the everyday sense. If "I see a rock" when I'm hallucinating on LSD, then that's way over on the right end.)

"I have qualia" is describing a perception. Where is it on that line? I say it's over towards the right end. That's not necessarily the same as saying "no such thing as qualia". You could also say "qualia is part of our perception of the world". And so what if it is? Our perception of the world is pretty important, and I'm allowed to care about it...

If consciousness isn't real, why doesn't that just immediately imply nihilism?

There's a funny thing about nihilism: It's not decision-relevant. Imagine being a nihilist, deciding whether to spend your free time trying to bring about an awesome post-AGI utopia, vs sitting on the couch and watching TV. Well, if you're a nihilist, then the awesome post-AGI utopia doesn't matter. But watching TV doesn't matter either. Watching TV entails less exertion of effort. But that doesn't matter either. Watching TV is more fun (well, for some people). But having fun doesn't matter either. There's no reason to throw yourself at a difficult project. There's no reason not to throw yourself at a difficult project. Isn't it funny?

I don't have a grand ethical theory, I'm not ready to sit in judgment of anyone else, I'm just deciding what to do for my own account. There's a reason I ended the post with "Dentin's prayer of the altruistic nihilist"; that's how I feel, at least sometimes. I choose to care about information-processing systems that are (or "perceive themselves to be"?) conscious in a way that's analogous to how humans do that, with details still uncertain. I want them to be (or "to perceive themselves to be"?) happy and have awesome futures. So here I am :-D

Well, if we reject that claim, then we're kinda stuck saying that if there are qualia, they are somewhere to be found within that chain of causation. And if there's nothing to be found in the chain of causation that looks like qualia,

Looks from the inside, or looks from the outside?

I guess outside; see my comment about the watch for what I was trying to get at there.

If you start from the assumption that only "outside" -- third-person, objective -- evidence counts, then it is easy to come to the conclusion that only physical causation counts. Qualia are found in the chain, subjectively, because, subjectively, they seem to cause things.

I don't really understand what you're getting at, and I suspect it would take more than one sentence for me to get it. If there's an article or other piece of writing that you'd suggest I read, please let me know. :-)

There's a funny thing about nihilism: It's not decision-relevant. Imagine being a nihilist, deciding whether to spend your free time trying to bring about an awesome post-AGI utopia, vs sitting on the couch and watching TV. Well, if you're a nihilist, then the awesome post-AGI utopia doesn't matter. But watching TV doesn't matter either. Watching TV entails less exertion of effort. But that doesn't matter either. Watching TV is more fun (well, for some people). But having fun doesn't matter either. There's no reason to throw yourself at a difficult project. There's no reason not to throw yourself at a difficult project. Isn't it funny?

I agree except for the funny part.

I don't have a grand ethical theory, I'm not ready to sit in judgment of anyone else, I'm just deciding what to do for my own account. There's a reason I ended the post with "Dentin's prayer of the altruistic nihilist"; that's how I feel, at least sometimes. I choose to care about information-processing systems that are (or "perceive themselves to be"?) conscious in a way that's analogous to how humans do that, with details still uncertain. I want them to be (or "to perceive themselves to be"?) happy and have awesome futures. So here I am :-D

Thanks for describing this. I'm both impressed and a bit shocked that you're being consistent.

This is a pretty weird claim, right? I mean, you remember writing down the statement. Would you agree with that claim? No way, right?

Let's assume I do. (I think I would have agreed a few years ago, or at least assigned significant probability to this.) I still think (and thought then) that there is a slam-dunk chain from 'I experience consciousness' to 'therefore, consciousness exists'.

Let $E$ = "I experience consciousness" and $C$ = "consciousness exists". Clearly $E \Rightarrow C$, because experiencing anything is already sufficient for what I call consciousness. Furthermore, clearly $E$ is true. Hence $C$ is true. Nothing about your Claim contradicts any step of this argument.

I think the reason intuitions differ so much on this topic is that we are comparing very-low-probability theories against each other, and the question is which one is lower. (And operations with low numbers are prone to higher errors than operations with higher numbers.) At least my impression (correct me if I'm wrong) is that the subjective proof of consciousness would be persuasive, except that it seems to imply Claim, and Claim is a no-go, so therefore the subjective proof has to give in. I.e., you have both $C \Rightarrow \text{Claim}$ and $\neg\text{Claim}$, and therefore $\neg C$.

My main point is that it doesn't make sense to assign anything a lower probability than $\neg E$ and $\neg(E \Rightarrow C)$, because $E$ is immediately proven by the fact that you experience stuff, and $E \Rightarrow C$ just unpacks the definition of consciousness, so it is utterly trivial. You can make a coherent-sounding (if far-fetched) argument for why Claim is true, but I'm not familiar with any coherent argument that $E$ is false (other than that it must be false because of what it implies, which is again the argument above).
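
For what it's worth, both inferences in play here are classically valid; below is a minimal Lean sketch, treating $E$, $C$, and Claim as bare propositions (the formalization is just an illustration, not part of the original exchange):

```lean
-- Both sides use valid inference rules; the disagreement is over
-- which premise to reject, not over the logic.
variable (E C Claim : Prop)

-- The "subjective proof": E and E ⟹ C give C (modus ponens).
example (hE : E) (hEC : E → C) : C := hEC hE

-- The opposing move: C ⟹ Claim and ¬Claim give ¬C (modus tollens).
example (hCClaim : C → Claim) (hNotClaim : ¬Claim) : ¬C :=
  fun hC => hNotClaim (hCClaim hC)
```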

My probabilities (not adjusted for the fact that one of them must be true) look something like this:

  • $E$ or $E \Rightarrow C$ is false.
  • Consciousness is an emergent phenomenon. (I.e., matter is unconscious but consciousness appears as a result of information processing and has no causal effect on the world. This would imply Claim.)
  • Something weird like dual-aspect monism (consciousness and materialism are two views on the same process; in particular, all matter is conscious).

Hence what I said earlier: I don't believe Claim right now because I think there is actually a not-super-low-probability explanation, but even if there weren't, it would still not change anything, because $P(\text{Claim})$ is a lot more than $P(\neg E \lor \neg(E \Rightarrow C))$. I do remember finding EY's anti-p-zombie post persuasive, although it's been years since I've read it.

I can't say I understand it very well either, and see also Luke's report Appendix F and Joe's blog post. From where I'm at right now, there's a set of phenomena that people describe using words like "consciousness" and "qualia", and nothing we say will make those phenomena magically disappear. However, it's possible that those phenomena are not what they appear to be.

We all perceive that we have qualia. You can think of statements like "I perceive X" as living on a continuum, like a horizontal line. On the left extreme of the line, we perceive things because those things are out there in the world and our senses are accurately and objectively conveying them to us. On the right extreme of the line, we perceive things because of quirks of our perceptual systems.

I think that's just dodging the problem, since any amount of subjective experience is enough for $E$. The question isn't how accurately your brain reports on the outside world, it's why you have subjective experience of any kind.

Thanks! I'm sympathetic to everything you wrote, and I don't have a great response. I'd have to think about it more. :-D

Errata:

school if thought

school of thought

Fixed it, thanks!