This is the first post in a sequence on Cartesian frames, a new way of modeling agency that has recently shaped my thinking a lot.
Traditional models of agency have some problems, like:
- They treat the "agent" and "environment" as primitives with a simple, stable input-output relation. (See "Embedded Agency.")
- They assume a particular way of carving up the world into variables, and don't allow for switching between different carvings or different levels of description.
Cartesian frames are a way to add a first-person perspective (with choices, uncertainty, etc.) on top of a third-person "here is the set of all possible worlds," in such a way that many of these problems either disappear or become easier to address.
The idea of Cartesian frames is that we take as our basic building block a binary function which combines a choice from the agent with a choice from the environment to produce a world history.
We don't think of the agent as having inputs and outputs, and we don't assume that the agent is an object persisting over time. Instead, we only think about a set of possible choices of the agent, a set of possible environments, and a function that encodes what happens when we combine these two.
This basic object is called a Cartesian frame. As with dualistic agents, we are given a way to separate out an “agent” from an “environment.” But rather than being a basic feature of the world, this is a “frame”: a particular way of conceptually carving up the world.
We will use the combinatorial properties of a given Cartesian frame to derive versions of inputs, outputs and time. One goal here is that by making these notions derived rather than basic, we can make them more amenable to approximation and thus less dependent on exactly how one draws the Cartesian boundary. Cartesian frames also make it much more natural to think about the world at multiple levels of description, and to model agents as having subagents.
Mathematically, Cartesian frames are exactly Chu spaces. I give them a new name because of my specific interpretation about agency, which also highlights different mathematical questions.
Using Chu spaces, we can express many different relationships between Cartesian frames. For example, given two agents, we could talk about their sum (⊕), which can choose from any of the choices available to either agent, or we could talk about their tensor (⊗), which can accomplish anything that the two agents could accomplish together as a team.
Cartesian frames also have duals (−∗) which you can get by swapping the agent with the environment, and ⊕ and ⊗ have De Morgan duals (& and ⅋ respectively), which represent taking a sum or tensor of the environments. The category also has an internal hom, ⊸, where C⊸D can be thought of as "D with a C-shaped hole in it." These operations are very directly analogous to those used in linear logic.
1. Definition
Let W be a set of possible worlds. A Cartesian frame C over W is a triple C=(A,E,⋅), where A represents a set of possible ways the agent can be, E represents a set of possible ways the environment can be, and ⋅:A×E→W is an evaluation function that returns a possible world given an element of A and an element of E.
We will refer to A as the agent, the elements of A as possible agents, E as the environment, the elements of E as possible environments, W as the world, and elements of W as possible worlds.
Definition: A Cartesian frame C over a set W is a triple (A,E,⋅), where A and E are sets and ⋅:A×E→W. If C=(A,E,⋅) is a Cartesian frame over W, we say Agent(C)=A, Env(C)=E, World(C)=W, and Eval(C)=⋅.
A finite Cartesian frame is easily visualized as a matrix, where the rows of the matrix represent possible agents, the columns of the matrix represent possible environments, and the entries of the matrix are possible worlds:
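$$
\begin{array}{c|ccc}
A \backslash E & e_1 & e_2 & e_3 \\ \hline
a_1 & w_1 & w_2 & w_3 \\
a_2 & w_4 & w_5 & w_6 \\
a_3 & w_7 & w_8 & w_9
\end{array}
$$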
E.g., this matrix tells us that if the agent selects a3 and the environment selects e1, then we will end up in the possible world w7.
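As a rough illustration, a finite Cartesian frame can be sketched in Python as two lists plus a lookup table. The class name `CartesianFrame`, the method names, and the string labels are assumptions made for this sketch, not notation from the post.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CartesianFrame:
    """A finite Cartesian frame (A, E, ·) stored as an explicit table."""
    agents: List[str]                  # A: possible agents (matrix rows)
    envs: List[str]                    # E: possible environments (matrix columns)
    table: Dict[Tuple[str, str], str]  # (a, e) -> the possible world a · e

    def evaluate(self, a: str, e: str) -> str:
        """Return the possible world a · e."""
        return self.table[(a, e)]

    def matrix(self) -> List[List[str]]:
        """Rows are indexed by agents, columns by environments."""
        return [[self.evaluate(a, e) for e in self.envs] for a in self.agents]


# The 3x3 example from the text: worlds w1..w9 filled in row by row.
agents = ["a1", "a2", "a3"]
envs = ["e1", "e2", "e3"]
table = {
    (a, e): f"w{3 * i + j + 1}"
    for i, a in enumerate(agents)
    for j, e in enumerate(envs)
}
frame = CartesianFrame(agents, envs, table)

print(frame.evaluate("a3", "e1"))  # w7
```

Storing the evaluation function as a table only works for finite frames; the definition itself places no such restriction on A, E, or W.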
Because we're discussing an agent that has the freedom to choose between multiple possibilities, the language in the definition above is a bit overloaded. You can think of A as representing the agent before it chooses, while a particular a∈A represents the agent's state after making a choice.
Note that I'm specifically not referring to the elements of A as "actions" or "outputs"; rather, the elements of A are possible ways the agent can choose to be.
Since we're interpreting Cartesian frames as first-person perspectives tacked onto sets of possible worlds, we'll also often phrase things in ways that identify a Cartesian frame C with its agent. E.g., we will say "C0 is a subagent of C1" as a shorthand for "C0's agent is a subagent of C1's agent."
We can think of the environment E as representing the agent's uncertainty about the set of counterfactuals, or about the game that it's playing, or about "what the world is as a function of my behavior."
A Cartesian frame is effectively a way of factoring the space of possible world histories into an agent and an environment. Many different Cartesian frames can be put on the same set of possible worlds, representing different ways of doing this factoring. Sometimes, a Cartesian frame will look like a subagent of another Cartesian frame. Other times, the Cartesian frames may look more like independent agents playing a game with each other, or like agents in more complicated relationships.
2. Normal-Form Games
When viewed as a matrix, a Cartesian frame looks much like the normal form of a game, but with possible worlds rather than pairs of utilities as entries.
In fact, given a Cartesian frame over W, and a function from W to a set V, we can construct a Cartesian frame over V by composing them in the obvious way. Thus, if we had a Cartesian frame (A,E,⋅) and a pair of utility functions U_A:W→ℝ and U_E:W→ℝ, we could construct a Cartesian frame over ℝ², given by (A,E,⋆), where a⋆e:=(U_A(a⋅e),U_E(a⋅e)). This Cartesian frame will look exactly like the normal form of a game. (Although it is a bit weird to think of the environment set as having a utility function.)
We can use this connection with normal-form games to illustrate three features of the ways in which we will use Cartesian frames.
2.1. Coarse World Models
First, note that we can talk about a Cartesian frame over ℝ², even though one would not normally think of ℝ² as a set of possible worlds.
In general, we will often want to talk about Cartesian frames over "coarse" models of the world, models that leave out some details. We might have a world model W that fully specifies the universe at the subatomic level, while also wanting to talk about Cartesian frames over a set V of high-level descriptions of the world.
We will construct Cartesian frames over V by composing Cartesian frames over W with the function from W to V that sends more refined, detailed descriptions of the universe to coarser descriptions of the same universe.
In this way, we can think of an element (r1,r2)∈ℝ² as the coarse, high-level possible world given by "Those possible worlds for which U_A=r1 and U_E=r2."
Definition: Given a Cartesian frame C=(A,E,⋅) over W, and a function f:W→V, let f∘(C) denote the Cartesian frame over V, f∘(C)=(A,E,⋆), where a⋆e=f(a⋅e).
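The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming a finite frame stored as a dictionary from (a, e) pairs to worlds; the frame `C` and the utility tables `U_A` and `U_E` are made-up values for the example.

```python
from typing import Callable, Dict, Hashable, Tuple

Frame = Dict[Tuple[str, str], Hashable]  # (a, e) -> world


def compose(f: Callable[[Hashable], Hashable], frame: Frame) -> Frame:
    """Return the table of f∘(C): same A and E, with a ⋆ e = f(a · e)."""
    return {(a, e): f(world) for (a, e), world in frame.items()}


# Hypothetical 2x2 frame over a set W of fine-grained worlds.
C = {
    ("a1", "e1"): "w1", ("a1", "e2"): "w2",
    ("a2", "e1"): "w3", ("a2", "e2"): "w4",
}

# Hypothetical utility functions on W, given as lookup tables.
U_A = {"w1": 1.0, "w2": 0.0, "w3": 2.0, "w4": 1.0}
U_E = {"w1": 1.0, "w2": 2.0, "w3": 0.0, "w4": 1.0}

# f sends each world w to the coarse world (U_A(w), U_E(w)).
game = compose(lambda w: (U_A[w], U_E[w]), C)

print(game[("a2", "e1")])  # (2.0, 0.0)
```

The resulting table has the shape of a normal-form game: the agent and environment sets are unchanged, and only the entries are coarsened.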
2.2. Symmetry
Second, normal-form games highlight the symmetry between the players.
We do not normally think about this symmetry in agent-environment interactions, but this symmetry will be a key aspect of Cartesian frames. Every Cartesian frame C=(A,E,⋅) has a dual which swaps A and E and transposes the matrix.
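For a finite frame, taking the dual amounts to flipping the keys of the evaluation table. The sketch below assumes a dictionary representation and a hypothetical 2x3 frame; the function name `dual` is chosen for the example.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[str, str], str]


def dual(agents: List[str], envs: List[str], table: Table):
    """Swap the roles of A and E and transpose the matrix."""
    flipped = {(e, a): world for (a, e), world in table.items()}
    return envs, agents, flipped


# Hypothetical 2x3 frame: two possible agents, three possible environments.
A = ["a1", "a2"]
E = ["e1", "e2", "e3"]
table = {
    ("a1", "e1"): "w1", ("a1", "e2"): "w2", ("a1", "e3"): "w3",
    ("a2", "e1"): "w4", ("a2", "e2"): "w5", ("a2", "e3"): "w6",
}

A_star, E_star, table_star = dual(A, E, table)
print(A_star)                    # ['e1', 'e2', 'e3']
print(table_star[("e3", "a2")])  # w6

# Taking the dual twice recovers the original frame.
assert dual(A_star, E_star, table_star) == (A, E, table)
```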
2.3. Relation to Extensive-Form Games
Third, much of what we'll be doing with Cartesian frames in this sequence can be summarized as "trying to infer extensive-form games from normal-form games" (ignoring the "games" interpretation and just looking at what this would entail formally).
Consider the ultimatum game. We can represent this game in extensive form:
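Given any game in extensive form, we can then convert it to a game in normal form. In this case:
$$
\begin{array}{l|cc}
 & \text{Offer 6} & \text{Offer 3} \\ \hline
\text{Accept 6, Accept 3} & 6,6 & 9,3 \\
\text{Accept 6, Reject 3} & 6,6 & 0,0 \\
\text{Reject 6, Accept 3} & 0,0 & 9,3 \\
\text{Reject 6, Reject 3} & 0,0 & 0,0
\end{array}
$$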
The strategies in the normal-form game are the policies in the extensive-form game.
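The conversion from extensive form to normal form can be sketched by enumerating policies. The payoffs here are assumptions for the example: an accepted offer of 6 yields the pair (6, 6), an accepted offer of 3 yields (9, 3), and any rejected offer yields (0, 0).

```python
from itertools import product

offers = [6, 3]
responses = ["Accept", "Reject"]

# Assumed payoff pairs for the ultimatum game in this sketch.
payoff_if_accepted = {6: (6, 6), 3: (9, 3)}
REJECTED = (0, 0)

# A responder policy assigns a response to every offer it might see,
# including the offer that is not actually made.
policies = [
    dict(zip(offers, choice))
    for choice in product(responses, repeat=len(offers))
]

normal_form = {}
for policy in policies:
    label = ", ".join(f"{policy[o]} {o}" for o in offers)
    for offer in offers:
        accepted = policy[offer] == "Accept"
        outcome = payoff_if_accepted[offer] if accepted else REJECTED
        normal_form[(label, f"Offer {offer}")] = outcome

for key, outcome in normal_form.items():
    print(key, outcome)
```

Two binary decisions give four responder policies, so the normal form has four policy rows against two offers. Distinct policies can share a column entry, which is exactly the combinatorial structure that remains once the labels are deleted.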
If we then delete the labels, so now we just have a bunch of combinatorial structure about which things send you to the same place, I want to know when we can infer properties of the original extensive-form game, like time and information states.
Although we've used games to note some features of Cartesian frames, we should be clear that Cartesian frames aren't about utilities or game-theoretic rationality. We are not trying to talk about what the agent does, or what the agent should do. In fact, we are effectively taking as our most fundamental building block that an agent can freely choose from a set of available actions.
The theory of Cartesian frames is trying to understand what agents' options are. Utility functions and facts about what the agent actually does can possibly later be placed on top of the Cartesian frame framework, but for now we will be focusing on building up a calculus of what the agent could do.
3. Controllables
We would like to use Cartesian frames to reconstruct ideas like "an agent persisting over time," inputs (or "what the agent can learn"), and outputs (or "what the agent can do"), by taking as basic:
- an agent's ability to "freely choose" between options;
- a collection of possible ways those options can correspond to world histories; and
- a notion of when world histories are considered the same in some coarse world model.
In this way, we hope to find new ways of thinking about partial and approximate versions of these concepts.
Instead of thinking of the agent as an object with outputs, I expect a more embedded view to consider all the facts about the world that the agent can force to be true or false.
This includes facts of the form "I output foo," but it also includes facts that are downstream from immediate outputs. Since we're working with "what can I make happen?" rather than "what is my output?", the theory becomes less dependent on precisely answering questions like "Is my output the way I move my mouth, or is it the words that I say?"
We will call the analogue of outputs in Cartesian frames controllables. The types of our versions of "outputs" and "inputs" are going to be subsets of W, which we can think of as properties of the world. E.g., S might be the set of worlds in which woolly mammoths exist; we could then think of "controlling S" as "controlling whether or not mammoths exist."
We'll define what an agent can control as follows. First, given a Cartesian frame C=(A,E,⋅) over W, and a subset S of W, we say that S is ensurable in C if there exists an a∈A such that for all e∈E, we have a⋅e∈S. Equivalently, we say that S is ensurable in C if at least one of the rows in the matrix only contains elements of S.
Definition: Ensure(C)={S⊆W | ∃a∈A, ∀e∈E, a⋅e∈S}.
If an agent can ensure S, then regardless of what the environment does — and even if the agent doesn't know what the environment does, or its behavior isn't a function of what the environment does — the agent has some strategy which makes sure that the world ends up in S. (In the degenerate case where the agent is empty, the set of ensurables is empty.)
Similarly, we say that S is preventable in C if at least one of the rows in the matrix contains no elements of S.
Definition: Prevent(C)={S⊆W | ∃a∈A, ∀e∈E, a⋅e∉S}.
If S is both ensurable and preventable in C, we say that S is controllable in C.
Definition: Ctrl(C)=Ensure(C)∩Prevent(C).
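For a finite frame, these three definitions translate directly into row checks on the matrix. The sketch below is an illustration under assumptions: the function names, the 2x2 frame, and the world labels (mammoths present or absent, climate warm or cold) are invented for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Table = Dict[Tuple[str, str], str]


def ensurable(S: Set[str], agents: List[str], envs: Iterable[str], table: Table) -> bool:
    """S ∈ Ensure(C): some row of the matrix lies entirely inside S."""
    return any(all(table[(a, e)] in S for e in envs) for a in agents)


def preventable(S: Set[str], agents: List[str], envs: Iterable[str], table: Table) -> bool:
    """S ∈ Prevent(C): some row of the matrix contains no element of S."""
    return any(all(table[(a, e)] not in S for e in envs) for a in agents)


def controllable(S: Set[str], agents: List[str], envs: List[str], table: Table) -> bool:
    """S ∈ Ctrl(C): S is both ensurable and preventable."""
    return ensurable(S, agents, envs, table) and preventable(S, agents, envs, table)


# Hypothetical frame: the agent decides whether mammoths exist,
# the environment decides whether the climate is warm or cold.
A = ["a1", "a2"]
E = ["e1", "e2"]
table = {
    ("a1", "e1"): "mammoths_warm",    ("a1", "e2"): "mammoths_cold",
    ("a2", "e1"): "no_mammoths_warm", ("a2", "e2"): "no_mammoths_cold",
}

mammoths = {"mammoths_warm", "mammoths_cold"}
warm = {"mammoths_warm", "no_mammoths_warm"}

print(controllable(mammoths, A, E, table))  # True
print(ensurable(warm, A, E, table))         # False
print(preventable(warm, A, E, table))       # False
```

With an empty list of agents, `any` over no rows returns `False`, so nothing is ensurable, matching the degenerate case noted above.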
3.1. Closure Properties
Ensurability and preventability, and therefore also controllability, are closed under adding possible agents to A and removing possible environments from E.
Claim: If A′⊇A and E′⊆E, and if for all a∈A and e∈E′ we have a⋆e=a⋅e, then Ctrl(A′,E′,⋆)⊇Ctrl(A,E,⋅).
Proof: Trivial. □
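To spell out the one step involved: if a∈A witnesses that S is ensurable (or preventable) in (A,E,⋅), then a is also in A′, and every e∈E′ is in E, so a⋆e=a⋅e still lands inside (or outside) S. The sketch below brute-force checks the claim on a small made-up example; the frame, world names, and helper names are assumptions for illustration.

```python
from itertools import combinations


def ensurable(S, agents, envs, table):
    return any(all(table[(a, e)] in S for e in envs) for a in agents)


def preventable(S, agents, envs, table):
    return any(all(table[(a, e)] not in S for e in envs) for a in agents)


def ctrl(worlds, agents, envs, table):
    """Enumerate Ctrl(C) by checking every subset S of W."""
    subsets = (
        frozenset(c)
        for r in range(len(worlds) + 1)
        for c in combinations(worlds, r)
    )
    return {
        S for S in subsets
        if ensurable(S, agents, envs, table) and preventable(S, agents, envs, table)
    }


W = ["w1", "w2", "w3", "w4"]

# Original frame (A, E, ·).
A, E = ["a1", "a2"], ["e1", "e2"]
table = {
    ("a1", "e1"): "w1", ("a1", "e2"): "w1",
    ("a2", "e1"): "w2", ("a2", "e2"): "w3",
}

# Larger agent, smaller environment; ⋆ agrees with · on A × E'.
A2, E2 = ["a1", "a2", "a3"], ["e1"]
table2 = {("a1", "e1"): "w1", ("a2", "e1"): "w2", ("a3", "e1"): "w4"}

old, new = ctrl(W, A, E, table), ctrl(W, A2, E2, table2)
print(len(old))    # 4
print(old <= new)  # True
```

A check on a single example is not a proof, but it makes the direction of the inclusion concrete: enlarging A and shrinking E can only add controllable sets.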
Ensurables are also trivially closed under supersets. If I can ensure some set of worlds, then I can ensure some larger set of worlds representing a weaker property (like "mammoths exist or cave bears exist").
Claim: If S1⊆S2⊆W, and S1∈Ensure(C), then S2∈Ensure(C).
Proof: Trivial. □
Prevent