Vanessa Kosoy

Director of AI research at ALTER, where I lead a group working on the learning-theoretic agenda for AI alignment. PhD student in Shay Moran's group in the Technion (my PhD research and my ALTER research are one and the same). See also Google Scholar and LinkedIn.

E-mail: {first name}@alter.org.il

Wikitag Contributions

Comments

Sorted by

Sorry, I was wrong. By Lob's theorem, all versions of  are provably equivalent, so they will trust each other.

IIUC, fixed point equations like that typically have infinitely many solution. So, you defined not one  predicate, but an infinite family of them. Therefore, your agent will trust a copy of itself, but usually won't trust variants of itself with other choices of fixed point. In this sense, this proposal is similar to proposals based on quining (as quining has many fixed points as well).

[This comment is no longer endorsed by its author]Reply

TLDR: A new proposal for a prescriptive theory of multi-agent rationality, based on generalizing superrationality using local symmetries.

Hofstadter's "superrationality" postulates that a rational agent should behave as if other rational agents are similar to itself. In a symmetric game, this implies the selection of a Pareto efficient outcome. However, it's not clear how to apply this principle to asymmetric games. I propose to solve this by suggesting a notion of "local" symmetry that is much more ubiquitous than global (ordinary) symmetry.

We will focus here entirely on finite deterministic (pure strategy) games. Generalizing this to mixed strategies and infinite games more generally is an important question left for the future.

Quotient Games

Consider a game  with a set  of players, for each player  a non-empty set  or pure strategies and a utility function , where . What does it mean for the game to be "symmetric"? We propose a "soft" notion of symmetry that is only sensitive to the ordinal ordering between payoffs. While the cardinal value of the payoff will be important in the stochastic context, we plan to treat the latter by explicitly expanding the strategy space.

Given two games , a premorphism from  to  is defined to consist of

  • A mapping 
  • For each , a mapping 

Notice that these mappings induce a mapping  via

A premorphism is said to be a homomorpism if  and  are s.t. that for any  and , if  then . Homomorphisms makes games into a category in a natural way.

An automorphism of  is a homomorphism from  to  that has an inverse homomorphism. An automorphism  of  is said to be flat when for any , if  then  is the identity mapping[1]. A flat symmetry of  is a group  together with a group homomorphism  s.t.  is flat for all .

Given a flat symmetry , we can apply the superrationality principle to reduce  to the "smaller" quotient game . The players are  i.e. orbits in the original player set. Given , we define the set of strategies for player  in the game  to be

This is non-empty thanks to flatness.

Observe that there is a natural mapping  given by

Finally, the utility function of  is defined by

It is easy to see that the construction of  is functorial w.r.t. the category structure on games that we defined.

We can generalize this further by replacing game homomorphisms with game quasimorphisms: premorphisms satisfying the following relaxed condition on  and :

  • For any  and , if  then .

This is no longer closed under composition, so no longer defines a category structure[2]. However, we can still define flat quasisymmetries and the associated quotient games, and this construction is still functorial w.r.t. the original (not "quasi") category structure. Moreover, there is a canonical homomorphism (not just a quasimorphism) from  to , even when  is a mere quasisymmetry.

The significance of quasisymmetries can be understood as follows.

The set of all games on a fixed set of players  with fixed strategy sets  naturally forms a topological space[3] . Given a group  acting on the game via invertible premorphisms, the subset of  where  is a symmetry is not closed, in general. However, the subset of  where  is a quasisymmetry is closed. I believe this will be important when generalizing the formalism to infinite games.

Coarse Graining

What if a game doesn't have even quasisymmetries? Then, we can look for a coarse graining of the game which does have them.

Consider a game . For each , let  be some set and  a surjection. Denote . Given any , we have the game  in which:

  • The set of players is .
  • For any , their set of strategies is . In particular, it implies that the set of outcomes  satisfies .
  • For any , their utility function is .

If for any  we choose some  this allows us to define the coarse-grained game  in which:

  • The set of players is .
  • For any , the set of strategies is .
  • For any  and , the utility function is .

Essentially, we rephrased  as an extensive form game in which first the game  is played and then, if the outcome was , the game  is played. This, assuming the expected outcome of game  is .

It is also possible to generalize this construction by allowing multivalued .

Local Symmetry Solutions

Given a game , a local symmetry presolution (LSP) is recursively defined to be one of the following:

  • Assume there is  s.t. for any . Then, any  is an LSP of . [Effectively single player game]
  • Let  be a coarse-graining of  and for any , let  be an LSP of . Let  be an LSP of . Define  by . Then,  is an LSP of .
  • Let  be a quasisymmetry of  and let  be an LSP of . Then,  is an LSP of .

It's easy to see that an LSP always exists, because for any game we can choose a sequence of coarse-grainings whose effect is making the players choose their strategies one by one (with full knowledge of choices by previous players). However, not all LSP are "born equal". It seems appealing to prefer LSP which use "more" symmetry. This can be formalized as follows.

The way an LSP is constructed defines the set of coherent outcomes , which is the set of outcomes compatible with the local symmetries[4]. We define it recursively as follows:

  • For the LSP of an effectively single-player game, we set .
  • For a coarse-graining LSP, for any let  be the set of coherent outcomes of  and let  be the set of coherent outcomes of . We then define . Here,  is defined by .
  • For a quasisymmetry LSP, let  be the set of coherent outcomes of . We then define .

We define a local symmetry solution (LSS) to be an LSP whose set of coherent outcomes is minimal w.r.t. set inclusion.

Examples

  • In the Prisoner's Dilemma, the only LSS is Cooperate-Cooperate.
  • In Stag Hunt, the only LSS is Stag-Stag.
  • In Chicken, the only LSS is Swerve-Swerve. If we add a finite number of randomized strategies, it opens more possibilities.
  • In Battle of the Sexes, assuming a small payoff from going to your preferred place alone, the only LSS is: each player goes to their own preferred place. If we give each player a "coinflip" strategy, then I think the only LSS is both players using the coinflip.
  • Consider a pure cooperation game where both players have strategies  and the payoff is 1 if they choose the same strategy and 0 if they choose different strategies. Then, any outcome is an LSS. I think that, if we add a "fair coinflip" strategy, any LSS has payoff at least . I believe the LSS payoff will approach optimum if we (i) allow randomization (ii) make the game repeated with time long horizon and (iii) add some constraints on the symmetries, but I'll leave this for a future post.
  • Consider a pure cooperation game where both players have strategies , the payoff is 2 if both choose , 1 if both choose  and 0 if they choose different strategies. Then, the only LSS is .
  • In any game, an LSS guarantees each player at least their deterministic-maximin payoff. In particular, in a two-player zero-sum game in which a pure Nash equilibrium exists, any LSS is a Nash equilibrium.
  1. ^

    Note that flatness is in general not preserved under composition.

  2. ^

    I believe this can be interpreted as a category enriched over the category of sets and partial functions, with the monoidal structure given by Cartesian product of sets.

  3. ^

    An affine space, even.

  4. ^

    Maybe we can interpret the set of coherent outcomes as a sharp infradistirbution corresponding to the players' joint belief about the outcome of the game.

Master post for multi-agency. See also.

In (non-monotonic) infra-Bayesian physicalism, there is a vaguely similar asymmetry even though it's formalized via a loss function. Roughly speaking, the loss function expresses preferences over "which computations are running". This means that you can have a "positive" preference for a particular computation to run or a "negative" preference for a particular computation not to run[1].

  1. ^

    There are also more complicated possibilities, such as "if P runs then I want Q to run but if P doesn't run then I rather that Q also doesn't run" or even preferences that are only expressible in terms of entanglement between computations.

I think that there are two different questions which might be getting mixed up here:

Question 1: Can we fully classify all rules for which sets of Bayes nets imply other Bayes nets over the same variables? Naturally, this is not a fully rigorous question, since "classify" is not a well-defined notion. One possible operationalization of this question is: What is the computational complexity of determining whether a given Bayes net follows from a set of other Bayes nets? For example, if there is a set of basic moves that generate all such inferences then the problem is probably in NP (at least if the number of required moved can also be bounded).

Question 2: What if we replace "Bayes nets" by something like "string diagrams in a Markov category"? Then there might be less rules (because maybe some rules hold for Bayes nets but not in the more abstract setting).

I think that in 2 years we're unlikely to accomplish anything that leaves a dent in P(DOOM), with any method, but I also think it's more likely than not that we actually have >15 years.

As to "completing" the theory of agents, I used the phrase (perhaps perversely) in the same sense that e.g. we "completed" the theory of information: the latter exists and can actually be used for its intended applications (communication systems). Or at least in the sense we "completed" the theory of computational complexity: even though a lot of key conjectures are still unproven, we do have a rigorous understanding of what computational complexity is and know how to determine it for many (even if far from all) problems of interest.

I probably should have said "create" rather than "complete".

(Summoned by @Alexander Gietelink Oldenziel)

I don't understand this comment. I usually don't think of "building a safer LLM agent" as a viable route to aligned AI. My current best guess about how to create aligned AI is Physicalist Superimitation. We can imagine other approaches, e.g. Quantilized Debate, but I am less optimistic there. More importantly, I believe that we need to complete the theory of agents first, before we can have strong confidence about which approaches are more promising.

As to heuristic implementations of infra-Bayesianism, this is something I don't want to speculate about in public, it seems exfohazardous.

Somehow being able to derive all relevant string diagram rewriting rules for latential string diagrams, starting with some fixed set of equivalences?

 

What are "latential" string diagrams?

What does it it mean that you can't derive them all from a "fixed" set? Do you imagine some strong claim e.g. that the set of rewriting rules being undecidable, or something else?

Two Bayes nets are of the same Markov equivalence class when they have precisely the same set of conditionality relations holding on them (and by extension, precisely the same undirected skeleton).

Okay, so this is not what you care about? Maybe you are saying the following: Given two diagrams X,Y, we want to ask whether any distribution compatible with X is compatible with Y. We don't ask whether the converse also holds. This is a certain asymmetric relation, rather than an equivalence.

I found the above comment difficult to parse.

that's not a thing that really happens?

What is the thing that doesn't happen? Reading the rest of the paragraph only left me more confused.

we don't quite care about Markov equivalence class

What do you mean by "Markov equivalence class"?

Load More